Well, all the problems obviously caused by the lack of an underlying physical model (like the hands thing) disagree. Your comment just says that, up to now, it has been easier to ignore those problems than to fix them.
But video has much stricter physical constraints than still images, so it's not clear we can ignore the problem at all.