You're talking about the metadata of the files, which can always be edited and someone will inevitably try to make software to do exactly that. Also, Adobe's proposal for handling generated content is exactly this and they're not able to get buy-in from other companies.
Edit the metadata in what way? It's a cryptographic hash.
If the bits that make up the video as it was recorded by the camera no longer match the hash, then you know it was modified. That doesn't mean it's fake; it just means you should apply skepticism when viewing it. On the other hand, the ones that have not been modified and still match can be trusted.
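A minimal sketch of what "verify the hash" means here (the function names are mine, and real schemes like C2PA also sign the hash with a device key, which this omits): the verifier just recomputes a digest over the file's bytes and compares it to the one the camera recorded.

```python
import hashlib

def file_sha256(path: str) -> str:
    """Hash a file's bytes exactly as they sit on disk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_camera_hash(path: str, recorded_hash: str) -> bool:
    """True only if not a single byte has changed since recording."""
    return file_sha256(path) == recorded_hash
```

Any byte-level change, however visually invisible, makes the comparison fail, which is exactly the strength and the weakness being argued about in this thread.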
Essentially 0% of professional photography or videography uses "straight out of the camera" (SOOC) JPEGs or video. It's always raw photos or "log" video, then edited to look like what the photographer actually saw. The signal would be so noisy as to be useless.
Sure they could, but then you trim the video by 2 seconds, tweak the colors, or just send it over WhatsApp, which recompresses the file with its own encoder. The hash breaks instantly. Cryptography protects bits, but video is about visual meaning, and the slightest pixel modification kills the hardware signature. Plus, it does absolutely nothing to fix the "analog hole" problem: a scammer can just point that cryptographically signed iPhone camera at a high-quality deepfake playing on a monitor.
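To make the "protects bits, not meaning" point concrete: flipping a single bit (the kind of change a re-encode makes millions of times over) yields a completely different hash, even though the two buffers would be visually indistinguishable. A toy demonstration:

```python
import hashlib

original = b"\x00" * 1024        # stand-in for raw video bytes
tweaked = bytearray(original)
tweaked[0] ^= 1                  # flip one bit, e.g. one pixel nudged by recompression

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(bytes(tweaked)).hexdigest()
assert h1 != h2                  # the signature is dead, though the "video" looks identical
```

That avalanche behavior is by design; the consequence is that the hash can't distinguish "maliciously altered" from "merely re-encoded".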
I would assume WhatsApp would read the hash and verify it when the video is chosen to be sent to someone, so the receiver would see that the video the sender selected was indeed authentic. Assuming you trust Meta to re-encode it and not mess with it.
As far as recording a monitor, I guess, but I feel like you can tell that someone is recording a monitor.
As far as editing, no, it won't work in those cases, but the point here is not to verify ALL videos; it's to have an easy way for people to verify important ones. People will learn that an edited video won't verify, so they'll be less inclined to edit if they want to make it clear the video is authentic. Think of people recording some event going down on the street, or recording a video message for family and friends.
If AI video generation is going to get that good, don't you think it would be a good idea to have a way to record provably authentic videos if we need? Like a police interaction or something. There is no real reason to need to edit that.
Also, could a video hash just be computed every X seconds, and give the user the choice to trim the video at each of those intervals?
Hashing every X seconds is just a Merkle tree; the tech for that has been around forever. But cryptography only protects the container, not the semantic meaning inside it. If verifying a video requires spinning up this massive crypto infrastructure that can be trivially bypassed with a hardware camera spoofer anyway, that defense is worthless for the mass market. Scammers would bypass it in their sleep.
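For what it's worth, the per-interval idea does work mechanically: hash each X-second segment as a leaf, fold the leaves into a Merkle root, and a trimmed video (dropped whole segments) still verifies against the surviving leaf hashes. A toy sketch, with my own function names and no device signature over the root:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of leaf hashes up to a single 32-byte root."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Each X-second segment is hashed separately...
segments = [b"seg0", b"seg1", b"seg2", b"seg3"]
leaves = [h(s) for s in segments]
root = merkle_root(leaves)

# ...so trimming = dropping whole segments, and the surviving segments
# still match their published leaf hashes without re-signing anything.
assert h(segments[1]) == leaves[1]
```

None of this answers the analog-hole or hardware-spoofing objection, of course; it only shows that trim-at-boundaries is cheap to support.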