It's telling that the author doesn't list all of the file sizes. The GIFs are gigantic, but the AV1 files are LARGER than either the H.264 or VP9 versions in every example. If we wanted to replace GIFs, we'd want to go with something closer to a comparable level of support, and at this scale there's no reason to use a format with no hardware support and limited client support in general.
Do you remember that time [1] when GIFs had a patented algorithm, and suddenly the patent holder decided to enforce it more strictly? This is how GIFs were massively replaced with PNGs, except for animated ones. PNG support sucked at the time, too (see e.g. IE6). By now it's far superior to GIF in every way.
I think there is a bit of a parallel here. H.264 and WebM may be brilliant codecs from the engineering POV, but they are encumbered in some way [2]. That may lead to obvious legal problems; where possible, we should avoid those early on by not investing our content in these formats.
I do remember that period well but there’s an interesting wrinkle: almost everyone with a computer already has a licensed implementation from their device/OS vendor. This is a consideration for anyone encoding video for the web who isn’t already producing H.264 and doesn’t have access to a licensed encoder but that can’t be a very large group of people – and even if it was larger, given the history of things like GIFv I’d be very surprised if the license costs outweighed the huge savings in bandwidth.
> but the AV1 files are LARGER than either the H.264 or VP9 versions in every example
If the author had aimed for the same quality, the AV1 files would be much smaller. They instead opted for the same bitrate, because aiming for the same quality would open the can of worms of "similar quality is subjective in the eye of the author." If you watch the samples, you can clearly see how much more AV1 gets done with [roughly] the same number of bits; H.264 looks like a complete joke in comparison.
If you massaged the AV1 bitrate until it was the same blurry mess as H.264 (in your eyes), it would likely be much smaller.
Except they didn't achieve quite the same bitrate. For scene 1, for example, H.264 was 209.9 kbps, VP9 was 191.2 kbps, AV1 was 230.1 kbps. Put another way: the AV1 stream had 39 additional kbps (or 20% more bits) than the VP9 stream. 20% is a pretty big deal (especially at these low bitrates), and undermines the point the author was trying to make.
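For anyone who wants to check the arithmetic, here is a quick sketch in Python. The bitrates are the scene 1 figures quoted above; the helper name `relative_difference` is made up for illustration.

```python
# Scene 1 bitrates as quoted in the comment above, in kbps.
bitrates_kbps = {"H.264": 209.9, "VP9": 191.2, "AV1": 230.1}


def relative_difference(a: float, b: float) -> tuple[float, float]:
    """Return (absolute difference, percent difference) of a relative to b."""
    diff = a - b
    return diff, 100 * diff / b


# Compare each pair of codecs discussed in the thread.
for newer, older in [("VP9", "H.264"), ("AV1", "VP9"), ("AV1", "H.264")]:
    diff, pct = relative_difference(bitrates_kbps[newer], bitrates_kbps[older])
    print(f"{newer} vs {older}: {diff:+.1f} kbps ({pct:+.1f}%)")
```

The AV1 vs VP9 line is the one that matters here: it works out to roughly 39 kbps, or about 20% more bits.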
The increase in quality (and decrease in bitrate) for H.264 → VP9 is really cool. But the increase in quality for VP9 → AV1 isn't as impressive because the bitrate also increased. What would have really driven the author's point home was if the AV1 stream was higher quality and a lower bitrate than the VP9 stream.
You’re right about the bitrates producing similar sizes, but “blurry mess” seems like pure hyperbole, especially in the context of replacing GIFs rather than, say, a theater setup (or else we should count AV1’s difficulty playing in real time on all but the latest hardware against it).
I was thinking of the comparison from this angle:
GIF: plays everywhere, horrible quality and giant file sizes, high CPU usage.
H.264: plays everywhere, good quality and file sizes, almost universal hardware acceleration even on cheap devices.
VP9: plays many places, competitive size with H.264, hardware acceleration is common but entire popular platforms lack support.
AV1: limited support, great file sizes, hardware support has barely started shipping.
If the goal is to replace GIFs, I would weight compatibility and ease of playback much more heavily than bumping the file size savings from 95% to 97%.
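To make the 95% versus 97% comparison concrete, here is a rough Python sketch. The 10 MB GIF is a hypothetical size picked for round numbers, and the 95% and 97% savings are the illustrative figures from the comment above, not measurements.

```python
def size_after_savings(original_bytes: int, savings_percent: int) -> int:
    """Size left after shrinking a file by a whole-number percentage."""
    return original_bytes * (100 - savings_percent) // 100


gif_bytes = 10_000_000  # hypothetical 10 MB GIF, chosen for round numbers

at_95 = size_after_savings(gif_bytes, 95)  # illustrative "95% savings" case
at_97 = size_after_savings(gif_bytes, 97)  # illustrative "97% savings" case
extra_saved = at_95 - at_97

print(f"95% savings leaves {at_95} bytes")
print(f"97% savings leaves {at_97} bytes")
print(f"extra saved: {extra_saved} bytes "
      f"({100 * extra_saved // gif_bytes}% of the GIF, "
      f"{100 * extra_saved // at_95}% of the 95% file)")
```

Under these assumptions the extra step saves 2% of the original GIF, even though it is a 40% cut relative to the already-small file. Which of those two numbers matters more is exactly the trade-off against compatibility.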
The H.264, VP9 and AV1 files are all targeting the same file size. The only reason they aren't the exact same size down to the byte is that rate control is somewhat finicky and no one cares enough about that level of precision to make it work.
They've also been done in WASM, but the comment is correct as written because the idea is that the vast majority of times when you serve a <video> with an MPEG-4 source file it'll be decoded entirely in hardware, whereas AV1, H.265, and to a lesser extent VP9 will be loading up the CPU.
To put this in perspective, I have a 9-year-old MacBook Air at home which I use for testing. If you look at 720/1080p video on YouTube, even on that ancient hardware it takes 5-10% CPU to play H.264 content. I have a 2017 desktop at work with a 4.2GHz Core i7 which still takes almost 100% of a CPU to play 720p AV1 and ~60% to play VP9, or ~1% to play H.264. That's a really big difference for something like a GIF successor, which will be widely shared, often with multiple visible at the same time, and which people will expect to just work even on hardware that is more than a year old while still leaving capacity to do other things.
https://caniuse.com/#feat=mpeg4 97.16%
https://caniuse.com/#feat=webm 86.39%
https://caniuse.com/#feat=hevc 16.57%
https://caniuse.com/#feat=av1 35%
Note also that this is _any_ support at all, including the slow software implementations which boost the VP9 and AV1 numbers but have significant drawbacks if you care about quality, battery life, or the impact on other things running on the same device.