Once you understand how videos are encoded, it's pretty obvious why GIF (and APNG) files are not, and never will be, as small as videos. Videos use temporal data: you store one key frame (where the whole image is stored), and the following frames are encoded as deltas (differences) from the previous frame. So a still video could basically be the size of a single image.
Animated images aren't that smart: they're just a bunch of images, so you're storing every single frame.
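A toy sketch of the key-frame-plus-delta idea, assuming frames are flat lists of grayscale values and a delta is a list of (pixel index, new value) pairs. The names `encode`, `decode` and `stored_values` are hypothetical. Real codecs use motion compensation and transform coding rather than per-pixel diffs, so this only illustrates why unchanging content costs almost nothing.

```python
from typing import List, Tuple

Frame = List[int]               # flattened grayscale pixels, one int per pixel
Delta = List[Tuple[int, int]]   # (pixel index, new value) for each changed pixel


def encode(frames: List[Frame]) -> Tuple[Frame, List[Delta]]:
    """Store the first frame whole (the key frame) and every later frame
    as the pixels that differ from the frame before it."""
    key = list(frames[0])
    deltas: List[Delta] = []
    prev = key
    for frame in frames[1:]:
        deltas.append([(i, v) for i, (p, v) in enumerate(zip(prev, frame)) if p != v])
        prev = frame
    return key, deltas


def decode(key: Frame, deltas: List[Delta]) -> List[Frame]:
    """Rebuild every frame by applying each delta to the previous frame."""
    frames = [list(key)]
    for delta in deltas:
        frame = list(frames[-1])
        for i, v in delta:
            frame[i] = v
        frames.append(frame)
    return frames


def stored_values(key: Frame, deltas: List[Delta]) -> int:
    """Rough size measure: key-frame pixels plus two ints per changed pixel."""
    return len(key) + sum(2 * len(d) for d in deltas)
```

For a still clip of 100 frames of 64 pixels each, `stored_values` comes to 64, against 6,400 values for storing every frame in full.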
It's not quite that stark. You can store the unchanging background only in the first frame and have subsequent frames (even smaller frames, placed at offsets) consist solely of "sprite-like" things that overlay it. And there are tools like ImageMagick that can optimize existing animated GIFs in that way. It is crude, but it's more than just a plain sequence of unrelated full-size frames.
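A minimal sketch of the "smaller frames placed at offsets" idea, assuming images are lists of rows of palette indices. The helper names `changed_rect` and `paste` are hypothetical and this is not ImageMagick's actual code: it just finds the smallest rectangle covering every changed pixel, which is what would be written as the next, smaller frame, and shows how a decoder would overlay it.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of palette indices


def changed_rect(prev: Image, cur: Image) -> Optional[Tuple[int, int, Image]]:
    """Return (left, top, patch): the smallest rectangle of `cur` covering
    every pixel that differs from `prev`, or None if nothing changed."""
    rows = [y for y in range(len(cur)) if prev[y] != cur[y]]
    if not rows:
        return None
    # Any differing column must lie in one of the differing rows.
    cols = [x for x in range(len(cur[0])) if any(prev[y][x] != cur[y][x] for y in rows)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    patch = [row[left:right + 1] for row in cur[top:bottom + 1]]
    return left, top, patch


def paste(canvas: Image, left: int, top: int, patch: Image) -> Image:
    """Overlay a patch at (left, top) onto a copy of the canvas, as a decoder
    does when drawing a smaller frame at an offset."""
    out = [list(row) for row in canvas]
    for dy, row in enumerate(patch):
        out[top + dy][left:left + len(row)] = row
    return out
```

A 2x2 patch is all that needs storing when two pixels of a 5x4 background change, instead of the full 20-pixel frame.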
> Animated images aren't that smart: they're just a bunch of images, so you're storing every single frame.
GIF89a has a limited sort of delta encoding via the "do not dispose" disposal method. If you write a frame with "do not dispose" and then write a frame with some transparent pixels, those transparent pixels will keep the colors from the previous frame.
It's rather primitive but it's better than nothing.
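A sketch of that trick on flat lists of palette indices. In the GIF89a Graphic Control Extension, "do not dispose" is disposal method 1. The reserved index `TRANSPARENT = 255` and the function names are hypothetical, and the sketch assumes that index is never needed as a real color. The encoder marks unchanged pixels transparent. The decoder leaves the previous frame on the canvas and only paints opaque pixels.

```python
from typing import List

TRANSPARENT = 255  # hypothetical palette index reserved for transparency


def mask_unchanged(prev: List[int], cur: List[int]) -> List[int]:
    """Encoder side: replace pixels that match the previous frame with the
    transparent index, so only changed pixels carry real colors.
    Assumes `cur` never uses TRANSPARENT as a real color."""
    return [TRANSPARENT if p == c else c for p, c in zip(prev, cur)]


def composite(prev: List[int], frame: List[int]) -> List[int]:
    """Decoder side for 'do not dispose': the previous frame stays on the
    canvas, and transparent pixels in the new frame let it show through."""
    return [p if f == TRANSPARENT else f for p, f in zip(prev, frame)]
```

The masked frame still has the full frame's dimensions, but long runs of the same transparent index tend to compress well under GIF's LZW coding, which is where the savings come from.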
Use gifsicle. It's excellent and very fast; I don't think there's anything that produces smaller GIFs than it does. [0]
You can also use ImageMagick to produce optimised GIFs (it has about four different options for that, e.g. -layers OptimizeTransparency). See [1]. But in my own tests, gifsicle always beats it (often by a huge margin) and is vastly faster. That ImageMagick page may be out of date. Also, Tumblr contracted the author of gifsicle, Dr. Eddie Kohler, to make some improvements for them; hopefully those made it back into gifsicle [2].
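If you want to repeat that kind of comparison on your own files, here is a small Python harness. It is a sketch, not the commenter's test setup, and it reports no results of its own. It assumes `gifsicle` and/or ImageMagick are on the `PATH`. It uses gifsicle's `-O3` optimization level and ImageMagick's `-layers OptimizeTransparency` operator. The function names and the output naming scheme (`input.gifsicle.gif`, `input.imagemagick.gif`) are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Dict, List


def gifsicle_cmd(src: str, dst: str) -> List[str]:
    # -O3 is gifsicle's most thorough (and slowest) optimization level.
    return ["gifsicle", "-O3", src, "-o", dst]


def imagemagick_cmd(src: str, dst: str) -> List[str]:
    # ImageMagick 7 ships a `magick` binary; older installs use `convert`.
    tool = "magick" if shutil.which("magick") else "convert"
    return [tool, src, "-layers", "OptimizeTransparency", dst]


def compare(src: str) -> Dict[str, int]:
    """Optimize `src` with each installed tool; return output sizes in bytes."""
    sizes: Dict[str, int] = {}
    for name, build in (("gifsicle", gifsicle_cmd), ("imagemagick", imagemagick_cmd)):
        dst = f"{os.path.splitext(src)[0]}.{name}.gif"
        cmd = build(src, dst)
        if shutil.which(cmd[0]) is None:
            continue  # tool not installed; skip it
        subprocess.run(cmd, check=True)
        sizes[name] = os.path.getsize(dst)
    return sizes
```

Call `compare("input.gif")` and compare the returned byte counts. Wrapping each `subprocess.run` in a timer would also give a rough speed comparison.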
I think most GIF optimizers do this, although I haven't tried any. E.g. https://ezgif.com/optimize says it does ("makes unchanged parts of the following frames transparent").
Kind of an old reply, but I should honestly test that. Encoding in a CPU-only environment takes around 4-5x longer than in one with a GPU that has specialized support, but I haven't tested decoding in a CPU-only environment yet.