It is said that the MNG decoder uses a few hundred kilobytes of code whereas APNG reuses more code from the PNG library. APNG is sort of a hack where the first frame is a normal PNG and subsequent frames are hidden in custom chunks.
It's not that the first frame is a normal PNG and that the rest of the frames are hidden. That's close, but it's not exact.
PNGs are stored as files with multiple kinds of chunks inside. The relevant one here is the IDAT chunk, which holds image data. Most PNGs have just one IDAT, but APNGs carry multiple (one for each frame). Readers that don't care about animating will simply display the first IDAT and stop reading there.
So it's a bunch of PNGs, coalesced into one, with some frame timing data. And the code for reading them is tiny if you already have a PNG library, because you display them like you would a regular PNG, but making sure to read out every IDAT, at the speed denoted by the acTL and fcTL chunks.
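For a sense of what the timing data looks like, here is a rough Python sketch that unpacks the body of one fcTL chunk. The field layout follows my reading of the APNG spec (26 bytes, big-endian); `parse_fctl` and the sample bytes are made up for illustration, not taken from any library.

```python
import struct

def parse_fctl(payload: bytes) -> dict:
    """Unpack the 26-byte body of an fcTL chunk (hypothetical helper)."""
    (seq, width, height, x_off, y_off,
     delay_num, delay_den, dispose_op, blend_op) = struct.unpack(">IIIIIHHBB", payload)
    # The APNG spec says a zero denominator is to be treated as 100.
    if delay_den == 0:
        delay_den = 100
    return {
        "sequence_number": seq,
        "width": width,
        "height": height,
        "x_offset": x_off,
        "y_offset": y_off,
        "delay_seconds": delay_num / delay_den,
        "dispose_op": dispose_op,
        "blend_op": blend_op,
    }

# Invented sample: a 64x64 frame at offset (0, 0) with delay 1/0, i.e. 1/100 s.
sample = struct.pack(">IIIIIHHBB", 0, 64, 64, 0, 0, 1, 0, 0, 0)
frame = parse_fctl(sample)
print(frame["delay_seconds"])
```

The delay is a fraction (numerator over denominator) rather than a millisecond count, which is why the denominator needs the zero check.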
Yes, most PNGs have just one IDAT, but having multiple IDATs does not imply an animation. In a normal PNG, the specification says that the contents of the IDAT chunks are chained together.
This feature is used if the DEFLATE compressor only has a small output buffer: each chunk starts with its length and ends with a CRC, so the encoder has to close out a chunk every time the buffer fills. The feature is also used if the PNG is enormous, because the maximum size of a chunk is 2^31 - 1 bytes, but by chaining IDATs it is legal to represent image data that can't fit in a single chunk.
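To make the chaining concrete, here is a minimal Python sketch using only `struct` and `zlib`. It builds a tiny 2x2 grayscale PNG whose compressed data is deliberately split across two IDAT chunks, then reads it back by concatenating the IDAT payloads before inflating. The helper names (`make_chunk`, `iter_chunks`, `read_scanlines`) are mine, and it skips everything a real decoder does after inflating (unfiltering, interlacing, and so on).

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk, checking the CRC as we go."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(ctype + data) & 0xFFFFFFFF != crc:
            raise ValueError("bad CRC in chunk %r" % ctype)
        yield ctype, data
        pos += 12 + length

def read_scanlines(png: bytes) -> bytes:
    """Chain all IDAT payloads into one zlib stream, then inflate it."""
    stream = b"".join(data for ctype, data in iter_chunks(png) if ctype == b"IDAT")
    return zlib.decompress(stream)

# Two scanlines, each a filter-type byte (0 = none) followed by two pixels.
raw = b"\x00\x00\xff" + b"\x00\xff\x00"
compressed = zlib.compress(raw)
half = len(compressed) // 2

# IHDR: width 2, height 2, bit depth 8, color type 0 (grayscale), no interlace.
ihdr = struct.pack(">IIBBBBB", 2, 2, 8, 0, 0, 0, 0)
png = (PNG_SIGNATURE
       + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"IDAT", compressed[:half])   # first part of the stream
       + make_chunk(b"IDAT", compressed[half:])   # rest of the same stream
       + make_chunk(b"IEND", b""))

assert read_scanlines(png) == raw
```

Neither IDAT is a valid zlib stream on its own here; only the concatenation inflates. That is the difference from one image per chunk.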