It's an AES-based mode, so it only has the AES message block (and, implicitly, the key) to work with, and some of the bits in the message block need to hold the CTR counter. The problem is that this leaves an uncomfortably low margin of random bits.
I guess I never conceptually got on board with the idea that hashing down more entropy is better than pulling the exact amount of good entropy. But I lack the formal qualifications to argue as much (or knowledge of a proof to the contrary). I know we like to be paranoid when things make us uncomfortable.
It's not an entropy question at all; it's that the algorithm only gives you a fixed number of bits (the key size plus the AES message block) to divide up, and if you only use the message block itself (as NIST GCM does), that's not enough space to feel comfortable picking nonces at random, because of the birthday problem.
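To put rough numbers on the birthday problem, here is a minimal sketch using the standard birthday approximation. It assumes the usual 96-bit GCM nonce (the other 32 bits of the 128-bit block hold the counter) and compares it with a hypothetical 192-bit nonce, picked purely for illustration; `collision_probability` is a made-up helper name, not part of any library.

```python
import math


def collision_probability(n_messages: int, nonce_bits: int) -> float:
    """Approximate chance that at least two of n_messages random nonces collide."""
    space = 2 ** nonce_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n_messages * (n_messages - 1) / (2 * space))


if __name__ == "__main__":
    # 96 bits: assumed standard GCM nonce size; 192 bits: hypothetical comparison.
    for bits in (96, 192):
        for log2_n in (32, 48):
            p = collision_probability(2 ** log2_n, bits)
            print(f"{bits}-bit nonce, 2^{log2_n} messages: p ~= {p:.3e}")
```

With 96 bits, 2^32 messages under one key already gives a collision chance of roughly 2^-33, and by 2^48 messages a collision is more likely than one in three. With the wider nonce the same message counts stay negligible, which is the whole point about not having enough space.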