If you’re willing, I’d love your insight on the “why one might want to do this”.
Conceptually I understand embedding quantization, and I have some hint of why it works for things like wav2vec - human phonemes are (somewhat) finite, so forcing the representation to be finite makes sense - but I feel like there's a level of detail I'm missing about what's really going on, and about when quantization helps or harms, that I haven't been able to glean from papers.
Quantization also works as regularization; it stops the neural network from being able to use arbitrarily complex internal rules.
But it's really only useful if you absolutely need a discrete embedding space for some sort of downstream usage. VQ-VAEs can be difficult to get to converge, and they have problems stemming from the approximation of the gradient, like codebook collapse.
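To make the gradient issue concrete, here's a minimal sketch of a VQ-VAE-style quantization layer in PyTorch. The names (`VectorQuantizer`, `num_codes`, `dim`, `beta`) are my own for illustration, not from any paper's reference code; it follows the structure of the original VQ-VAE losses but is just a sketch, not a drop-in implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        # The discrete "vocabulary": a learnable table of code vectors.
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1 / num_codes, 1 / num_codes)
        self.beta = beta  # weight on the commitment loss

    def forward(self, z):
        # z: (batch, dim) continuous encoder outputs.
        # Snap each vector to its nearest codebook entry.
        dists = torch.cdist(z, self.codebook.weight)  # (batch, num_codes)
        idx = dists.argmin(dim=1)                     # discrete codes
        z_q = self.codebook(idx)                      # quantized vectors

        # argmin is non-differentiable, so the gradient is approximated
        # with the straight-through estimator: copy gradients from the
        # quantized output back to z as if quantization were identity.
        z_q_st = z + (z_q - z).detach()

        # Codebook loss pulls entries toward encoder outputs; the
        # commitment loss keeps the encoder close to the code it chose.
        loss = F.mse_loss(z_q, z.detach()) \
             + self.beta * F.mse_loss(z, z_q.detach())
        return z_q_st, idx, loss
```

Note how only the nearest entry for each input gets any codebook-loss gradient: entries that stop winning the argmin stop being updated entirely, which is one way the codebook collapses onto a handful of codes.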
Maybe it helps to point out that the first version of DALL-E (of 'baby daikon radish in a tutu walking a dog' fame) used the same trick, but applied it to image patches.