Both a human and an LLM may learn from reading code to produce novel, derivative, or duplicative work, but that's not the issue, because the model itself is a derivative of the training data and the human is not. That does seem very simple to me.

If we just zipped up the entire training data set and distributed it with the model, then it would clearly be a copy and/or derivative work. The LLM does the same thing as zipping (i.e., compresses the training data by encoding it in the model weights). Folks just seem to think that it's not a derivative work because an LLM _also_ does more than that sometimes (e.g., extrapolates from the training data to produce novel token sequences as output).
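
To make the compression point concrete, here's a minimal sketch of the standard model-as-compressor equivalence: a model's cross-entropy on a text is, up to rounding, the number of bits an arithmetic coder driven by that model would need to encode it. This assumes the Hugging Face transformers library and the public GPT-2 weights, both just illustrative choices.

    # A model's log-loss on a text equals (up to rounding) the size in bits
    # an arithmetic coder driven by that model would need to encode it.
    # Assumes Hugging Face `transformers` and GPT-2 (illustrative choices).
    import math
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    text = "It was the best of times, it was the worst of times"
    ids = tokenizer(text, return_tensors="pt").input_ids

    with torch.no_grad():
        # With labels=input_ids the model returns mean cross-entropy in nats.
        nats_per_token = model(ids, labels=ids).loss.item()

    n_predicted = ids.shape[1] - 1              # first token isn't predicted
    total_bits = nats_per_token * n_predicted / math.log(2)
    raw_bits = len(text.encode("utf-8")) * 8
    print(f"{total_bits:.0f} bits vs {raw_bits} bits raw")

The better the weights "know" a passage, the fewer bits are needed, which is the precise sense in which training encodes the data into the model.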



> the human is not

Why not? Humans store information in their brains that they have learnt. So do AIs. What exactly is the difference between a weight in an Artificial Neural Network and a weight in a Natural Neural Network?

If the answer is "humans get special treatment" then that's fine I guess but I think it's worth being explicit that that's the difference.

> The LLM does the same thing as zipping (i.e., compresses the training data…by encoding it in the model weights).

It's not at all the same: it's highly lossy. Only works repeated many times in the training data get memorised, and even then it's often not exact.

LLMs do not contain a copy of all the training data (if trained properly). I agree that if that were the case it would be different, but that isn't how they work (unless you badly overfit).
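
This is easy to probe directly: prompt with the opening of a heavily repeated passage and greedy-decode. A rough sketch, again assuming the Hugging Face transformers library and GPT-2, with an illustrative prompt:

    # Rough memorisation probe: prompt with a famous prefix, greedy-decode,
    # and check whether the continuation comes back verbatim.
    # Assumes Hugging Face `transformers` and GPT-2 (illustrative choices).
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    prefix = "We hold these truths to be self-evident, that all men"
    ids = tokenizer(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(ids, max_new_tokens=20, do_sample=False,
                             pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0]))
    # Heavily repeated passages tend to come back verbatim; obscure text
    # generally does not.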


> If the answer is "humans get special treatment" then that's fine I guess but I think it's worth being explicit that that's the difference.

That's absolutely the difference. Humans aren't copyrightable; the alternative would be unconscionable.

> even then it's often not exact

You don't have to copy something exactly for it to be a derivative work. "Lord of the Rings but a random 15% of words are replaced with gibberish" is still a derivative work of Lord of the Rings. So is "Lord of the Rings but every word/sentence is paraphrased".
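
A toy illustration of that point, using only the Python standard library (the sentence is just an example): a mechanically garbled copy is lossy, yet still plainly derived from its source.

    # Replace roughly 15% of words with gibberish; the result is lossy but
    # still recognisably a copy of the original sentence.
    import random
    import string

    def degrade(text: str, rate: float = 0.15, seed: int = 0) -> str:
        rng = random.Random(seed)
        words = text.split()
        for i, w in enumerate(words):
            if rng.random() < rate:
                words[i] = "".join(rng.choices(string.ascii_lowercase, k=len(w)))
        return " ".join(words)

    print(degrade("In a hole in the ground there lived a hobbit."))
    # prints the sentence with a few words replaced by random letters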


An LLM contains some portion of the training data exactly and the rest of it lossily. What I'm really arguing is that that _alone_ is enough to make the model itself a derivative work. It doesn't actually matter whether that's the same as or different from what a human does; that's a distraction. The AI model is itself a work that is derived from the training data.



