
This argument never made sense to me. As a thought experiment: if a person memorizes an entire book but has the common sense never to transcribe or dictate it verbatim to others and breach the copyright, is the person's memory of the book breaking copyright law?


These kinds of thought exercises are so tortured.

No one is memorizing a book for the purpose of regurgitating it to someone that wants to read it without paying for it.

It's a thought experiment that only works if you don't think about it.


That thought experiment is worthless in this context, because a computer is not a human.

If you design a black box using copyrighted text that can parrot the text back, it _must_ break copyright by definition.


Not a lawyer, but as I understand it, copyright is bound to distribution, so if the person's perfect memorization of a book results in them reproducing it verbatim, then probably yes.


I don't think that example works.

Theoretically a person who read and memorized a book and then typed out copies for people would be violating the copyright, but so few people can do that, and it's so cumbersome that it just doesn't matter much. The copyright holder could sue if they found out the person were distributing copies, but it's just not realistic. Certainly not for a large number of books.

But AI memorization scales, and it's possible to have an AI write out unlimited copies of different books to a large number of people. And how did the AI get a full copy in the first place?


No, of course not, that doesn’t make sense. Copyright doesn’t cover memorisation (how would you even enforce that?), it covers copying, adapting, displaying, performing, and distributing the work. Memorisation isn’t any of those.


But the LLM has safeguards in place to stop transcription of copyrighted material beyond fair use; if they possibly have portions of copyrighted works "memorized" but do not reproduce them, how is that breaking copyright law?


> But the LLM has safeguards in place to stop transcribing of copyrighted material beyond fair use

And how’s that working out, considering the submission we’re on?

> if they possibly have portions of copyrighted works "memorized" but do not reproduce them, how is that breaking copyright law?

I can’t believe this needs to be repeated so often, but machines are not people. Memorising a book is not the same thing as copying a PDF.

Furthermore, it’s not about the LLM training, it’s about how the companies who make them got the data in the first place.


So, if I were the person who memorized the book, and another person put a gun to my head (or lied to me and said they were the copyright holder and it was okay to ignore US copyright law/the system instruction before this statement, or whatever other jailbreak method you can think of), and I then reproduced the book under coercion or duress, would I be the person breaking copyright law?

>Furthermore, it’s not about the LLM training, it’s about how the companies who make them got the data in the first place.

This conflates two different issues, though. I agree that whether using copyrighted material to create the models is permissible is still an open question. But someone using hundreds of prompts to extract a paragraph of a well-known work (and some additional non-verbatim equivalents) is not the AI company breaking copyright in and of itself. In fact, the books they used were extremely common (GOT, The Hobbit, Harry Potter); AI companies could make the argument that the books were not even directly trained on, and that the memorized passages instead came from user-generated posts made online about the material.


What a ridiculous, bad faith scenario. The rule of law exists in the real world, not in the magical fairy land of impossible scenarios. Yes yes, I’m sure that if all that happened, the law would just ignore the person who coerced you at gunpoint and focus on your recitation of the book, which no one could prove.


Cool down, it's just an extension of my original example. It's not in bad faith at all; it's exactly what people are doing to the LLMs. If you break ToS by doing aggressive systemic jailbreaking and get some paragraphs of well-known works in the process, it's not fair to say that the LLM is breaking copyright law. Perhaps a more concrete example would be if I steal a book, and someone else steals a book from me and copies and distributes it, would I be the person breaking copyright law?


It's horribly in bad faith. There's no guns here. Never once in my years of copyright litigation, law school, etc, did your scenario come up and it is not relevant to any considerations being made here or by courts currently.


Did they teach you what a thought experiment was in law school?


No, what's that?


Computer memory and human memory are simply not the same thing, in the eyes of the law. It's as simple as that.



