> I wouldn't be surprised if even proprietary content like the books themselves ...

paodealho · 2026-02-06T13:49:54 1770385794

also:

"Researchers Extract Nearly Entire Harry Potter Book From Commercial LLMs"

https://www.aitechsuite.com/ai-news/ai-shock-researchers-ext...

sigmoid10 · 2026-02-06T13:57:42 1770386262

The big AI houses are all involved, to varying degrees, in litigation with the big publishing houses (up to and including class-action lawsuits). I think they at least have some level of filtering for their training data to keep them legally somewhat compliant. But considering how much copyrighted material is spread blissfully online, it is probably not enough to filter out the actual ebooks of certain publishers.

rendx · 2026-02-06T23:06:02 1770419162

> I think they at least have some level of filtering for their training data to keep them legally somewhat compliant.

So far, courts are siding with the "fair use" argument. No need to exclude any data.

https://natlawreview.com/article/anthropic-and-meta-fair-use...

"Even if LLM training is fair use, AI companies face potential liability for unauthorized copying and distribution. The extent of that liability and any damages remain unresolved."

https://www.whitecase.com/insight-alert/two-california-distr...