I don't agree that a model is a derivative work, and I think a judge would likely agree with me. I think you need to be able to show that major copyrightable elements of the original work are actually present in the allegedly derivative work, which is very non-trivial even with the most transparent of models like Stable Diffusion - scientists doing intensive analysis of the SD model were only able to find around a hundred instances of reproduced images from the source material out of several hundred thousand attempts.
That said, it definitely would be copyright infringement to download a bunch of copyrighted material and actually use it in some way, for example to train a model. Luckily, most jurisdictions recognise this, and so governments have specifically carved out exceptions to copyright law for this process (known as text and data mining, or TDM). This includes the UK, the EU, Japan, and China. In the US, there is no specific law addressing the issue yet, but many companies are doing it in the US (and have been for many years) with the presumption of legality based on the Authors Guild v. Google and Perfect 10 v. Google rulings. Basically, they are acting under the assumption that it is fair use, which I think is a ~reasonable assumption, and one I think the US Supreme Court would uphold if it chose to take up the question.