The glue-on-pizza thing was a bit more pernicious because of how the model came to that conclusion: search results (SERPs). Google's LLM pulled the top result for that query, a Reddit post, and didn't understand that the post was a joke. It treated it as the most relevant source, and hilarity ensued.
In that case the error was obvious, but these things become "dangerous" for that sort of use case when end users trust the "AI result" as the "truth".
Treating "highest ranked," "most upvoted," "most popular," and "frequently cited" as a signal of quality or authoritativeness has proven to be a persistent problem for decades.
Depends on the metric. The humans who upvoted that material clearly thought it was worth something.
The problem is distinguishing the various reasons people find something worthwhile and applying the right context.
That requires a lot of intelligence.
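To make that concrete, here is a toy sketch of the difference between scoring by raw upvotes and scoring with the community context attached. Everything in it (the posts, the numbers, the community weights) is made up for illustration; it is not how any real ranking system works.

    # Illustrative sketch only: why "most upvoted" is not the same signal
    # as "most authoritative". All data and weights below are invented.
    from dataclasses import dataclass

    @dataclass
    class Post:
        text: str
        upvotes: int
        community: str  # e.g. the subreddit the post came from

    posts = [
        Post("Add 1/8 cup of non-toxic glue to the sauce for more tackiness.",
             upvotes=5000, community="r/funny"),   # highly upvoted *as a joke*
        Post("Let the dough rest 24h in the fridge for better texture.",
             upvotes=120, community="r/Pizza"),    # modestly upvoted, earnest advice
    ]

    def naive_score(post: Post) -> float:
        # Treats popularity as authority: the failure mode described above.
        return post.upvotes

    # Hypothetical context weights: posts from joke-oriented communities
    # should not be read as factual advice, however popular they are.
    CONTEXT_WEIGHT = {"r/funny": 0.0, "r/Pizza": 1.0}

    def context_aware_score(post: Post) -> float:
        return post.upvotes * CONTEXT_WEIGHT.get(post.community, 0.5)

    print(max(posts, key=naive_score).text)          # the glue joke wins
    print(max(posts, key=context_aware_score).text)  # the earnest advice wins

The hard part, of course, is that the context weights aren't a two-entry lookup table; deciding whether a community upvoted something because it is true or because it is funny is exactly the part that requires intelligence.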
The fact that modern language models are able to model sentiment and sarcasm as well as they do is a remarkable achievement.
Sure, there is a lot of work left to improve that, especially at scale and in products where users expect something more than a good statistical "success rate": they expect the level of precision they are used to from professionally curated human sources.
In this case it was a loss of context.
The original post was highly upvoted because in the context of jokes it was considered good.
Take it out of that context, treat "most upvoted" as a signal of something like authoritativeness, and the result will still be hilarious, but this time unintentionally so.