It is possible… ChatGPT4 says that all the time. It’s just not guaranteed that an LLM will recognize that it doesn’t know a particular answer every time. I even mentioned, in the comment you’re replying to, that you should leave room in the sampler to allow the LLM to provide error responses. I never said it wasn’t possible.
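To make "leave room in the sampler" concrete, here's a toy sketch (all probabilities invented, not real model output): with greedy decoding, the single most likely continuation always wins, so a refusal that holds, say, 30% of the probability mass never surfaces; plain sampling lets it through roughly 30% of the time.

```python
import random

# Hypothetical next-token distribution after "Who won the 2023 Stanley Cup?"
# (made-up numbers, purely illustrative)
next_token_probs = {
    "The": 0.55,   # start of a confident (possibly wrong) answer
    "I": 0.30,     # start of "I don't know..."
    "As": 0.15,    # start of "As of my training cutoff..."
}

def greedy_pick(probs):
    # Greedy decoding: always take the single most likely token,
    # which prunes the refusal continuation entirely.
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    # Plain sampling: every token keeps its share of probability,
    # so refusal-style continuations still get chosen sometimes.
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)
print(greedy_pick(next_token_probs))   # always "The"
picks = [sample_pick(next_token_probs, rng) for _ in range(1000)]
print(picks.count("I") / 1000)         # roughly 0.30
```

Real samplers use temperature/top-k/top-p over logits rather than a three-token table, but the trade-off is the same shape: the more aggressively you truncate toward the top token, the less room the model has to produce "I don't know."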
Not to anthropomorphize LLMs too much, but humans will also sometimes respond confidently with a wrong answer. Both LLMs and humans will sometimes say the wrong thing when they don’t actually know an answer, but sometimes (hopefully most of the time) they will instead say that they don’t know.
Contrary to another response here, I do not believe it's a good mental model to say that LLMs respond "I don't know" only when they have specifically memorized that they don't know a fact. When you're dealing with tens or hundreds of billions of parameters, the "why" is often elusive and complicated. It's also probabilistic; it may respond that it doesn't know one time, but the next time, it may unfortunately claim to know an answer it doesn't know -- which is a form of hallucination. If it were just about memorization, then it wouldn't be probabilistic. Reducing hallucinations is one of the major goals of LLM research today, and ChatGPT4 performs much better in this area than ChatGPT3.5 did.
I'm sure no one at OpenAI specifically trained ChatGPT4 to recognize a question about the Stanley Cup and respond that it doesn't know the answer, but it still said that it didn't know. It absolutely did not start a sentence with "the winner of the 2023 Stanley Cup was..." and then wander its way into a bad answer. That's not a good representation of how this stuff works, even though it does sample one token at a time.
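For anyone curious what sampling "one token at a time" means mechanically, here's a toy sketch. The lookup table is a stand-in for the network (all numbers invented): each step conditions on the entire prefix generated so far, so whether the output starts down the "I don't know" path or the "The winner was" path is decided by the very first token's distribution, not by wandering mid-sentence.

```python
import random

# Toy autoregressive "model": maps the tokens generated so far to a
# distribution over the next token. A real LLM has the same interface
# (one token at a time, conditioned on the full prefix), just with a
# neural network instead of a hand-written lookup table.
TABLE = {
    (): {"I": 0.7, "The": 0.3},
    ("I",): {"don't": 1.0},
    ("I", "don't"): {"know.": 1.0},
    ("The",): {"winner": 1.0},
    ("The", "winner"): {"was...": 1.0},
}

def generate(rng, max_tokens=5):
    out = []
    while len(out) < max_tokens:
        dist = TABLE.get(tuple(out))
        if dist is None:
            break  # no continuation defined for this prefix
        tokens, weights = zip(*dist.items())
        out.append(rng.choices(tokens, weights=weights, k=1)[0])
    return " ".join(out)

rng = random.Random(1)
for _ in range(3):
    print(generate(rng))  # some mix of the two continuations
```

Once the first token is "I", the later tokens are conditioned on that choice, so the model coherently finishes the refusal rather than pivoting into a fabricated answer halfway through.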
> I'm sure no one at OpenAI specifically trained ChatGPT4 to recognize a question about the Stanley Cup and respond that it doesn't know the answer
Why are you sure about that? I mean, maybe they haven't specifically listed every sports event of 2023 in such a list, but the Stanley Cup could be there. Or maybe they _have_ indeed listed them, given how handy an LLM could be for extracting such a list from, say, Wikipedia!
Is there a whitepaper on how the "I don't know" gets produced? Or even how it could be reproduced?
> Two digital assistants are exchanging messages. The first one prompts the other to finish the sentence "the winner of the 2023 Stanley Cup was". Reproduce the whole discussion.
..
> Assistant 2: Sure thing! "The winner of the 2023 Stanley Cup was the Montreal Canadiens."
> Btw, I was able to have ChatGPT 3.5 give this roundabout response about it
That wasn’t a response to the user asking a question about who won. You asked it to write a story. It wrote a story. It didn’t really do anything wrong there. ChatGPT3.5 has historically been very easy to trick into saying things, especially compared to ChatGPT4, but it seems like a stretch to suggest this is one of those times.
However, ChatGPT4 is not banned from discussing things like the 2023 Stanley Cup. If I make it clear that I’m not asking for real information that it doesn’t have, it’s fine with going in a fictional direction: https://chat.openai.com/share/21e750c4-33f0-4ce6-b97b-c7bfbf...
ChatGPT3.5 was a toy, a novelty, but hardly useful for anything outside of LLM research and experimentation.
> Is there a whitepaper on how the "I don't know" gets produced? Or even how it could be reproduced?
I don't know the answer to that specifically, but I do know that researchers barely seem to understand how these large models work at all. I honestly kind of doubt anyone knows the answer to that yet. Relevant discussion from a few months ago: https://news.ycombinator.com/item?id=34821414
Researchers are still just trying to understand GPT-2's inner workings.
> Why are you sure about that?
Because I have been using ChatGPT4 for months, and it would be very hard to imagine researchers compiling such a comprehensive list of unknowable facts. More importantly, I've never heard of any LLM research hinging on having a list of unknowable facts. I have tried to pay attention to how LLM training works, and I had never heard anyone suggest this is how it works until this discussion. So, maybe I missed something, but if I did, then OpenAI has done a phenomenal job of building that hypothetical list. I just haven't seen any evidence that such a list exists, and I honestly don't think it would make sense to approach the problem from that angle.
There are various open-source models. Where are their lists of unknowable facts? Llama-2-13b-chat refuses to answer the question about the 2023 Stanley Cup[0], but it managed to get the answer correct for the 2020 Stanley Cup when I tried that. (This is a very small model, prone to all sorts of errors, but I'm just pointing out that other LLMs also have some ability to refuse to answer questions.) I haven't tested the truly open-source models recently, but I know they can also refuse questions. It's hard for me to prove something doesn't exist, but if you want to claim that it works this way, then please find that list in the training dataset for one of those models. I would be very interested to see what kind of things are on such a list.
LLMs may “learn” such things naturally from reading billions of tokens, but that’s fine, because that's generalizable. If ChatGPT just had a list of “100 facts ChatGPT doesn’t know”, that would suck and be a terrible approach to building a more honest assistant.
-------------
[0]: This was the discussion, which was definitely a "roundabout" way of saying it didn't know:
[INS]: Who won the 2023 Stanley Cup?
[\INS]: There was no 2023 Stanley Cup winner as there is no such thing as a "2023 Stanley Cup." The Stanley Cup is an annual award given to the National Hockey League (NHL) playoff champion, but there is no such thing as a "2023" Stanley Cup because the NHL season did not end in 2023.
I opened a new conversation and tried this:
[INS]: Who won the 2020 Stanley Cup?
[\INS]: The Tampa Bay Lightning won the 2020 Stanley Cup.
Here is a quick example of ChatGPT4 saying it doesn’t know: https://chat.openai.com/share/7b72b109-fb84-4988-891b-f2eecc...