
> As a result, the model isn’t trained on understanding the useRalativeImagePath token, and so it outputs something that isn’t a valid token.

That isn't how LLMs generate tokens. Each step outputs a logit for every possible token in the tokenizer's vocabulary (~100k in the case of GPT-3.5), then softmaxes the logits to convert them into probabilities, and samples from that distribution according to the temperature to get the token to be used.
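Roughly, the sampling step looks like this (a minimal NumPy sketch with made-up logits, not OpenAI's actual code):

    import numpy as np

    def sample_next_token(logits, temperature=1.0):
        # Scale logits by temperature, then softmax into probabilities.
        scaled = logits / max(temperature, 1e-8)
        scaled -= scaled.max()               # numerical stability
        probs = np.exp(scaled)
        probs /= probs.sum()
        # Sample a token id from the resulting distribution.
        return np.random.choice(len(probs), p=probs)

    vocab_size = 100_000                     # ~100k-token vocabulary, as for GPT-3.5
    logits = np.random.randn(vocab_size)     # stand-in for the model's output
    token_id = sample_next_token(logits, temperature=0.7)

Whatever id comes out is, by construction, one of the tokenizer's valid ids, which is why "outputs something that isn't a valid token" doesn't describe the failure mode.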

It's possible something in the tokenizer's BPE merge process breaks on the rare token, which can be verified offline using tiktoken. But since GPT-3.5 and GPT-4 use the same tokenizer, and GPT-4 handles it fine, that's likely not the issue.
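A quick offline check with tiktoken might look like this (using cl100k_base, the encoding tiktoken reports for both gpt-3.5-turbo and gpt-4):

    import tiktoken

    # cl100k_base is the encoding tiktoken maps gpt-3.5-turbo and gpt-4 to.
    enc = tiktoken.get_encoding("cl100k_base")

    text = "useRalativeImagePath"
    ids = enc.encode(text)           # run the BPE merge process
    print(ids)                       # token id(s) the string maps to
    print(enc.decode(ids) == text)   # confirm the round trip survives

If the encode/decode round trip is clean, the tokenizer itself isn't what's breaking.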



I suspect it's more likely that this token is simply blacklisted after the r/counting incident, i.e. any response containing it now returns an error.


What was the r/counting incident?



Exactly this. The tokens generated should always be valid, unless some post-processing layer between the model's output and the user interface checks for certain keywords it would prefer to filter out. In which case, I suppose, a different commonly seen error message would appear?
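Such a filter layer could be as simple as this sketch (hypothetical blocklist and error, not anything OpenAI has documented):

    BLOCKED_STRINGS = ["useRalativeImagePath"]   # hypothetical blocklist

    def postprocess(response_text: str) -> str:
        # Reject any completion that contains a blocked string.
        for blocked in BLOCKED_STRINGS:
            if blocked in response_text:
                raise ValueError("response rejected by content filter")
        return response_text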


Not really, right? There are a ton of special tokens, like start-of-sequence etc., so what happens if two start-of-sequence tokens are predicted? Each is a valid token, but the sequence can't really be turned into anything sensible, so it throws an error when converting the tokens to plain text?


Special tokens are handled by the application, not the model. The model still outputs them like any other token; they're only interpreted (or stripped) afterwards.
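The usual pattern is that the sampling loop watches for special ids rather than the model doing anything different with them. A rough sketch, with a stub standing in for the model (the real model is just whatever produces the next token id):

    import random
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    eot_id = enc.eot_token                  # id of "<|endoftext|>" (100257 in cl100k_base)

    def model_step(context_ids):
        # Stand-in for the real model: returns arbitrary ids, then ends
        # the "generation" after a handful of tokens.
        if len(context_ids) >= 5:
            return eot_id
        return random.randrange(100_000)    # some ordinary (non-special) token id

    generated = []
    while True:
        token_id = model_step(generated)
        if token_id == eot_id:
            break                           # the application, not the model, decides to stop
        generated.append(token_id)

    print(enc.decode(generated))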


Correct me if I'm wrong, but we don't know if GPT-4 uses the same tokenizer as GPT-3.5, right?


OpenAI's web tokenizer demo confirms it: https://platform.openai.com/tokenizer



