From what I've found through Google (with no real understanding of LLMs), 2^16 is the max tokens per minute for fine-tuning OpenAI's models via their platform. I don't believe this is the same as the training token count.
Then there's the context token limit, which is 16k for 3.5 Turbo, but I don't think that's relevant here.
If I've got that wrong, somebody please tell me why; I'm still trying to wrap my head around the training side.
You are right to be curious. The encoding used by both GPT-3.5 and GPT-4 is called `cl100k_base`, and the name correctly suggests a vocabulary of roughly 100K tokens.
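If you want to verify this yourself, here's a minimal sketch using OpenAI's `tiktoken` library (the exact counts and token IDs shown in the comments are what I'd expect, but check the output on your own machine):

```python
import tiktoken  # pip install tiktoken

# Load the tokenizer used by GPT-3.5 and GPT-4.
enc = tiktoken.get_encoding("cl100k_base")

# Vocabulary size: roughly 100K, hence the name.
print(enc.n_vocab)  # 100277 (includes special tokens like <|endoftext|>)

# Encoding a string returns token IDs drawn from that vocabulary.
print(enc.encode("Hello, world!"))  # e.g. [9906, 11, 1917, 0]
```

Note that this vocabulary size is a property of the tokenizer, entirely separate from the context window (16k for 3.5 Turbo) and from any rate limits on the fine-tuning API.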