GPT-4 being a mixture of experts is irrelevant, imo. We don't care how many layers a network has, how wide those layers are, or which activation functions it uses; all that matters is whether we can run it on specific hardware, and the results.
WizardCoder 34B and Phind 34B are the only models that are remotely comparable, and they are still slightly worse than GPT-3.5 (let alone GPT-4).