This being a 3B model, it isn't remotely comparable to GPT4.

WizardCoder 34B and Phind 34B are the only models remotely comparable, and they are still slightly worse than GPT 3.5 (let alone GPT4).



How about Mistral 7B? I saw this article recently:

https://wandb.ai/byyoung3/ml-news/reports/Fine-Tuning-Mistra...


Mistral 7B is very cool for its size. But unfortunately no open model is close to GPT4 as of right now.


If the rumors about GPT4 being a mixture-of-experts model are true, then this comparison is not fair.

What would be interesting is to compare GPT4 on a specific task against a small model fine-tuned for that task.
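
For the fine-tuning half of that comparison, here's a minimal sketch using HuggingFace transformers + peft with LoRA. The model name, the dataset file (a hypothetical "my_task.jsonl" with a "text" field) and all hyperparameters are placeholders, not a recipe from this thread:

    # Minimal LoRA fine-tuning sketch (transformers + peft + datasets).
    # "my_task.jsonl" (one {"text": ...} object per line), the model
    # name and all hyperparameters are illustrative placeholders.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    model_name = "mistralai/Mistral-7B-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Train small low-rank adapters instead of the full 7B weights.
    model = get_peft_model(model, LoraConfig(
        r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]))

    data = load_dataset("json", data_files="my_task.jsonl")["train"]
    data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                         max_length=512), batched=True)

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=1,
                               learning_rate=2e-4),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()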


GPT4 being a mixture of experts is irrelevant, imo. We don't care how many layers a network has, how wide those layers are, or which activation functions it uses; all that matters is whether we can run it on specific hardware, and the results.


Exactly. I don't get why people (non-AI researchers) discount MoE as if it were cheating or "fake" parameters.

Even if each inference pass only runs part of the network, there are still a trillion learnable parameters there lol.
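
For anyone curious what "only runs part of the network" means concretely, here's a toy top-k MoE layer in PyTorch. This is just an illustrative sketch (GPT4's actual architecture is unconfirmed, and the sizes here are made up): all 8 experts' weights are learnable, but each token only activates 2 of them per forward pass.

    # Toy top-k mixture-of-experts layer (PyTorch).
    # Illustrative sketch only; not GPT4's real (unconfirmed) design.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoELayer(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.gate = nn.Linear(d_model, n_experts)  # learned router
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ])

        def forward(self, x):                     # x: (n_tokens, d_model)
            scores = self.gate(x)                 # (n_tokens, n_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)  # renormalize over top-k
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e         # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(1) * expert(x[mask])
            return out

    # All experts count as parameters, but each token only runs 2 of 8.
    layer = MoELayer()
    print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])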


But the thing is, it doesn't need to know much about "other stuff", just about code (and basic English instructions).

So compared with big models, I'd say it's good, but it might have limited usefulness.

(You can probably go further with 3B if it's trained only on code.)



