
The benchmarks are harder than you might imagine and contain more wrong answers and terrible questions than you would expect.

You don't need to take my word for it; try playing MMLU yourself.

https://d.erenrich.net/are-you-smarter-than-an-llm/index.htm...

It's not MMLU-Pro btw, which is considerably harder.



Sure, and AGI will 100% it 100% of the time, even if it is hard.


Your definition of AGI must be absurd.



