That conclusion is based on their benchmarks. I'm not interested in those. I'm i...

riku_iki · on Nov 1, 2023

that benchmark (HumanEval) is a public benchmark built by others.

PoignardAzur · on Nov 1, 2023

That kind of benchmark is a lot more reliable for models published before the benchmarks; models published afterwards have more opportunity to "study to the test". That's especially a concern when a company explicitly uses its score on that benchmark as a marketing point.

riku_iki · on Nov 1, 2023

sure, but it is the best thing we have.

emptysongglass · on Nov 1, 2023

Well, no: we have the anecdotes of all the HN folks, which I trust many, many times more than a benchmark.

riku_iki · on Nov 1, 2023

lol, you can continue trusting anecdotes from the internet. Industry prefers more scientific methods.

emptysongglass · on Nov 2, 2023

So Paul Graham posted that Phind is better and got absolutely destroyed in the comments:

https://twitter.com/paulg/status/1719657855240815026

No, I do not take these benchmarks seriously and for good reason. They're benchmarks. The only thing that matters is the user's direct experience of the product. And Phind isn't there.

riku_iki · on Nov 2, 2023

> got absolutely destroyed in the comments

by Twitter trolls?..