Thank you for this term. In my view, the belief that AGI will single-handedly and rapidly destroy us because it will think 10,000 times faster than we do is a form of thinkism.
> Last mile is already “solved” with the little robots that drive around cities, no need for hands.
And yet we haven’t seen widespread adoption because they can’t handle stairs, steep slopes, streets without sidewalks, sidewalks with mud, or a hundred other real-world challenges.
We haven’t seen widespread adoption because they can’t hope to compete with human delivery drivers on cost. The cost to DoorDash and Uber Eats of a delivery driver is nothing upfront and a few dollars per delivery. The cost of a delivery robot is thousands of dollars upfront and more per delivery. Stairs aren’t even in the top 10 problems these robots face; they’re more than capable of delivering to most customers already.
Sometimes it’s hard to tell objectively whether two animals don’t appear to reproduce because they are genetically unable to, or because they are technically able but behaviorally unwilling under normal natural circumstances, or whether we simply haven’t observed it for that particular pairing.
it has been pretty much a benchmark for memorization for a while. there is a paper on the subject somewhere.
swe bench pro public is newer, but it's not live, so it will slowly get memorized as well. the private dataset is more interesting, as are the results there:
> If your job is to translate requirements into code manually - and that's it - you're the generalist travel agent.
I’ve been a full-stack web programmer at five different companies over the last fifteen years, big and small, e-commerce and B2B, junior to senior to staff, and that has never fully described my responsibilities.
I'm also curious what results we would get if SWE came up with a new set of 500 problems to run all these models against, to guard against overfitting.