Hacker News

> What's Anthropic's optimization target??? Getting you the right answer as fast as possible!

What makes you believe this? The current trend among all the major providers seems to be: get you to spin up as many agents as possible, so that you're billed more and their request numbers go up.

> Slot machines have variable reward schedules by design

LLMs from all the major providers are trained with RLHF, which optimizes them in ways we don't entirely understand to keep you engaged.

These are incredibly naive assumptions. Anthropic/OpenAI/etc don't care if you get your "answer solved quickly", they care that you keep paying and that all their numbers go up. They aren't doing this as a favor to you and there's no reason to believe that these systems are optimized in your interest.

> I built things obsessively before LLMs. I'll build things obsessively after.

The core argument of the "gambling hypothesis" is that many of these people aren't really building things. To be clear, I certainly don't know if this is true of you in particular, it probably isn't. But just because this doesn't apply to you specifically doesn't mean it's not a solid argument.



> The current trend in all major providers seem to be: get you to spin up as many agents as possible so that you can get billed more and their number of requests goes up.

I was surprised when I saw that Cursor added a feature to set the number of agents for a given prompt. I figured it might be a performance thing - fan out complex tasks across multiple agents that can work on the problem in parallel and get a combined solution. I was extremely disappointed when I realized it's just "repeat the same prompt to N separate agents, let each one take a shot and then pick a winner". Especially when some tasks can run for several minutes, rapidly burning through millions of tokens per agent.

At that point it's just rolling dice. If an agent goes so far off-script that its result is trash, I would expect that to mean I need to rework the instructions and context I gave it, not that I should try the same thing again and hope that entropy fixes it. But editing your prompt offline doesn't burn tokens, so it's not what makes them money.


Cursor and others have a subagent feature, which sounds like what you wanted. However, there has to be some decision-making around how to divide a prompt into tasks; currently the (parent) model decides this.

The best-of-N feature is a bit like rolling N dice instead of one. But it can be quite useful if you use different models with different strengths and weaknesses (e.g. Claude/GPT-5/Gemini), rather than assigning all to N instances of Claude, for example. I like to use this feature in ask mode when diving into a codebase, to get an explanation a few different ways.
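As a sketch, the fan-out described above can be made concrete. All names here are hypothetical: `call_model` stands in for a real provider client, and the judge in real systems is usually another LLM rather than a heuristic:

```python
# Hypothetical best-of-N fan-out: the same prompt goes to N different
# models in parallel, each takes a shot, and a judge picks a winner.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, prompt: str) -> str:
    # Placeholder: a real implementation would call the provider's API.
    return f"{model}'s answer to: {prompt}"

def judge(answers: list[str]) -> str:
    # Placeholder judge that picks the longest answer; real systems
    # typically use an LLM grader. Only here to make the flow concrete.
    return max(answers, key=len)

def best_of_n(prompt: str, models: list[str]) -> str:
    # Fan out the identical prompt to every model concurrently.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: call_model(m, prompt), models))
    return judge(answers)

winner = best_of_n("Explain this codebase's auth flow",
                   ["claude", "gpt-5", "gemini"])
```

The token cost of this pattern is N times a single run, which is why it only makes sense when the models genuinely differ, rather than as N rolls of the same die.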


> What makes you believe this?

Simply, cut-throat competition. Given that multiple nations are funding different AI labs, quality of output and speed are among the most important things.


Dating apps also have cut-throat competition and none of them are optimised for minimising the time you spend on the app.


They don’t; they’re all owned by Match Group.


So why can't a better one start?


Because Match buys them out. PoF, Tinder, Hinge started independent, and were bought by Match once they showed promise.


~90% of them are owned by Match


*sigh* We're doing this lie again? Quality of outcome is not, has never been, and if the last 40 years are anything to go on will never be a core or even tangential goal. Dudes are trying to make the stock numbers go up and get paid. That's it. That's all it ever is.


You're just being pedantic and cynical.

The goal of any business is, in principle, profit; by your terms, all of them are misaligned.

The matter of fact is that customers are receiving value, and that value has been a good proxy for which companies will grow to be successful and which will fail.


I'm being neither pedantic nor cynical. Do you need a refresher on value propositions vs. actual outcomes over the last few decades of breathlessly hyped tech bubbles? Executive summary: the portions of the tech industry that attract the most investment consistently produce the worst outcomes; the more cash, the shittier the result. It's also worth noting that "value" is defined as anything you can manipulate someone into paying for.


I mean, yeah. All businesses are misaligned, unless a fluke aligns the profit motive with the consumers for a brief period.


Hey man people either get it or they don't. We're doomed.


How is nation-states funding private corporations "cut-throat competition"?


Ok, to be very honest I wrote that in the middle of having a couple of drinks. I guess, what I mean is, countries are funding AI labs because it can turn into a “winner-takes-it-all” competition. Unless the country starts blocking the leading providers.

Private companies will turn towards the best, fastest, cheapest (or some average of them). Country borders don’t really matter. All labs are fighting to get the best thing out to the public for that reason, because winning comes with money, status, prestige, and actually changing the world. Incentives like these are rare.


> countries are funding AI labs because it can turn into a “winner-takes-it-all” competition.

Winner takes what, exactly? They can rip off React apps quicker than everyone else? How terrifying.


Like I understand this commentary, but it’s so detached from reality. My dad in his 70s is writing Excel macros, even though he never touched that in his life. There are a ton of cases like this, but people can’t see reality out of their domains.


That’s so dope excel finally let old people learn this! They removed the agelock?!


Come on man, you know exactly what I mean. You can keep coming up with these arguments, but the world has moved on already. I genuinely don’t know a single person in 3 different countries from age 12+ who does not use LLMs at least once a week. We have to adapt, or choose to not play the “game”.


Your counter to "excel was never hard to learn" is "people use LLMs all day long" ??

I uh, think the LLM use has compromised your critical thinking skills.


What does this even mean? Are you disputing the fact that AI labs are competing with each other because they are funded by nation-states?


Why do you have to compete if you can just say "but China!" and get billions more dollars from the government?


Cut-throat competition between nations is usually called war. In war, gathering as much information as possible on everyone is certainly a strategic priority. Selling psyops about how many benefits will come to everyone willing to join the one-sided industrial dependency is also a thing. Giving a significant boost to potentially adversarial actors is not a thing.

That said, the universe doesn't obligate us to think the cosmos is all about competition. Cooperation is always a viable path, often with far greater long-term benefits at scale.

Competition is superfluous, self-inflicted masochism.


There’s a line to be trod between returning the best result immediately and forcing multiple attempts. Google got caught red-handed reducing search quality to increase ad impressions; there's no reason to think the AI companies (of which Google is one) won't slowly gravitate to the same.


My (possibly dated) understanding is that OpenAI/Anthropic are charging less than it costs right now to run inference. They are losing money while they build the market.

Assuming that is still true, then they absolutely have an incentive to keep your tokens/requests to the absolute minimum required to solve your problem and wow you.


The bill is unrelated to their cost. If they can produce an answer in 1/10th of the tokens, they can charge 10x more per token, likely even more.


That is simply not true; token price is largely determined by the token prices of rival services (even before their own operational costs). If everybody else charges about $1 per million tokens, then they will also charge about $1 per million tokens (or slightly above/below), regardless of how many answers per token they can provide.


That only matters if the rivals have the same performance. Opus pricing is 50x DeepSeek's, and more than 100x that of small models. Price should only match rivals' when performance is the same; if a lab can ship a model with 10x lower token usage, it can charge 10x more.

Google increased the price of the same Gemini Flash by something like 5x, IIRC, when it got better.
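A toy calculation, with purely illustrative numbers, makes the point above concrete: buyers compare price per solved task, not price per token, so a terser model can sustain a much higher per-token price:

```python
# Illustrative numbers only: two models that solve the same task.
price_per_mtok_a, tokens_per_task_a = 1.0, 100_000   # cheap but verbose model
price_per_mtok_b, tokens_per_task_b = 10.0, 10_000   # 10x pricier, 10x terser

# Cost per solved task = (price per million tokens) * tokens used / 1M.
cost_a = price_per_mtok_a * tokens_per_task_a / 1_000_000
cost_b = price_per_mtok_b * tokens_per_task_b / 1_000_000
# Both work out to $0.10 per task despite a 10x per-token price gap.
```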


I bet that the actual "performance" of all the top-tier providers is so similar that branding has a bigger impact on whether you think Claude or ChatGPT performs better.


Performance or perception of performance

Potato, potahto; tomato, tomahto.




This applies when there is a large number of competitors.

Now companies are fighting for the attention of a finite number of customers, so they keep their prices in line with those around them.

I remember when Google started with PPC - because few companies were using it, it cost a fraction of recent prices.

And the other issue to solve is a future shortage of electricity for data centers. If everyone wants to use LLMs, but data center capacity is finite due to available power, token prices can go up. But IMHO devs will find innovative, less energy-demanding approaches to token usage… so token prices will probably stay low.


Opus 4.6 costs about 5-10x as much as GLM 5.


What businesses charge for a product is completely unrelated to what it costs them.

They charge what the market will bear.

If "what the market will bear" is lower than the cost of production then they will stop offering it.


Companies make a loss on purpose all the time.


Not forever. If that's their main business then they will eventually have to profit or they die.


I'm also seeing a lot more rambling in Sonnet 4.6 compared to 4.5: more markdown slop, pointing out details in the context that aren't too useful, etc.,

which then causes increased token usage because you need to prompt multiple times.

Idk, maybe it's just me though.



