> What's Anthropic's optimization target??? Getting you the right answer as fast as possible!
What makes you believe this? The current trend among all major providers seems to be: get you to spin up as many agents as possible so that you get billed more and their request numbers go up.
> Slot machines have variable reward schedules by design
LLMs from all major providers are trained with RLHF, which optimizes them in ways we don't entirely understand to keep you engaged.
These are incredibly naive assumptions. Anthropic/OpenAI/etc don't care if you get your "answer solved quickly", they care that you keep paying and that all their numbers go up. They aren't doing this as a favor to you and there's no reason to believe that these systems are optimized in your interest.
> I built things obsessively before LLMs. I'll build things obsessively after.
The core argument of the "gambling hypothesis" is that many of these people aren't really building things. To be clear, I certainly don't know if this is true of you in particular, it probably isn't. But just because this doesn't apply to you specifically doesn't mean it's not a solid argument.
> The current trend in all major providers seem to be: get you to spin up as many agents as possible so that you can get billed more and their number of requests goes up.
I was surprised when I saw that Cursor added a feature to set the number of agents for a given prompt. I figured it might be a performance thing - fan out complex tasks across multiple agents that can work on the problem in parallel and get a combined solution. I was extremely disappointed when I realized it's just "repeat the same prompt to N separate agents, let each one take a shot and then pick a winner". Especially when some tasks can run for several minutes, rapidly burning through millions of tokens per agent.
At that point it's just rolling dice. If an agent goes so far off-script that its result is trash, I would expect that to mean I need to rework the instructions and context I gave it, not that I should try the same thing again and hope that entropy fixes it. But editing your prompt offline doesn't burn tokens, so it's not what makes them money.
Cursor and others have a subagent feature, which sounds like what you wanted. However, there has to be some decision making around how to divide up a prompt into tasks. This is decided by the (parent) model currently.
The best-of-N feature is a bit like rolling N dice instead of one. But it can be quite useful if you use different models with different strengths and weaknesses (e.g. Claude/GPT-5/Gemini), rather than assigning all to N instances of Claude, for example. I like to use this feature in ask mode when diving into a codebase, to get an explanation a few different ways.
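To make the distinction concrete, here is a minimal sketch of the best-of-N pattern being described: the same prompt is fanned out to several models and one answer is picked. This is not Cursor's actual implementation; `ask_model` and `best_of_n` are hypothetical stand-ins (real tools would call provider SDKs and use a judge model or the user to pick the winner).

```python
import random

def ask_model(name: str, prompt: str) -> dict:
    # Hypothetical stand-in for a real provider call. The random score
    # simulates run-to-run variance between models/attempts.
    return {
        "model": name,
        "answer": f"{name}'s take on: {prompt}",
        "score": random.random(),
    }

def best_of_n(prompt: str, models: list[str]) -> dict:
    """Fan the same prompt out to N models and pick a 'winner'.

    In real tools the winner is chosen by a judge model or by the user;
    here the simulated score stands in for that judgment."""
    candidates = [ask_model(m, prompt) for m in models]
    return max(candidates, key=lambda c: c["score"])

random.seed(0)
winner = best_of_n(
    "Explain this codebase's auth flow",
    ["claude", "gpt-5", "gemini"],
)
print(winner["model"], "-", winner["answer"])
```

The point of assigning different models (rather than N copies of one) is that the candidates are drawn from genuinely different distributions, so the "dice rolls" are less correlated.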
Simply, cut-throat competition. Given that multiple nations are funding different AI labs, quality of output and speed are among the most important things.
sigh We're doing this lie again? Quality of Outcome is not, has never been, and if the last 40 years are anything to go on will never be a core or even tangential goal. Dudes are trying to make the stock numbers go up and get paid. That's it. That's all it ever is.
The goal of any business is, in principle, profit; by your terms all of them are misaligned.
The fact of the matter is that customers are receiving value, and that value has been a good proxy for which companies will grow to be successful and which will fail.
I'm being neither pedantic nor cynical. Do you need a refresher on value proposition vs actual outcomes on the last few decades of breathlessly hyped tech bubbles? Executive summary: the portions of tech industry that attract the most investment consistently produce the worst outcomes, the more cash the shittier the result. It's also worth noting that "value" is defined as anything you can manipulate someone to pay for.
Ok, to be very honest, I wrote that in the middle of having a couple of drinks. I guess what I mean is, countries are funding AI labs because it can turn into a "winner-takes-all" competition, unless a country starts blocking the leading providers.
Private companies will turn toward the best, fastest, cheapest option (or some average of those). Country borders don't really matter. All the labs are fighting to get the best thing out to the public for that reason, because winning comes with money, status, prestige, and actually changing the world. Incentives like these are rare.
Like I understand this commentary, but it’s so detached from reality. My dad in his 70s is writing Excel macros, even though he never touched that in his life. There are a ton of cases like this, but people can’t see reality out of their domains.
Come on man, you know exactly what I mean. You can keep coming up with these arguments, but the world has moved on already. I genuinely don’t know a single person in 3 different countries from age 12+ who does not use LLMs at least once a week. We have to adapt, or choose to not play the “game”.
Cut-throat competition between nations is usually called war. In war, gathering as much information as possible on everyone is certainly a strategic goal. Selling psyops about how many benefits will come to everyone willing to join a one-sided industrial dependency is also a thing. Giving a significant boost to potentially adversarial actors is not.
That said, the universe doesn't oblige us to think the cosmos is all about competition. Cooperation is always a viable path, often with far greater long-term benefits at scale.
Competition is superfluous, self-inflicted masochism.
There’s a line to be trod between returning the best result immediately and forcing multiple attempts. Google got caught red-handed reducing search quality to increase ad impressions; there's no reason to think the AI companies (of which Google is one) won't slowly gravitate to the same.
My (possibly dated) understanding is that OpenAI/Anthropic are charging less than it costs right now to run inference. They are losing money while they build the market.
Assuming that is still true, then they absolutely have an incentive to keep your tokens/requests to the absolute minimum required to solve your problem and wow you.
That is simply not true; token price is largely determined by the token prices of rival services (even before their own operational costs). If everybody else charges about $1 per million tokens, then they will also charge about $1 per million tokens (or slightly above/below), regardless of how many answers per token they can provide.
That only holds if the rivals have the same performance. Opus pricing is 50x DeepSeek's, and more than 100x that of small models. Pricing should only match rivals' if performance is the same, and if a lab can produce a model with 10x lower token usage, it can charge 10x more.
Google increased the price of the same Flash model by something like 5x, IIRC, when it got better.
I bet that the actual "performance" of all the top-tier providers is so similar that branding has a bigger impact on whether you think Claude or ChatGPT performs better.
This applies when there is a large number of competitors.
Now companies are fighting for the attention of a finite number of customers, so they keep their prices in line with those around them.
I remember when Google started with PPC: because few companies were using it, it cost a fraction of today's prices.
And the other issue to solve is a future lack of electricity for data centers. If everyone wants to use LLMs, but data-center capacity is finite due to available power, token prices can go up. But IMHO devs will find innovative, less energy-demanding approaches to tokens, so token prices will probably stay low.
I'm also seeing a lot of new rambling in Sonnet 4.6 compared to 4.5: more markdown slop, pointing out details in the context that aren't very useful, etc.
This then increases token usage, because you need to prompt multiple times.