sdeiley's comments

sdeiley · 2026-02-28T00:46:03 1772239563

What an absolute shitshow of an administration.

Massive kudos to Anthropic.

sdeiley · 2026-02-24T01:03:00 1771894980

There are dozens of us that used them before AI! Dozens!

sdeiley · 2026-02-24T01:01:23 1771894883

So Agent teams but headless?

Beefin · 2026-02-24T14:08:15 1771942095

llm coordination is just one feature - the core (and why i built amux) was so that i can quickly delegate from my phone, see outputs, monitor, etc without raw ssh.

sdeiley · 2026-02-21T04:29:20 1771648160

This isnt true for big code bases. Subagents or orchestration become vital for context handholding

mikert89 · 2026-02-21T04:37:35 1771648655

yeah i think sub agents are needed, missed that in my comment

sdeiley · 2026-02-21T04:18:54 1771647534

Shocked pikachu face

Not the guy doing lotteries for voters!

sdeiley · 2026-02-19T21:57:46 1771538266

Googler. We use GPUs, but its a drop in the bucket in the sea of our accelerators. We might sell more GPUs in Cloud than we use internally.

These are not data driven observations just vibes

sdeiley · 2026-02-19T21:47:18 1771537638

People underrate Google's cost effectiveness so much. Half price of Opus. HALF.

Think about ANY other product and what you'd expect from the competition thats half the price. Yet people here act like Gemini is dead weight

____

Update:

3.1 was 40% of the cost to run AA index vs Opus Thinking AND SONNET, beat Opus, and still 30% faster for output speed.

https://artificialanalysis.ai/?speed=intelligence-vs-speed&m...

bluegatty · 2026-02-19T22:20:00 1771539600

You can pay 1 cent for a mediocre answer or 2 cents for a great answer.

So a lot of these things are relative.

Now if that equation plays out 20K times a day, well that's one thing, but if it's 'once a day' then the cost basis becomes irrelevant. Like the cost of staplers for the Medical Device company.

Obviously it will matter, but for development ... it's probably worth it to pay $300/mo for the best model, when the second best is $0.

For consumer AI, the math will be different ... and that will be a big deal in the long run.

harrall · 2026-02-20T01:08:49 1771549729

Yeah you’re right but most people in the world do not need an agent that codes.

I think Gemini gives fine answers outside code tasks.

Outside of work, where I use Claude, Gemini is cheaper for me (for what I would use AI for) than both Claude and ChatGPT so Google gets my money.

fhub · 2026-02-20T00:55:51 1771548951

Right now I'll pay 2x for a subjectively 20+% better coding agent. But in a year I don't think there will be an agent that to me is subjectively 20% better amongst the big three.

viking123 · 2026-02-20T04:53:11 1771563191

So where is the moat for these companies then, in the end will they all be almost the same from the pov of a normal person? So it's just price competition?

dzhiurgis · 2026-02-20T07:45:37 1771573537

Google will win, it’s becoming obvious

xnx · 2026-02-20T15:20:11 1771600811

> You can pay 1 cent for a mediocre answer or 2 cents for a great answer.

But Gemini is also a great answer (possibly slightly less great or more great).

When consumers cannot easily assess a product's quality, they frequently use price as a primary indicator, equating higher costs with superior quality.

andai · 2026-02-20T10:23:58 1771583038

Quality is Anthropic's game.

Quantity is OpenAi's.

Google's is... specialized hardware? (For now.)

Also deeper crawls, and Google Books! (Though it's unclear if they're making good use of those.)

WarmWash · 2026-02-19T22:34:36 1771540476

Gemini is the most paradoxical model because it benchmarks great even in private benchmarks done by regular people, Deep Mind is unquestionably full of capable engineers with incredible skill, and personally Gemini has been great for my day job and my coding for fun (not for profit) endeavors. Switching between it and 4.6 in antigravity and I don't see much of a difference, they both do what I ask.

But man, people are really avid about it being an awful model.

sdeiley · 2026-02-19T22:51:30 1771541490

People can be and often are wrong.

You'd notice how good Opus is in Claude Code. IMHO CC is the secret sauce

manmal · 2026-02-20T17:41:44 1771609304

Opus is just as good in pi.dev, Amp, or OpenCode. CC is an increasingly bug ridden slopfest.

sumedh · 2026-02-20T11:11:33 1771585893

> IMHO CC is the secret sauce

Cant smart people just reverse engineer CC and figure out what is the secret sauce atleast for CC App?

c0n5pir4cy · 2026-02-20T10:13:47 1771582427

I feel like a lot of this is just Googles tooling - if you're using Antigravity/Gemini CLI and then use Claude Code it feels like a huge difference. I can say from experience though (using Cline + OpenCode) that they are really close.

The harness is just much better on the Anthropic side.

kingstnap · 2026-02-20T06:26:21 1771568781

I personally found Gemini 3.0 to step on my toes in Agentic coding. I tried it around 10 or so times but it quickly became apparent that it was somehow coming to its own conclusions about what needs to be done instead of following instructions.

Like files I didn't mention being edited and read and stuff of that nature. Sometimes this is cute in fixing typos in docs but when its changing things where it clearly doesn't even understand the intentionality behind something it's annoying.

Gemini 3.1 is clearly much better when trying it today. It stayed focused and found its way around without getting distracted.

DangitBobby · 2026-02-21T15:15:18 1771686918

I've found in everyday chat use with Gemini that it confuses things _it_ says for things I've said, which is normally fine for my purposes but I imagine would lead to the scenario you're describing in coding sessions.

arnorhs · 2026-02-20T08:56:21 1771577781

The only cases where I've had gemini step on my toes like that is when a) I realized my instructions were unclear or missing something b) my assumptions/instructions were flawed about how/why something needed to be done.

kingstnap · 2026-02-20T11:04:12 1771585452

Instruction following has improved a lot since a few years ago but let's not pretend these things are perfect mate.

There's a certain capacity of instructions, albiet its quite high, at which point you will find them skipping points and drifting. It doesn't have to be ambiguity in instructions.

dzhiurgis · 2026-02-20T07:47:54 1771573674

So strange. I switched from claude few months ago to gemini3 and didn’t look back. Speed is big one, code quality just vastly better, all while far cheaper. I do need to try latest claude models tho.

SergeAx · 2026-02-23T19:41:18 1771875678

All perceptions are very personal and anecdotal. Here's mine: I tried to rebuild a website from Hugo to Astro. Gemini 3.0 was mediocre and in the end just failed and was unable to complete the task. Sonnet did almost well. I had to flush the context once most of the job was finished, for atomic git commits and deployment scripts.

xnx · 2026-02-20T15:21:41 1771600901

> But man, people are really avid about it being an awful model.

If you told people Gemini 3.1 was Claude 4.7, they'd be going nuts singing its praises.

KoolKat23 · 2026-02-19T23:37:22 1771544242

Outside of code, Gemini is really really good.

lunarboy · 2026-02-20T15:13:15 1771600395

It's so weird. I actually prefer the web version for generic questions like "how would I do X in git" or something, and it'll answer it well. Gemini CLI will immediately try to run git log on the entire graph, grep every single file in the repo, like just answer the question. I actually put in gemini.md to just answer first without running other commands unless explicitly requested and it's been a lot better

KoolKat23 · 2026-02-20T16:22:24 1771604544

Thanks for this suggestion, it's actually been my experience too.

startages · 2026-02-20T16:09:59 1771603799

This is misleading. I'm running a live experiment here: https://project80.divcrafts.com/

There are 4 models, all receiving the exact same prompts a few times a day, required to respond with a specific action.

In the first experiment I used gemini-3-pro-preview, it spent ~$18 on the same task where Opus 4.5 spent ~$4, GPT-5.1 spent ~$4.50, and Grok spent ~$7. Pro was burning through money so fast I switched to gemini-3-flash-preview, and it's still outspending every other model on identical prompts. The new experiment is showing the same pattern.

Most of the cost appears to be reasoning tokens.

The takeaway here is: Gemini spends significantly more on reasoning tokens to produce lower quality answers, while Opus thinks less and delivers better results. The per-token price being lower doesn't matter much when the model needs 4x the tokens to get there.

camel_Snake · 2026-02-20T22:46:05 1771627565

Is that no longer the case, or am I misunderstanding the operational costs displayed?

Opus: 521k input tokens; 12k out

Grok: 443k input tokens; 57k out

Gemini: 677k input tokens; 7k out

OAI: 543k input tokens; 17k out

Gemini appears to use by far the least amount of reasoning tokens, assuming they're included in the output counts.

nu11ptr · 2026-02-19T22:09:19 1771538959

That sounds great, but if Opus generates 20% better code think of the ramifications of that on a real world project. Already $100/month gets you a programmer (or maybe even 2 or 3) that can do your work for you. Insanity. Do I even care if there is something 80% as good for 50% the cost? My answer: no. That said, if it is every bit as good, and their benchmarks suggest it is (but proof will be in testing it out), then sure, a 50% cost reduction sounds really nice.

rudolph9 · 2026-02-20T00:01:20 1771545680

If I was building an application using massive amounts of calls to the api, I’d probably go with Gemini. For a Copilot, definitely Opus.

jstummbillig · 2026-02-19T21:51:12 1771537872

It's not half price or cost effective if it can't do the job, that I am happy to pay twice the price for to get done.

But I agree: If they can get there (at one point in the past year I felt they were the best choice for agentic coding), their pricing is very interesting. I am optimistic that it would not require them to go up to Opus pricing.

NiloCK · 2026-02-20T13:50:29 1771595429

There's cost, and cost effectiveness. I'd say so far that received negative value for the prompts that I've sent to Gemini 3.

Skill issue, maybe, but I can't get gemini to do any nontrivial tasks reliably, and it's difficult to have it do trivial tasks without getting distracted and making unrelated changes that eat my time and mental energy to think about.

The breakthrough advance of Opus 4.5 over 4.1 wasn't so much an intelligence jump, but a jump in discerning scope and intent behind user queries.

vitaflo · 2026-02-19T22:46:39 1771541199

Deepseek is 2% of the cost of Opus. But most people aren't using that for code even tho it's ridiculously cheap.

fastball · 2026-02-19T22:18:03 1771539483

We are not at the moment where price matters. All that matters is performance.

sdeiley · 2026-02-19T22:29:09 1771540149

What did you say? Cant hear you over the $400B in capex spend.

Counterpoint: price will matter before we hit AGI

fragmede · 2026-02-20T01:14:44 1771550084

Why do you believe it has to? Uber took 15 years to show a profit. 15 years from 2022 when chatgpt launched is 2037. That's long enough that to say I don't know if I'll even be alive by then.

blitzar · 2026-02-20T08:39:22 1771576762

Uber didnt burn the market cap of the 10th largest company in the world every couple of years.

willis936 · 2026-02-19T22:25:15 1771539915

It matters to me. I pay for it and I like using it. I pick my models to keep my spend reigned in.

fastball · 2026-02-20T03:32:36 1771558356

What do you use it for? What is your time worth that you'd settle for a lesser model to save a few bucks?

willis936 · 2026-02-20T10:23:34 1771583014

Homelab and hobby assistant. I have spent $300 for 12 months of tokens. If I'm burning up more than $25 a month then I'd have to pay more or curb use at the end of the year. $25 / month as a new expense is something I can accept for a toy that is letting me accelerate my fun stuff. I can't justify more than that. So I'm left constantly evaluating if my current task is worth more than future tasks and if it is expected to be harder than future tasks. Speculative execution is already one of the harder things I do at work.

csmpltn · 2026-02-19T22:05:03 1771538703

> "People underrate Google's cost effectiveness so much. Half price of Opus. HALF."

Google undercutting/subsidizing it's own prices to bite into Anthropic's market share (whilst selling at a loss) doesn't automatically mean Google is effective.

sdeiley · 2026-02-19T22:12:44 1771539164

Everybody is subsidizing their prices.

But Flash is 1/8 the cost of sonnet and its not impressive?

csmpltn · 2026-02-19T22:43:23 1771541003

Sure, for the launch. Until they start introducing ads, capping existing subscriptions and raising prices (on all products)

surajrmal · 2026-02-20T02:20:59 1771554059

I think you are underestimating how much cheaper it is for Google to run the workloads compared to competitors. The hardware advantage is real.

SXX · 2026-02-20T04:58:58 1771563538

Enshittification will begin eventually. Google already cut free limits on AI studio from 100 rpd to 10 rpd so they started cost savings already.

surajrmal · 2026-02-21T03:44:55 1771645495

What does that have to do with what I said? Everyone knows that the companies are operating at a loss right now to capture market share in the hope that it's sticky. Google is losing far less money and will not need to get nearly as extreme with how they try to extra money from the product. That honestly makes me feel better about it's long term prospects. And who knows, maybe local llms will prevent it from getting truly bad anyways. Competition tends to keep product quality high.

sumedh · 2026-02-20T11:21:40 1771586500

> Everybody is subsidizing their prices.

Inference is profitable but model training needs lot of money.

bugfix · 2026-02-20T00:11:28 1771546288

Do they offer a subscription like Claude? These models waste so many tokens "thinking", that using via API is a complete waste of money.

tkuraku · 2026-02-20T04:19:04 1771561144

https://one.google.com/about/google-ai-plans/?utm_source=g1&...

mcintyre1994 · 2026-02-20T10:13:10 1771582390

At least Anthropic tells you how many more tokens you’re paying for! 5x 10x 20x whatever. Google seems to just say more, higher, highest.

blinding-streak · 2026-02-21T15:38:31 1771688311

The pricing page for Claude literally says "More usage" for the $17/month pro plan. Doesn't really quantify anything. The usage is whatever they feel like it should be.

And then the very expensive plan says "Choose 5x or 20x more usage than Pro". It's all arbitrary.

metadat · 2026-02-19T22:28:26 1771540106

Attention is the new scarce resource. Saving even 50% is nothing if it wastes more of my time.

raincole · 2026-02-20T05:42:38 1771566158

^ This is a weird Gemini shilling account (check their comment history) but I still want to point how ridiculous this statement is:

> Think about ANY other product and what you'd expect from the competition thats half the price.

Car, fashion, jewelry, earphone, furniture, keyboard, mouse, restaurant, house,...

sdeiley · 2026-02-21T04:24:01 1771647841

Lol Ive admitted im a google employee, not hiding my bias.

Most things aren't worth commenting on except the gemini posts here, which I find insane.

And pretty much every example you gave Id expect quite a lot more for 2x the amount? Idk man

integricho · 2026-02-21T08:16:10 1771661770

So you are an anthropic / openai employee, actually.

sdeiley · 2026-02-22T03:21:31 1771730491

Lol i wish.

Decabytes · 2026-02-19T21:53:17 1771537997

Any tips for working with Gemini through its chat interface? I’ve worked with ChatGPT and Claude and I’ve generally found them pleasant to work with, but everytime I use Gemini the output is straight dookie

londons_explore · 2026-02-19T22:22:04 1771539724

make sure you use ai studio (not the vertex one), not the consumer gemini interface. Seems to work better for code there.

briHass · 2026-02-20T01:48:44 1771552124

Even though I don't like the privacy implications, make sure you use the option to save and use past chats for context. After a few months of back and forth (hundreds of 'chat' sessions), the responses are much higher quality. It sometimes does 'callbacks' to things discussed in past chats, which are typically awkward non-sequiturs, but it does improve it overall.

When I play with it in 'temporary chat' mode that ignores past chats and personal context directives, the responses are the typical slop littered with emojis, worthless lists, and platitudes/sycophancy. It's as jarring as turning off your adblocker and seeing the garish ad trash everywhere.

dzhiurgis · 2026-02-20T07:54:57 1771574097

You must be joking. I’ve turned that off after first month of use. It’s unbearable. “Oh since you are in {place i mentioned a week ago while planning trip but ultimately didnt go} the home assistant integration question changes completely”. Or ending every answer with “since you are salesforce consultant, would you like to learn more about iron smelting?”

astrange · 2026-02-22T06:05:25 1771740325

I told Gemini I'm a software engineer and it explains absolutely everything in programming metaphors now. I think it's way undertrained with personalization.

mritchie712 · 2026-02-19T22:03:30 1771538610

It's half the price per token. Not all tokens are generated equally.

sdeiley · 2026-02-19T22:14:05 1771539245

Neither are cars but Ill take a Porsche over a Ferrari for a fraction of the price.

jmalicki · 2026-02-20T00:16:07 1771546567

What about a Porsche vs. a Toyota Camry for half the price?

ionwake · 2026-02-19T22:26:49 1771540009

which model?

sdeiley · 2026-02-19T22:29:28 1771540168

For me any, tbh. I wouldn't fit in a Ferrari lol

Svoka · 2026-02-19T21:55:23 1771538123

While price is definitely important, results are extremely important. Gemini often falls into the 'didn't do' it part of the spectrum, this days Opus almost always does 'good enough'.

Gemini definitely has its merits but for me it just doesn't do what other models can. I vibe-coded an app which recommends me restaurants. The app uses gemini API to make restaurants given bunch of data and prompt.

App itself is vibe-coded with Opus. Gemini didn't cut it.

sdeiley · 2026-02-19T22:20:08 1771539608

The binary you draw on models that havent been out a quarter is borderline insane.

Opus is absurdly good in Claude code but theres a lot of use cases Gemini is great at.

I think Google is further behind with the harness than the model

Svoka · 2026-02-20T00:19:23 1771546763

I was careful not to draw binary. I was saying that Opus in Claude Code is good enough for me to make projects. Using Gemini after it seems like a significant downgrade, which actually doesn't get the job done helping me code. This is my experience, it can change if Gemini will get better.

However, for internal use I opt to Gemini, because of API cost. It is great in sorting reviews and menues out.

1zael · 2026-02-20T00:10:36 1771546236

The order of priority for most people is: 1\ output quality 2\ latency 3\ cost. I will always pays more money if output quality is significantly better and latency is worth the tradeoff. There's also enough cost optimization strategies for applied AI applications that token cost rarely outweighs unless it's a SIGNIFICANT difference (e.x. 100-200% more).

SV_BubbleTime · 2026-02-19T21:54:53 1771538093

Well, it’s half if the product is equal.

Is it? Honestly, I still chuckle about black Nazis and the female Indian Popes. That was my first impression of Gemini, and first impressions are hard to break. I used Gemini’s VL (vision) for something and it refused to describe because it assumed it was NSFW imagery, which is was not.

I also question statis as an obvious follow up. Is Gemini equal to Opus? Today? Tomorrow? Has Google led the industry thus far and do I expect them to continue?

Counterpoint to that would be that with natural language input and output, that LLM specific tooling is rare and it is easy to switch around if you commoditize the product backend.

cyanydeez · 2026-02-19T21:55:53 1771538153

Some people like blackjack and a technical edge with card counting, others just say screw it and do slot machines.

sdeiley · 2026-02-19T22:31:10 1771540270

This is a decent analogy actually. Kudos

port11 · 2026-02-20T12:04:24 1771589064

It’s half the price for now, let them gain market traction and ser the price come up. GCP isn’t exactly affordable.

lukebechtel · 2026-02-20T03:00:40 1771556440

sonnet 4.6 is a third, and equivalent to opus 4.5, which is enough for me usually :)

EDIT: Gemini does have 1m context for "free" though so that's great.

varispeed · 2026-02-19T22:07:35 1771538855

If something is shit, it doesn't matter it costs half price of something okay.

dekhn · 2026-02-20T00:33:59 1771547639

"There is hardly anything in the world that some man cannot make a little worse and sell a little cheaper, and the people who consider price only are this man's lawful prey."

sdeiley · 2026-02-13T11:00:29 1770980429

I'm sorry but this is an insane take. Flash is leading its category by far. Absolutely destroys sonnet, 5.2 etc in both perf and cost.

Pro still leads in visual intelligence.

The company that most locks away their gold is Anthropic IMO and for good reason, as Opus 4.6 is expensive AF

fatherwavelet · 2026-02-13T12:21:46 1770985306

I think we highly underestimate the amount of "human bots" basically.

Unthinking people programmed by their social media feed who don't notice the OpenAI influence campaign.

With no social media, it seems obvious to me there was a massive PR campaign by OpenAI after their "code red" to try to convince people Gemini is not all that great.

Yea, Gemini sucks, don't use it lol. Leave those resources to fools like myself.

sdeiley · 2026-02-13T10:52:11 1770979931

3 Flash is criminally under appreciated for its performance/cost/speed trifecta. Absolutely in a category of its own.

sdeiley · 2026-02-13T10:46:41 1770979601

I predict Gemini Flash will dominate when you try it.

If you're going for cost performance balance choosing Gemini Pro is bewildering. Gemini Flash _outperforms_ Pro in some coding benchmarks and is the clear parento frontier leader for intelligence/cost. It's even cheaper than Kimi 2.5.

https://artificialanalysis.ai/?media-leaderboards=text-to-im...