7777777phil's comments

Thanks for sharing, doener, and thanks to everyone on HN for participating. We put a lot of work into this over the past week. Here are some of the key findings:

- 69% now use Claude Code as their primary AI coding tool

- 90% report productivity gains from AI assistance

- 55% spend more than 75% of their coding time with AI tools

- 86% say their usage has been increasing over the past 6 months

- Adoption is uniform across experience levels — veterans with 20+ years embrace AI at the same rate as newcomers

We plan to run this on a regular basis and track how things evolve over time. Obviously all data is self-reported and should be taken with a grain of salt, but if you want to participate in a future survey, feel free to leave your contact details on the website.


is this supposed to be LLM optimization or what's the primary use-case for this?

Easy copy paste of the thread metadata. Try to copy the thread title without it.

curious to hear what your main (expected) use case is for this?

That’s just the first thing that occurred to me to test it. I think what most people are hyped about is giving it access to your reminders, notes, Notion, Obsidian, and then treating it like an assistant that proactively helps you by running scheduled tasks that are useful to you. That’s why some are recommending running the thing on a Mac Mini if you are in the Apple ecosystem, so it can create reminders etc.

I’ll keep playing with it on a VM and see where this goes.


I feel like HN is quite divided about that, actually. A couple of days ago I started a survey which I plan to run monthly to see how the community feels about "LLM productivity" and related questions. I now have ~250 answers and need a few more to make it significant, but as of now it looks like >90% report productivity gains from AI tools. Happy if you participate, it only takes a minute: https://agentic-coding-survey.pages.dev/

Note that self-reported productivity gain is a completely unreliable and unscientific metric. One study[1], small in scope but a noteworthy data point, found that over the course of the study LLMs reduced productivity by ~20%, yet even after the fact the participants felt that, on average, their productivity had increased by ~20%. This study is surely not the be-all and end-all, and you could find ways to criticise it, say it doesn't apply, argue they were doing it wrong, or give whatever other reason you think the developers should have seen increased productivity, but the point is that people cannot accurately judge their own productivity by vibes alone.

[1] https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...


If you look at the survey, it's not only about productivity; it also covers usage, model choice, etc. But I agree with you: self-reported productivity gains are to be taken with a grain of salt. What else would you propose, though? The goal is to not rely solely on benchmarks for model performance but to develop some kind of TIOBE Index for LLMs.

The ever-present rebuttal to all LLM failure anecdotes: you're using the wrong model, you're prompting it wrong, etc. All failures are always the user's fault. It couldn't possibly be that the tool is bad.

Of course, your logic could equally be applied to the opposite position.

Quite a few of us are tired of being told that we're imagining it when we do, multiple times in an evening, what used to take weeks.


If it generated something that saved you weeks, I think it's almost certainly because it was used for something you have absolutely zero domain understanding of and would have had to study from scratch. And I, at least, repeatedly do note that LLMs lower the barrier to entry for making proof-of-concepts. But the problem is that (1) people treat that instant gratification as a form of productivity that can replace software engineers, when at most it can make something extremely rough that suits one individual's very specific use case, where you mostly work around the plentiful bugs by knowing the landmines are there and not tripping them; and (2) people spam these low-effort proof-of-concepts, which have little value to other people because they are so rough and so hard to extend beyond one person's use case, and this drowns out the content people actually put effort into.

LLMs, when used like this, do not increase productivity on making software worth sharing with other people. While they can knock out the proof-of-concept, they cannot build it into something valuable to anyone but the prompter, and by short-circuiting the learning process you do not learn the skills necessary to build upon the domain yourself, meaning you still have to spend weeks learning those skills if you actually want to build something meaningful. At least this is true of everything I have observed out of the vibe-coding bubble thus far, and of my own extensive experience trying to discover the 10x boost I am told exists. I am open to being shown something genuinely great that an LLM generated in an evening if you wish to share evidence to the contrary.

There is also the question of the provenance of the code, of course. Could you have saved those weeks by simply using a library? Is the LLM saving you weeks by writing the library ""from scratch"", in actuality regurgitating code from an existing library one prompt at a time? If the LLM's productivity gain is that it normalized copying and pasting open-source code wholesale while calling it your own, I don't think that's the great advancement for humanity it is portrayed as.


I find your persistent, willful bullheadedness on this topic to be exhausting. I'd say delusional, but I don't know you and you're anonymous so I'm probably arguing with an LLM in someone's sick social experiment.

A few weeks ago I brought up a new IPS display panel that I've had custom made for my next product. It's a variant of the ST7789. I gave Opus 4.5 the registers and it produced wrapper functions that I could pass to LVGL in a few minutes, requiring three prompts.

This is just one of countless examples where I've basically stopped using libraries for anything that isn't LVGL, TinyUSB, compression or cryptography. The purpose-built wrappers Opus can make are much smaller, often a bit faster, and perhaps most significantly not encumbered with the mental model of another developer's assumptions about how people should use their library. Instead of a kitchen-sink API, I/we/it created concise functions that map 1:1 to what I need them to do.

I happen to believe that you're foolish for endlessly repeating the same blather about "vibe coding" instead of celebrating how amazing the very thing you yourself described, lowering the barrier to entry for domains that are rough and outside someone's immediate skill set, actually is, and the incredible impact it has on project trajectory, motivation and skill-stacking for future projects.

Your [projected] assumption that everyone using these tools learns nothing from seeing how problems can be solved is painfully narrow-minded, especially given that anyone with a shred of intellectual curiosity quickly finds that they can get up to speed on topics that previously seemed daunting to impossible. Yes, I really do believe that you have to expend effort to not experience this.

During the last few weeks I've built a series of increasingly sophisticated multi-stage audio amplifier circuits after literal decades of being quietly intimidated by audio circuits, all because I have the ability to endlessly pepper ChatGPT with questions. I've gone from not understanding at all to fully grasping the purpose and function of every node to a degree that I could probably start to make my own hybrids. I don't know if you do electronics, but the disposition of most audio electronics types does not lend itself to hours of questions about op-amps.

Where do we agree? I strongly agree that people are wasting our time when they post low-effort slop. I think that easy access to LLMs shines a mirror on the awkward lack of creativity and good, original ideas that too many people clearly [don't] have. And my own hot take is that I think Claude Code is unserious. I don't think it's responsible or even particularly compelling to treat never looking at the code as a goal.

I've used Cursor to build a 550k+ LoC FreeRTOS embedded app over the past six months that spans 45 distinct components which communicate via a custom message bus and event queue, juggling streams from USB, UART, half a dozen sensors, and a high-speed SPI display. It is well-tested, fully specified and the product of about 700 distinct feature implementation plan -> chat -> debug loops. It is downright obnoxious reading the stuff you declare when you're clearly either doing it wrong or, well, a confirmation of the dead internet theory.

I honestly don't know which is worse.


Really like the UX of that survey; super easy to fill out. Is it just a custom web form, or did you use a library?

Yes exactly, it's a standalone Cloudflare Pages site with some custom HTML/CSS that writes to D1 (Cloudflare's SQL database) for results and rate limits, that's it. I looked at so many survey tools, but none offered what I was looking for (simple single-page form, no email, no signup, no tracking), so I built this (with Claude). Thanks for the feedback!
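
For anyone curious what a setup like that can look like, here is a minimal sketch of a Pages Function writing submissions to D1 with a naive rate limit. The table name, columns, and the 3-per-hour threshold are invented for illustration, not the survey's actual code:

```typescript
// functions/submit.ts -- hypothetical sketch of a Cloudflare Pages Function
// that stores survey answers in D1. Types come from @cloudflare/workers-types.
interface Env {
  DB: D1Database; // D1 binding configured in the Pages project settings
}

export const onRequestPost: PagesFunction<Env> = async ({ request, env }) => {
  const ip = request.headers.get("CF-Connecting-IP") ?? "unknown";
  const answers = await request.json();

  // Naive rate limit: reject if this IP already submitted 3 times in the last hour.
  const recent = await env.DB.prepare(
    "SELECT COUNT(*) AS n FROM submissions WHERE ip = ?1 AND created_at > datetime('now', '-1 hour')"
  )
    .bind(ip)
    .first<{ n: number }>();

  if ((recent?.n ?? 0) >= 3) {
    return new Response("Too many submissions", { status: 429 });
  }

  // Store the raw answers as JSON; no email, no cookies, no tracking pixels.
  await env.DB.prepare(
    "INSERT INTO submissions (ip, answers, created_at) VALUES (?1, ?2, datetime('now'))"
  )
    .bind(ip, JSON.stringify(answers))
    .run();

  return new Response("ok", { status: 200 });
};
```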

This week I started to create a "TIOBE Index" for AI Agents. I'd really like to run a regular poll on HN that keeps track of which AI coding agents are actually being used by this community, rather than just which ones are winning benchmarks. I hope the results will provide a high-level view of what people are using in production and how that shifts month-over-month.

In the last 12 hours we had 225 submissions (Margin of Error ±6.2%) and need ~160 more to reach statistical significance (n=385, MOE ±5%). No tracking, email optional (only for results). I will post the results + the "January Index" here once we hit the threshold.
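For reference, the n = 385 target follows from the standard margin-of-error formula at 95% confidence, taking the worst-case proportion p = 0.5:

$$\text{MOE} = z\sqrt{\frac{p(1-p)}{n}} = 1.96\sqrt{\frac{0.5 \times 0.5}{385}} \approx 0.05$$

which is where the ±5% comes from.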

It just takes one minute of your time. Thank you for participating!!


imo it isn’t any tool, it’s institutions: shared rules like property, contracts, and science that let billions of strangers coordinate, because without them none of the other mentioned inventions would scale





This is super cool, left a comment, nothing more to say!

Earlier this month I argued why LLMs need episodic memory (https://philippdubach.com/posts/beyond-vector-search-why-llm...), and this lines up closely with what you're describing.

But I'm not sure it's a prompts-vs-rules problem. It's more about remembering past decisions as decisions. Things like 'we avoided this dependency because it caused trouble before' or 'this validation exists due to a past incident' have sequence and context. Flattening them into embeddings or rules loses that. I see even the best models making those errors over longer contexts right now.

My current view is that humans still need to control what gets remembered. Models are good at executing once context is right, but bad at deciding what deserves to persist.
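
To make "remembering decisions as decisions" a bit more concrete, here is a rough sketch of what such a record could look like; the field names are purely illustrative, not something I've shipped:

```typescript
// A sketch of an episodic "decision record": the rationale, the rejected
// alternatives, and the ordering stay explicit instead of being flattened
// into an embedding or a generic rule. All fields are hypothetical.
interface DecisionRecord {
  id: string;
  timestamp: string;              // when the decision was made (ISO 8601)
  decision: string;               // e.g. "avoid dependency X"
  rationale: string;              // e.g. "it caused an incident last quarter"
  rejectedAlternatives: string[]; // what was considered and why it lost
  supersedes?: string;            // id of an earlier decision this replaces
}
```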


Humans decide what to remember based on their emotions. LLMs don't have emotions. Our definition of good and bad comes from our morals and emotions.

In the context of software development, requirements are based on what we want to do (which is based on emotion), and the methods we choose to implement them are based mostly on our predictions about what will and won't work well.

Most of our affinity for good software development hygiene comes from the negative emotional experience of the extra work that bad development hygiene creates.

I think this explains a lot of varied success with coding agents. You don’t talk to them like you talk to an engineer because with an engineer, you know that they have a sense of what is good and bad. Coding agents won’t tell you what is good and bad. They have some limited heuristics, but they don’t understand nuance at all unless you prompt them on it.

Even if they could have unlimited context window and memory, they would still need to be able to tell which parts of those memories are important. I.e., if the human gave them conflicting instructions, how do they resolve that?

I think we'll eventually get to a state where a lot of the mechanics of coding and development can be incorporated into coding agents, but the what and why we build will still come from a human. I.e., an agent will be able to go from 0 to 100% by itself on a full-stack web application, including deployment with all the security compliance and logins and whatever else, but it still won't know what is important to emphasize on that website. Should the images be bigger here, or the text? Questions like that.


Interesting framing, but I think emotions are a proxy for something more tractable: loss functions over time. Engineers remember bad hygiene because they've felt the cost. You can approximate this for agents by logging friction: how many iterations did a task take, how many reverts, how much human correction. Then weight memory retrieval by past-friction-on-similar-tasks. It's crude, but it lets the agent "learn" that certain shortcuts are expensive without needing emotions. The hard part is defining similarity well enough that the signal transfers. Still early, but directionally this has reduced repeat mistakes in our pipeline more than static rules did.
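
As a sketch of the idea, not a description of our actual pipeline, the re-ranking can be as simple as the following; the similarity function and the weighting constant are assumptions:

```typescript
// Hypothetical sketch: re-rank candidate memories by how much friction
// similar tasks caused in the past, so expensive shortcuts surface first.
interface MemoryEntry {
  text: string;
  taskEmbedding: number[];
  friction: number; // e.g. reverts + human corrections logged for that task
}

function similarity(a: number[], b: number[]): number {
  // Cosine similarity; assumes equal-length, non-zero vectors.
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function rankMemories(query: number[], memories: MemoryEntry[]): MemoryEntry[] {
  const FRICTION_WEIGHT = 0.5; // arbitrary; tune against your own logs
  return [...memories].sort((a, b) => {
    // Higher score = similar task AND costly last time around.
    const scoreA = similarity(query, a.taskEmbedding) * (1 + FRICTION_WEIGHT * a.friction);
    const scoreB = similarity(query, b.taskEmbedding) * (1 + FRICTION_WEIGHT * b.friction);
    return scoreB - scoreA;
  });
}
```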

How do you choose which loss function over time to pursue?

Honestly, it's empirical. We started with what was easiest to measure: human correction rate. If I had to step in and fix something, that's a clear signal the agent took a bad path. Iterations and reverts turned out to be noisier -- sometimes high iteration count means the task was genuinely hard, not that the agent made a mistake. So we downweighted those. The meta-answer is: pick the metric that most directly captures "I wish the agent hadn't done that." For us that's human intervention. For a team with better test coverage, it might be test failures after commit. For infra work, maybe rollback frequency. There's no universal loss function — it depends on where your pain actually is. We just made it explicit and started logging it. The logging alone forced clarity.
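
In spirit, the logging is no more complicated than this; the names and categories are made up for illustration, not our actual schema:

```typescript
// Hypothetical friction log: one row per agent task, recording whether a
// human had to step in. The "loss" we track is just the correction rate.
interface TaskLog {
  taskId: string;
  category: string;        // e.g. "refactor", "infra", "new-feature"
  humanCorrected: boolean; // did a human have to fix the agent's output?
}

function correctionRate(logs: TaskLog[], category?: string): number {
  const relevant = category ? logs.filter(l => l.category === category) : logs;
  if (relevant.length === 0) return 0;
  return relevant.filter(l => l.humanCorrected).length / relevant.length;
}

// Usage: correctionRate(logs) overall, or correctionRate(logs, "infra")
// to see where the agent needs the most babysitting.
```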

I just started something like that, haven’t shared it widely yet, but here we go - happy if you participate: https://agentic-coding-survey.pages.dev/

Add VS Code. Add a list of models, since many tools allow you to select which model you use.

Thanks for the feedback. I thought there were just too many models and versions to list them all. For now, if you select "other" you get a text field to add any model not listed; hope this helps.

You should add OpenAI Codex CLI.

Thanks for the feedback, I'll do that. For now, if you select "other" you get a text field to add any tool not listed.

Any chance you'll add Antigravity and Jetbrains Junie? I've been using almost nothing but those for the last month. Antigravity at home, Junie at work.

Done! By popular demand, I added Antigravity, Codex CLI, and Junie.

Thanks!

> Q5. For which tasks do you use AI assistance most?

This is really tough for me. I haven't done a single one of those mostly manually over the last month.

