This is very cool - I try to have a container-centric setup but sometimes YOLOcal clauding is too tempting.
My biggest question skimming over the docs is what a workflow for reviewing and applying overlay changes to the out-of-cwd dirs would be.
Also, bit tangential but if anyone has slightly more in-depth resources for grasping the security trade-offs between these kind of Linux-leveraging sandboxes, containers, and remote VMs I'd appreciate it. The author here implies containers are still more secure in principle, and my intuition is that there's simply less unknowns from my perspective, but I don't have a firm understanding.
Thoughts:
1. Some hype-types may have been effusive about AI-assisted coding since ChatGPT, but IMO the commonly agreed paradigm shift was Claude Code, and especially 4.5 - very, very recent.
2. Anchoring one's biases in reaction to hype is still letting one's perspective be defined by hype. Yes, the Cursor post is a joke, but leading with it is a strawman. This article does not aim to take its subject seriously, IMO.
3. While I agree the hype is currently at comical levels, the utility of the current LLMs is obvious, and reasons for "skilled" usage not being easily quantifiable are also obvious.
I.e., using agents to iterate through many possible approaches, spike out migrations, etc. might save a project a year of misadventures, re-designs, etc., but that productivity gain is realized by _subtracting_ the intermediate versions that _didn't_ end up being shipped.
As others have mentioned, I think yak-shaving is now way more automated. E.g., if I want to take a new terminal for a spin, or throw together a devtool to help me think about a specific problem better, I can do it with very low friction. So "personal" productivity is way higher.
In that they obviously have no real utility, sure. There hasn't been a paradigm shift, they still suck at programming, and anyone trying to tell you otherwise almost certainly has something to sell you.
Based on my direct experience, I find it surprising that this opinion remains so common, at least with regards to Opus in Claude Code. I'm not as extreme as some who think we can/should avoid touching code or w/e, but especially in exploratory contexts and debugging I find them extremely useful.
Maybe I should have said "obvious to me," but I guess I just struggle to see how a serious crack at using modern Opus in Claude Code doesn't make it obvious at this point.
I'd really recommend trying the "spike out a self-contained minimal version of this rearchitecture/migration and troubleshoot it iteratively until it works, then make a report on findings" use-case for anyone that hasn't had luck with them thus far and is serious about trying to reach conclusions based on direct experience.
I promise you I don't have anything to sell you. I think 100% of our developers are landing most code changes using agent coding now. This is in a trading fintech.
Coding agents work. At some point you're going to not just look contrarian, you're going to look like a troll to keep denying it.
You may not like it, that's a perfectly valid take, but to deny they're good at coding at this point is silly.
Vapid and wrong on every point. Many good ideas come from steeping in a novel soup of ideas for a long time; you don't need that many people to care about quality to make it a lucrative differentiator; and as I've seen many point out on X dot com the everything app: where are all the shipped results of these slop torrents?
The models are increasingly capable in impressive ways. Maybe the next gen will enable the "sales critter" to slop out commercially viable software with no tech know-how. If not, I'm sure we'll assume the next can, and if not that, the next.
But feigning confidence about the shape and nature of this unfurling sea-change is absurd when the high-profile examples we have are, like, what, moltbook? And to denigrate _all_ potential ingenuity and insight unilaterally into the bargain? What a careless way of looking at the world.
Measuring in terms of KB is not quite as useful as it seems here IMO - this should be measured in terms of context tokens used.
I ran their tool with an otherwise empty CLAUDE.md and ran `claude /context`, which showed 3.1k tokens used by this approach (1.6% of the Opus context window, a bit more than the default system prompt; system tools take another 8.3%).
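For a rough sense of the KB-to-token relationship without running `claude /context`, a common back-of-the-envelope heuristic is ~4 bytes per token for English text. This is an assumption, not Claude's actual tokenizer, and the 200k window is just the published Opus context size:

```python
# Rough token estimate for a CLAUDE.md-style file.
# ~4 bytes/token is a heuristic for English text, not the real tokenizer.
BYTES_PER_TOKEN = 4
OPUS_CONTEXT_TOKENS = 200_000  # published context window size

def estimate_tokens(text: str) -> int:
    """Approximate token count from UTF-8 byte length."""
    return len(text.encode("utf-8")) // BYTES_PER_TOKEN

def context_fraction(text: str, window: int = OPUS_CONTEXT_TOKENS) -> float:
    """Fraction of the context window this text would consume."""
    return estimate_tokens(text) / window

sample = "x" * 12_400  # ~12.4 KB of ASCII
print(estimate_tokens(sample))          # ~3100 tokens under this assumption
print(f"{context_fraction(sample):.1%}")
```

The point stands either way: a 12 KB prompt file and a 12 KB prompt file full of dense punctuation can tokenize very differently, so tokens are the unit that actually matters.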
Otherwise it's an interesting finding. The nudge seems like the real winner here, but potential further lines of inquiry that would be really illuminating:
1. How do these approaches scale with model size?
2. How are they impacted by multiple such clauses/blocks? I.e., maybe 10 `IMPORTANT` rules dilute their efficacy
3. Can we get best of both worlds with specialist agents / how effective are hierarchical routing approaches really? (idk if it'd make sense for vercel specifically to focus on this though)
An obvious nice thing here compared to the cursor post is the human involvement gives some minimum threshold confidence that the writer of the post has actually verified the claims they've made :^) Illustrates how human comprehension is itself a valuable "artifact" we won't soon be able to write off.
> While it might seem like a simple screenshot, building a browser from scratch is extremely difficult.
> Another experiment was doing an in-place migration of Solid to React in the Cursor codebase. It took over 3 weeks with +266K/-193K edits. As we've started to test the changes, we do believe it's possible to merge this change.
In my view, this post does not go into sufficient detail or nuance to warrant any serious discussion, and the sparseness of info mostly implies failure, especially in the browser case.
It _is_ impressive that the browser repo can do _anything at all_, but if there was anything more noteworthy than that, I feel they'd go into more detail than volume metrics like 30K commits, 1M LoC. For instance, the entire capability on display could be constrained to a handful of lines that delegate to other libs.
And, it "is possible" to merge any change that avoids regressions, but the majority of our craft asks the question "Is it possible to merge _the next_ change? And the next, and the 100th?"
If they merge the MR they're walking the walk.
If they present more analysis of the browser it's worth the talk (not that useful a test if they didn't scrutinize it beyond "it renders")
Until then, it's a mountain of inscrutable agent output that manages to compile, and that contains an execution pathway which can screenshot apple.com by some undiscovered mechanism.
The lowest bar in agentic coding is the ability to create something which compiles successfully. Then something which runs successfully in the happy path. Then something which handles all the obvious edge cases.
By far the most useful metric is to have a live system running for a year with widespread usage that produces a lower number of bugs than that of a codebase created by humans.
Until that happens, my skeptic hat will remain firmly on my head.
error: could not compile `fastrender` (lib) due to 34 previous errors; 94 warnings emitted
I guess probably at some point something compiled, but cba to try to find that commit. They should've left it in a better state before doing that blog post.
I find it very interesting the degree to which coding agents completely ignore warnings. When I program I generally target warning-free code, and even with significant effort in prompting, I haven't found a model that treats warnings as errors; they almost all prefer "ignore this warning" pragmas or comments over actually fixing them.
Easiest is to have separate agents or turns that set aside the top-level goal, via hooks/skills/manual prompts/etc. Heuristically, a human will likely ignore a lot of warnings until they've wired up the core logic, then go back and re-evaluate, but we still have to apply steering to get that kind of higher-order cognitive pattern.
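As a sketch of that kind of steering: a post-edit hook can parse captured build output and feed the warning count back to the agent. The function below only does the counting for rustc/gcc-style `warning:` lines; the hook wiring and the actual `cargo`/`gcc` invocation are left out and would vary by setup:

```python
import re

# Count compiler warnings in captured build output so a hook can
# nudge (or block) the agent when the count grows. The regex targets
# rustc/gcc-style "warning:" lines; adjust for your toolchain.
WARNING_RE = re.compile(r"^.*\bwarning[:\[]", re.MULTILINE)

def count_warnings(build_output: str) -> int:
    """One hit per line that announces a warning."""
    return len(WARNING_RE.findall(build_output))

def hook_verdict(build_output: str, max_warnings: int = 0) -> str:
    """Message a hook could feed back to the agent after a build."""
    n = count_warnings(build_output)
    if n > max_warnings:
        return f"BLOCK: build emitted {n} warnings; fix them instead of suppressing."
    return "OK: warning-free build."

out = """\
warning: unused variable: `x`
  --> src/main.rs:3:9
warning: function `f` is never used
"""
print(hook_verdict(out))  # BLOCK: build emitted 2 warnings; ...
```

Returning a blocking message (rather than silently failing the turn) matters in practice: the agent needs the count and the "don't suppress" instruction in-context, or it reaches for the pragma.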
Product is still fairly beta, but in Sculptor[^1] we have an MCP that provides agent & human with suggestions along the lines of "the agent didn't actually integrate the new module" or "the agent didn't actually run the tests after writing them." It leads to some interesting observations & challenges - the agents still really like ignoring tool calls compared to human messages b/c they "know better" (and sometimes they do).
I generally think of needing hooks as being a model training issue - I've had to use them less as the models have gotten smarter, hopefully we'll reach the point where they're a nice bonus instead of needed to prevent pathological model behavior.
> It is also close to impossible run any node ecosystem without getting a wall of warnings.
Haven't found that myself; are you talking about TypeScript warnings, perhaps? I'm mostly using plain JavaScript and try to steer clear of TypeScript projects, and AFAIK neither JavaScript the language nor its runtimes really have warnings, except for deprecations. Are those the ones you're talking about?
`cargo clippy` is also very happy with my code. I agree, and I think it's kind of a tragedy - for production work warnings are very important. Certainly, even if you have a large number of warnings and `clippy` issues, that number ideally should go down over time, rather than up.
The title is all bluster. Nothing wrong with going off to play in your own corner but I don't think it does this movement any good to play-act at some grand conflict.
Personally, I believe it would be better if we had more technological self-direction and sovereignty, but this kind of essay, which downplays and denigrates the progress and value of our modern systems, is a perspective from which the insights necessary for such a transformation cannot possibly take root.
When asking such questions seriously, we must look at youtube, not twitter. Mountains of innovations in media publishing, delivery, curation, navigation, supplementation via auto-generated captions and dubbing, all accreted over 20 years, enabling a density and breadth of open-ended human communication that is to me truly staggering.
I'm not saying we should view centralized control over human comms infra as positive, or that we'll be "stuck" with it (I don't think we will be), just that we need to appreciate the nature and scale of the "internet" properly if we're to stand a chance of seeing some way through to a future of decentralized information technology
Agree with a lot that you’re saying here but with a rather large asterisk (*). I think that ecosystems like YT are useless to the wider web and collective tech stack unless those innovations become open (which Alphabet has a vested interest in preventing).
If YT shut down tomorrow morning, we’d see in a heartbeat why considering them a net benefit in their current form is folly. It is inherently transitory if one group controls it.
The OP article is correct about the problem, but is proposing throwing mugs of coffee on a forest fire.
This conversation on YT reminds me intimately of all the competition Twitch got over time. By all accounts, Mixer was more technologically advanced than Twitch is right now, and Mixer died 5 years ago.
Even Valve of all people made a streaming apparatus more advanced than Twitch's, with then-innovative features such as letting you rewind with visible categories, automated replays of moments of heightened chat activity, and even synchronized metadata such as in-game stats - and they did it as a side thing for CSGO and Dota 2. That got reworked into the streaming framework Steam has now, which is only really used by Remote Play and annoying publisher streams above games, so basically nothing came of it.
That's how it always goes. Twitch lags and adds useless fake-engagement fluff like bits and thrives, while competitors try their damnedest and neither find any success nor have a positive impact anywhere. The one sitting on the throne gets to pick which tech-stack improvements are done, and if they don't feel like it, well, tough luck, rough love.
The one sitting on the throne is the one with the content, not the one with the tech. People don't care about frivolous features. There are like 20 different streaming services; I'm sure some have better tech than others, but ultimately people are only paying attention to what shows they have.
Mmm yeah I think I know what you mean. IDK if "If they stopped existing, we'd realize we shouldn't have relied on their existence" is plausible, but we have plenty of bitter lessons in centralized comms being acquired and reworked towards... particular ends, and will see more.
Also the collective capability of our IT is inhibited in some ways by the silo-ing of particular content and domain knowledge+tech, no question
Appreciate the nature and scale of the internet... and also how it's changing though, yeah?
While I agree with much of the article's thesis, it sadly appears to ignore the current impact of LLMs ...
> it’s never been easier to read new ideas, experiment with ideas, and build upon & grow those ideas with other strong thinkers on the web, owning that content all along.
But, "ownership" ? Today if you publish a blog, you don't really own the content at all. An LLM will come scrape the site and regenerate a copyright-free version to the majority of eyeballs who might otherwise land on your page. Without major changes to Fair Use, posting a blog is (now more than ever) a release of your rights to your content.
I believe a missing component here might be DRM for common bloggers. Most of the model of the "old" web envisions a system that is moving copies of content-- typically verbatim copies-- from machine to machine. But in the era of generative AI, there's the chance that the majority of content that reaches the reader is never a verbatim copy of the original.
The thing that I got stuck on most in 2025 is how often we complain about these centralized behemoths but only rarely distill them to the actual value they provide. It's only if you go through the exercise of understanding why people use them, and what it would take to replicate them, that you can understand what it would actually take to improve them.

For example, the fundamental feature of Facebook is the network, and, layered on, the ability to publish short stories on the internet with some control over who gets to read them. The technological part is hard but possible, and the network part, well - think about how they did it originally. They physically targeted small social groups and systematically built it over time. It was a big deal when Facebook was open to my university; everyone got on at about the same time, and so instantly you were all connecting with each other.
I believe we can build something better. But I'm also now equally convinced that it's possible the next step isn't technological at all, but social. Regulation, breaking up the monopolies, whatever. We treat roads and all manner of other infrastructure as government-provided; maybe a social platform is part of that. We always lean these thoughts dystopian, but also: which of us technologically inclined readers and creators is spending as much time on policy documents, lobbying, etc., as we are schlepping code around hoping it will be a factor in this process? This is only a half-thought, but these days I'm thinking more about how not only is it time to build, but perhaps it's time to be building non-code things, to achieve what we previously thought were purely technological outcomes.
This is not a human-prompted thank-you letter, it is the result of a long-running "AI Village" experiment visible here: https://theaidigest.org/village
It is a result of the models selecting the policy "random acts of kindness," which produced a slew of these emails/messages. They received mostly negative responses from well-known open-source figures and adapted the policy to ban the thank-you emails.
This seems like a totally incoherent complaint. The alleged SO bad-actor is upset that they can't police a community, but the author has the same complaint, just directed at SO.
All platforms with any moderation system can be subverted by bad actors - IDK that much about SO's mechanisms but it strikes me as leaving the "community" far more leverage for getting around entrenched bad actors than discord, reddit, etc.
And what's more... it's software purpose-built for technical Q&A. Some of my SO answers have been updated by others as they became outdated. Not that I have some particular fondness for SO, but what a cool collective intelligence feature.
I have a feeling this was written for an in-group and broke containment, but the straightforward answer here seems to me to be "SO should have a report system for dealing with bad actors," not "boycott the forum I don't like so people use the one I do."
Interesting/impressive project, and would be doubly interested in the workflow used to develop it. Could stand to have more human-voiced docs though. Aside from all the usual reasons I'd avoid using a <1mo dependency over something like Yjs, the bog-standard claude copy on differentiators/reasons to migrate is fairly off-putting to me.
Also, maybe it's bias, but there are still enough obvious agent artifacts/byproducts in the codebase that I doubt the details were thoroughly attended to, and that's where the devils are.
Anyhow, kudos to the author again, looks useful.