my takeaway from this is that it should now be MANDATORY to have an LLM do a scan on the entire codebase prior to release or artifact creation. do NOT use third party plugins for this. it's so easy to create your own github action to digest the whole codebase and inspect third party code. it costs tokens yes but it's also cached and should be negligible spend for the security it brings.
Ironically, Trivy was the first known compromised package and its purpose is to scan container images to make sure they don't contain vulnerabilities. Kinda like the LLM in your scenario.
> The problem for me is that the development practices of the people that are working on it are suboptimal at best; they're constantly releasing at an extremely high cadence, where they don't even spend the time to test or fix things (or even build a proper list of changes for each release), and they add, remove, refine, change, fix, and break features constantly at that accelerated pace.
this is what i notice with openclaw as well. there have been releases where they break production features. unfortunately this is what happens when code becomes a commidity, everyone thinks that shipping fast is the moat but at the expense of suboptimality since they know a fix can be implemented quickly on the next release.
Openclaw has 20k commits, almost 700k lines of code, and it is only four months old. I feel confident that that sort of code base would have a no coherent architecture at all, and also that no human has a good mental model of how the various subsystems interact.
I’m sure we’ll all learn a lot from these early days of agentic coding.
> I’m sure we’ll all learn a lot from these early days of agentic coding.
So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average. Depressingly.
I think what we're seeing is a phase transition. In the early days of any paradigm shift, velocity trumps stability because the market rewards first movers.
But as agents move from prototypes to production, the calculus changes. Production systems need:
- Memory continuity across sessions
- Predictable behavior across updates
- Security boundaries that don't leak
The tools that prioritize these will win the enterprise market. The ones that don't will stay in the prototype/hobbyist space.
We're still in the "move fast" phase, but the "break things" part is starting to hurt real users. The pendulum will swing back.
This makes sense. Development velocity is bought by having a short product life with few users. As you gain users that depend on your product, velocity must drop by definition.
The reason for this is that product development involves making decisions which can later be classified as good or bad decisions.
The good decisions must remain stable, while the bad decisions must remain open to change and therefore remain unstable.
The AI doesn't know anything about the user experience, which means it will inevitably change the good decisions as well.
> So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average.
Only for the non-pro users. After all, those users were happy to use excel to write the programs.
What we're seeing now is that more and more developers find they are happy with even less determinism than the Excel process.
Maybe they're right; maybe software doesn't need any coherence, stability, security or even correctness. Maybe the class of software they produce doesn't need those things.
I still use excel to write programs. I use officescript and power query. I shy away from via but have also used it.. I’m not sure what your point is. The people stopping citizens’ development could ease off the job security lines and the deferral to lockdown
20 for me, and let's not exaggerate. We've given lip service to it this entire time. Hell look at any of the corps we're talking about (including where I work) and they're demanding "velocity without lowering the quality bar", but it's a lie: they don't care about the quality bar in the slightest.
One of my main lessons after a decent long while in security, is that most orgs care about security, *as long as it doesn't get in the way of other priorities* like shipping new features. So when we get something like Agentic LLM tooling where everything moves super fast, security is inevitably going to suffer.
I’m learning that projects, developed with the help of agents, even when developers claim that they review and steer everything, ultimately are not fully understood or owned by the developers, and very soon turns into a thousand reinvented wheels strapped together by tape.
> very soon turns into a thousand reinvented wheels strapped together by tape.
Also most of the long running enterprise projects I’ve seen - there was one that had been around for like 10 years and like about 75% of the devs I hadn’t even heard of and none of the original ones were in the project at all.
The thing had no less than three auditing mechanisms, three ways of interacting with the database, mixed naming conventions, like two validation mechanisms none of which were what Spring recommended and also configurations versioned for app servers that weren’t even in use.
This was all before AI, it’s not like you need it for projects to turn into slop and AI slop isn’t that much different from human slop (none of them gave a shit about ADRs or proper docs on why things are done a certain way, though Wiki had some fossilized meeting notes with nothing actually useful) except that AI can produce this stuff more quickly.
When encountered, I just relied on writing tests and reworking the older slop with something newer (with better AI models and tooling) and the overall quality improved.
Claude Code breaks production features and doesn't say anything about it. The product has just shifted gears with little to no ceremony.
I expect that from something guiding the market, but there have been times where stuff changes, and it isn't even clear if it is a bug or a permanent decision. I suspect they don't even know.
We're still in the very early days of generative AI, and people and markets are already prioritizing quality over quantity. Quantity is irrelevant when it comes value.
All code is not fungible, "irreverent code that kinda looks okay at first glance" might be a commodity, but well-tested, well-designed and well-understood code is what's valuable.
and once you've got your wish: ugly code without tests or a way to comprehend it, but cheap!
How much value are you going to be able to extract over its lifetime once your customers want to see some additional features or improvements?
How much expensive maintenance burden are you incurring once any change (human or LLM generated) is likely to introduce bugs you have no better way of identifying than shipping to your paying customers?
Maybe LLM+tooling is going to get there with producing a comprehensible and well tested system but my anectodal experience is not promising. I find that AI is great until you hit its limit on a topic and then it will merrily generate tokens in a loop suggesting the same won't-work-fix forever.
What you wrote aligns with my experience so far.
It's fast and easy to get something working, but in a number of cases it (Opus) just gets stuck 'spinning' and no number of prompts is going to fix that.
Moreover - when creating things from scratch it tends to use average/insecure/ inefficient approaches that later take a lot of time to fix.
The whole thing reminds me a bit of the many RAD tools that were supposed to 'solve' programming. While it was easy to start and produce something with those tools, at some point you started spending way too much time working around the limitations and wished you started from scratch without it.
I'm of the opinion that the diligence of experts is part of what makes code valuable assets, and that the market does an alright job of eventually differentiating between reliable products/brands and operations that are just winging it with AI[1].
I would think that the better the code is designed and factored and refactored, the easier it is to maintain and evolve, detect and remove bugs and security vulnerabilties from it. The ease of maintenance helps both AI and humans.
There are limits to what even AI can do to code, within practical time-limits. Using AI also costs money. So, easier it is to maintain and evolve a piece of software, the cheaper it will be to the owners of that application.
It's understandable and even desirable that a new piece of code rapidly evolves as they iterate and fix bugs. I'd only be concerned if they keep this pattern for too long. In the early phases, I like keeping up with all the cutting edge developments. Projects where dev get afraid to ship because of breaking things end up becoming bloated with unnecessary backward compatibility.
one thing it did massively for me was save me time from questions like should i go with X or Y option questions. before i used to just think longer about tradeoffs but with AI it became a lot faster. no more procrastination due to decision fatigue.
> I simply booted up a VM with an H100, ssh’d into it with Cursor, and prompted the agent to set up an inference server that I could ping from my web generation app. What used to take hours or days of painful, slow debugging now takes literally minutes.
an awesome takeaway from this is that self-hosted models are the future! can't wait for hardware to catch up and we can do much more experiments on our laptops!
i really think this is part of the pitch deck for bun's funding. that a bigger company would acquire it for the technology. the only reason an AI company or any company for that matter would acquire it would be to:
i like how claude code currently does it. it asks permission for every command to be ran before doing so. now having a local model with this behavior will certainly mitigate this behavior. imagine before the AI hits the webhook.site it asks you
AI will visit site webhook.site..... allow this command?
1. Yes
2. No
a concern i have is that it's only a matter of time before a similar attack is done to electron based apps (which also have packages installed using npm). probably worse because it's installed in your computer and can potentially get any information especially given admin privileges.
I’m starting an electronjs project in a few weeks and have been reading up on it. They make a big deal about the difference between the main and renderer processes and security implications. The docs are there and the advice given but it’s up to the developers to follow them.
That leads me to another point. Devs have to take responsibility for their code/projects. Everyone wants to blame npm or something else but, as software developers, you have to take responsibility for the systems you build. This means, among may other things, vetting code your code depends on and protecting the system from randomly updating itself with code you haven’t even heard about.
We wanted to make concept for an app using all local models for chat (llama 3.1 8B) and voice (whisper). Deployed using kubernetes and easily scalable not to mention fully open source!
In practice, it’s been written as plain JS with a tiny bit of gratuitous Vue and SCSS bolted on (see even how Vue’s onMounted and onBeforeUnmount are fed callbacks that just run the actual initOGL and destroy functions). It would have been easier and shorter to write without Vue and SCSS than with them! What’s currently spread across index.html, src/styles.scss, src/main.js and src/App.vue would have worked better all in index.html, or if you really wanted to, you could still split the CSS and JS into files src/styles.css and src/main.js.
The bulk of it is WebGL. Vue is doing very little here. Since it's a single static page rendering to canvas, it really doesn't need a framework like Vue or React.
I was using React at work and looking at the Vue manual which at first looked good to me because Vue has first-class lists, it fit my model of web applications better. Than I saw three.js and other things were people used React to render things that weren’t ordinary web apps and I realized I could draw anything I could imagine with React but not with Vue.
I like the idea of svelte but for apps that are small enough that svelte has a bundle size advantage the bundle size difference isn’t decisive (users won’t usually notice or care) and if your app is huge enough that the bundle size is a problem you have problems bigger than your choice of framework.
I helped port a Vue 2 project to Vue 3, and then I worked on a Vue 3 project we’ve slowly been rewriting in a greenfield Nuxt 3 project. Vue 2 and the options API were just difficult in all senses - even Vue3 with Options feels bad. I really enjoy 3 with the composition API, and I have always had a difficult time reasoning about React personally.
While I will continue to probably promote Vue where it makes sense, I’m honestly more inclined towards learning Svelte, HTMX, and other less arduous frameworks.
but seriously, I'm very interested to hear your gripes with Vue that were solved by react, since the latter feels much worse DX-wise than both Vue or Svelte, notwithstanding worse performance as well.
Vue 2 had really bad support for static typing. It's improved in Vue 3 but still not as good as React. TSX is especially good.
But the main issue is the automatic reactivity. It's difficult to reason about and leads to spaghetti code. We also had occasional issues where people would put objects in properties that had some tenuous link to a database object or something, and Vue recursively infects the entire object with getters and setters to make the reactivity work. Sometimes we didn't even notice but it makes everything way slower.
I haven't tried Svelte so I'll take your word for it!
Also this was 3 years ago so I may have misremembered some details. No nitpicking!
reply