I've found this to be true so far: junior engineers with AI can be super productive, but they can also cause a lot of damage (more outages than ever), and AI amplifies the poorly designed code they sometimes generate.
I suspect a lot of the best practice here will be enforcing standards via agents.md/claude.md to create a more constrained environment.
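As a sketch of what that constrained environment might look like (the rules below are illustrative, not taken from any real project), an agents.md can encode the guardrails directly:

```markdown
# AGENTS.md (illustrative example)

## Hard rules
- Never commit directly to `main`; always open a PR.
- Run the full test suite before declaring a task done, and paste the output.
- New endpoints must have input validation and at least one failure-path test.

## Conventions
- Follow the existing module layout; do not create new top-level directories.
- Prefer small, reviewable diffs; split changes over ~400 lines into separate PRs.

## When unsure
- Stop and ask rather than guessing at schema or API contracts.
```

The point is that rules written here apply to every session, so the junior engineer doesn't have to remember to re-state them in each prompt.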
I’ve observed radically different workflows among senior candidates vs junior candidates when using AI. A senior candidate will often build an extremely detailed plan for the agent, similar to how you would do a design for/with a junior engineer, then let the agent go full throttle to implement the plan and review the result.
Juniors seem to split into two camps: trust everything the AI says, or review every step of the implementation. It’s extremely hard to guide the AI while you are still learning the basics, and opus4.6 is a very powerful model.
My observation has been that there are a lot of personal styles of engaging with LLMs that work, and "hold the hand" vs "in-depth plan" vs "combination" doesn't really matter. There is some minimum level of engagement required for non-trivial tasks, and whether that engagement comes mid-development, at the early design phase, or after isn't really that big of a deal. E.g., "just enough planning" is a fine way of approaching the problem if you're going to be in the loop once the implementation starts.
I don't claim to have any special skill at AI, but as a 'senior' dev, my strategy is exactly the opposite. I try to be as lazy, dumb and concise as I can bring myself to be with my initial prompt, and then just add more detail for the bits that the AI didn't guess correctly the first time around.
Quite often the AI guesses accurately and you save the time you'd have spent crafting the perfect prompt. Recently, my PM shared a nigh-on incomprehensible hand-scribbled diagram on Slack (which, in fairness, was more or less a joke). I uploaded it to Gemini with the prompt "WTF does this diagram mean?". Even without a shred of context, it figured out that it was some kind of product feature matrix and produced a perfect three-paragraph summary.
I've never really seen the value in the planning phase as you're free to just throw away whatever the AI produces and try again with a different prompt. That said, I don't pay for my tokens at work. Is planning perhaps useful as a way of reducing total token usage?
It's more about the size of the task I try to do: it's quite possible to get opus4.6 to one-shot a "good" 30k LOC change with the right planning doc. I'm not confident I could get similar results handholding. I also tend to want to know the major decisions and details in such a change up front rather than discovering them post-hoc.
One-shotting 30k LOC is no use to me because none of my colleagues will review a PR that big. But more to the point, I wouldn't be able to review it either, and I think we're still at a point where you do want to look carefully at the output these tools generate.
My take: openclaw should not run on a Mac (even though, looking at the skills it ships with, it clearly was made to).
It should run on its own VPS with full root access, given API keys with spending limits, and given no way for strangers to talk to it. I treat it as a digital assistant, a separate entity, who may at some point decide to sell me out, as any human stranger might, and then share personal info on that premise.
It just uses Claude. I haven't tried it much, but it seems to be what you're describing.
Openclaw uses pi agent under the hood. Arguably, if you're running on a VPS, most of the codebase could be replaced by systemd for scheduling, and then it's a series of prompts on top of pi agent.
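A sketch of that systemd replacement (the unit names, paths, and wrapper script here are all hypothetical; the point is only that a timer's `OnCalendar` covers the scheduling):

```
# agent-daily.service (hypothetical unit)
[Unit]
Description=Run one scheduled agent prompt

[Service]
Type=oneshot
# Placeholder wrapper: it would invoke the agent CLI with a fixed prompt and exit.
ExecStart=/opt/agent/run-daily-prompt.sh

# agent-daily.timer
[Timer]
OnCalendar=*-*-* 07:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Each scheduled prompt becomes one service/timer pair (enabled with `systemctl enable --now agent-daily.timer`) instead of in-process scheduler code.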
Boris has been very open about the 100% AI code-writing rate, and my own experience matches. If you have a TypeScript or similarly mainstream codebase, once you set your processes up correctly (you have tests/verification, you have a CLAUDE.md or AGENTS.md that you always add learnings to, you add skill files as you find repeatable tasks, you have automated code review), it's not hard to achieve this.
Then the human touch points become deciding what to build, reviewing the AI's engineering plans, and doing increasingly light review of the underlying code, focusing on overall architectural decisions and only occasionally intervening to clean things up (again with AI).
I didn't like how married to git hooks Beads was, so I made my own that's primarily a SQLite workhorse. I've been using it just the same as I used Beads; it works just as well, with drastically less code. I added a concept called "gates" to stop the AI model from closing tasks without any testing or validation (human or otherwise), because that was another pain point for me with Beads.
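I haven't seen the tool's code, but as a minimal sketch of the "gates" idea (the schema and names here are my own invention), a task simply can't be closed while any gate attached to it is still unpassed:

```python
import sqlite3

# Two tables: tasks, and gates that block a task from closing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, status TEXT DEFAULT 'open');
CREATE TABLE gates (
    id INTEGER PRIMARY KEY,
    task_id INTEGER REFERENCES tasks(id),
    name TEXT,
    passed INTEGER DEFAULT 0
);
""")

def close_task(conn, task_id):
    """Refuse to close a task that still has unpassed gates."""
    (unpassed,) = conn.execute(
        "SELECT COUNT(*) FROM gates WHERE task_id = ? AND passed = 0",
        (task_id,),
    ).fetchone()
    if unpassed:
        raise RuntimeError(f"{unpassed} gate(s) still open; run validation first")
    conn.execute("UPDATE tasks SET status = 'closed' WHERE id = ?", (task_id,))

# Usage: the agent must pass the 'tests' gate before task 1 will close.
conn.execute("INSERT INTO tasks (id, title) VALUES (1, 'add login flow')")
conn.execute("INSERT INTO gates (task_id, name) VALUES (1, 'tests')")
try:
    close_task(conn, 1)            # blocked: the 'tests' gate is unpassed
except RuntimeError:
    pass
conn.execute("UPDATE gates SET passed = 1 WHERE task_id = 1 AND name = 'tests'")
close_task(conn, 1)                # now succeeds
print(conn.execute("SELECT status FROM tasks WHERE id = 1").fetchone()[0])  # → closed
```

The gate check lives in the tracker itself, so it holds no matter which tool call the model uses to close the task.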
It works both ways: to GitHub and from GitHub. When you claim a task, it's supposed to update on GitHub too (though looking at the last one I claimed, it doesn't seem to be 100% foolproof yet).
Why have the issues tracked on the file system at all if they can be represented in GitHub Issues and accessed via tool calls? Also, how are you handling sub-task management? Are you able to link closed checkboxes in the GitHub issues?
> Singapore’s free speech restrictions, whatever you think of them, no longer seem so far outside the box. Trump is suing plenty of people. The UK is sending police to knock on people’s doors for social media posts, and so on. That too makes Singapore more of a “normal country”
That seems like it should make Singapore _more_ cool; my personal theory, at least, is that this dynamic changed a lot of the perception of China (in some parts of Gen Z social media, "it's a very Chinese time").
Hmm, I once transited through Heathrow on a return flight from Europe to the US and had to go through Heathrow security for whatever reason, where they subjected me to liquids rules way stricter than either my origin or destination did.
E.g., one-day contact lenses and prescription creams all having to fit in a tiny plastic bag. So I'm happy for this change.
> Hmm, I once transited through Heathrow on a return flight from Europe to the US and had to go through Heathrow security for whatever reason,
The US mandates that you go through TSA-approved security before boarding a flight to the US.
Either the security at your European airport wasn't good enough, or transiting at Heathrow gave you access to things that invalidated the previous security screening, so it had to be done again.
The bonus is that if you get to go through US Immigration at the departure airport then you can often land at domestic terminals in the US and the arrivals experience is far less tortuous. I flew to the US with a transit in Ireland a few times and it was so much nicer using the dead time before the Ireland -> US flight to clear immigration rather than spending anything from 15 minutes to 4 hours in a queue at the arrival airport in the US (all depending on which other flights arrived just before yours).