
Very similar to what I'm doing, but I take a more unsupervised approach, then introduce as much determinism as possible in the right layers, while cutting out as much magic as possible.

One thing these sorts of loops gloss over, though, is that _plans change_ during implementation, and I've never seen anyone talk about how they address this.

With a pre-generated detailed implementation plan for an entire feature or subsystem, the model will try to stick to the plan even when reality has changed during implementation. So I would advise against a detailed plan. A plan, and the tasks within it, are only valid for a single task of a feature. When the codebase, i.e. reality, changes, the plan has to be regenerated.

The only thing that is immutable and untouchable during implementation is a high-level spec.md file that lists goals and non-goals. I spend 1-2 hours speccing out a subsystem in an interactive session with Opus, have it write a spec.md file, then run a simple ralph loop. So it ends up:

1. Spec.md - interactive, extremely human-in-the-loop, high level objectives with acceptance criteria

Loop starts: 2. Plan - Opus is instructed to read all specs using a subagent to get a general understanding, then read the spec we're currently working on. It also reads the codebase to see the current state, the last changes in git, and what log.md for the spec contains. Then it has to read the previous plan.md file and REWRITE it entirely, putting the most important next task at the top. It also has to log what it has done in specname_log.md (a few lines max, no more)

3. Build: this is basically instructed to pick the most important thing from the plan and build it. All tests, lints, etc. must succeed before it can commit. It also logs to specname_log.md

This can loop for as long as neither the plan nor the build iteration says there's nothing left to be done. When that happens (usually after hours or days), a reviewer agent steps in to do a more thorough review, checking that everything listed in the spec is done.
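The plan/build/review cycle above can be sketched as a small driver loop. To be clear, this is a hypothetical reconstruction, not the author's actual 300-line script: the prompt file names, the `NOTHING_TO_DO` sentinel, and the `claude -p` invocation are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a ralph-style plan/build loop. File names, the
# NOTHING_TO_DO sentinel, and the `claude -p` call are assumptions.

run_agent() {
  # One agent turn on a prompt template; defaults to the claude CLI but
  # can be overridden via $AGENT (e.g. a stub for dry runs).
  "${AGENT:-claude}" -p "$(cat "$1")"
}

says_done() {
  # True if the agent's output claims there is nothing left to do.
  printf '%s' "$1" | grep -q "NOTHING_TO_DO"
}

ralph_loop() {
  while true; do
    plan_out=$(run_agent prompts/plan.md)    # rewrites plan.md from scratch
    build_out=$(run_agent prompts/build.md)  # top task; tests+lints gate commit
    # Stop only when BOTH the plan and build iterations report nothing to do.
    if says_done "$plan_out" && says_done "$build_out"; then
      break
    fi
  done
  run_agent prompts/review.md                # final thorough review vs spec.md
}
```

The detail that matters is that the plan step regenerates plan.md every iteration instead of appending to it, which is what keeps the plan from drifting away from the codebase.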

I also maintain a directives.md file in the spec dir. I review the TUI every 30 minutes or so to check if it's getting stuck somewhere or taking a path I don't like. I then put a single line in directives.md: "- X is the wrong approach and must be dropped"

All agents in their prompt have a line that says: "read directives.md - it contains human overrides that must be followed and they override everything".
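Concretely, the override wiring is just two plain-text files, something like this (the exact contents are illustrative, not the author's actual templates):

```
# prompts/build.md (excerpt)
Read directives.md first - it contains human overrides that must be
followed and they override everything, including this prompt.

# specs/subsystem/directives.md
- X is the wrong approach and must be dropped
```

Because the plan is rewritten each iteration, a single directive line is enough to redirect all subsequent iterations.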

This works extremely well. But the biggest downside is that you'll hit the weekly Max 20x limits in 3 days.

I also advise ignoring any tooling that hides these things from you. All of this is a 300-line bash script and a few template prompt.md files that I built in a few minutes with Opus, based on what I observed the pain points to be. You need to be able to tweak your system, down to the minute details, in a few minutes. Using things like GSD, gastown, spec-kit, openclaw, etc. locks you into the paradigms of people who, like the rest of us, don't know what we're all doing or which approach will win. In a few years, it's possible something will emerge that we all universally adopt, but right now, nobody has an idea what works.


