
> For instance, I know Project A -- these are the concerns of Project A. I know Project B -- these are the concerns of Project B. I have the insight to design these projects so they compose, so I don't have to keep track of a hundred parallel issues in a mono Project C. On each of those projects, run a single agent -- with review gates for 2-3 independent agents (fresh context, different models! Codex and Gemini). Use a loop, let the agents go back and forth.

Can you talk more about the structure of your workflow and how you evolved it into that form?

I've tried most of the agentic "let it rip" tools. I quickly realized that GPT-5 was significantly better at reasoning, and more exhaustive, than Claude Code (Opus, RL-finetuned for Claude Code).

"What if Opus wrote the code, and GPT 5~ reviewed it?" I started evaluating this question, and started to get higher quality results and better control of complexity.

I could also trust this process more than my previous loop of driving Opus, reviewing the code myself, driving Opus again, and so on. Codex was catching bugs I would not have caught in the same amount of time, including bugs in hard math, so I came to trust its reasoning to a great degree.

I've codified this workflow into a plugin I recently started developing: https://github.com/evil-mind-evil-sword/idle

It's a Claude Code plugin: it combines a "don't let Claude stop until a condition holds" Stop hook with a few CLI tools to induce (what the article calls) review gates, so Claude will work indefinitely until the reviewer is satisfied.

In this case, the reviewer is a fresh Opus subagent that can invoke Codex and Gemini and discuss the code with them.
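
To make that concrete, here's a minimal sketch of the mechanism -- my illustration, not idle's actual code. Claude Code sends Stop-hook input as JSON on stdin, and a hook that prints {"decision": "block", "reason": ...} makes Claude keep working, with the reason fed back to it. The codex/gemini calls below are assumed non-interactive CLI invocations whose flags may vary by version, and idle routes review through an Opus subagent rather than calling them directly:

    #!/usr/bin/env python3
    # Hypothetical Stop-hook review gate -- a sketch, not idle's actual code.
    import json, subprocess, sys

    PROMPT = ("Review the latest changes in this repo for correctness and "
              "complexity. Reply APPROVE or list concrete issues.")

    def ask(cmd):
        # Assumed non-interactive CLI calls; exact flags vary by version.
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    payload = json.load(sys.stdin)  # hook payload (session_id, transcript_path,
                                    # ...); unused in this sketch
    codex_view = ask(["codex", "exec", PROMPT])
    gemini_view = ask(["gemini", "-p", PROMPT])

    if "APPROVE" not in codex_view or "APPROVE" not in gemini_view:
        # Blocking the stop feeds `reason` back to Claude as its next turn.
        # A real gate would also cap iterations (e.g. via the payload's
        # stop_hook_active field) to avoid looping forever.
        print(json.dumps({"decision": "block",
                          "reason": "Reviewers not satisfied:\n"
                                    + codex_view + gemini_view}))
    sys.exit(0)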

One perspective I have, which relates to this article, is that the thing to optimize is error per unit of work. If you have a dynamic-programming-style orchestration pattern for agents, you want the thing that solves the smallest unit of work (a task) to have as low an error rate as possible; otherwise I suspect the errors compound quickly in these stochastic systems.
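
As a toy illustration of that compounding (my numbers, purely for intuition): if each unit of work independently succeeds with probability 1 - p, then n sequential units all succeed with probability (1 - p)^n, so small per-task improvements dominate at scale:

    # Toy model -- illustrative numbers, not measurements from my workflow.
    # P(all n tasks succeed) = (1 - p)**n for per-task error rate p.
    for p in (0.01, 0.05, 0.20):
        print(p, round((1 - p) ** 20, 3))
    # 0.01 -> 0.818, 0.05 -> 0.358, 0.20 -> 0.012 over a 20-task chain:
    # cutting per-task error is worth far more than it looks locally.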

I'm trying this stuff on fairly advanced work (as part of a PhD), so I'm dogfooding ideas like the ones presented in this article in complex settings. I think there is still a lot of room to learn here.


I'm sure we're just working with the same tools and thinking through the same ideas. Just curious whether you've seen my newsletter/channel @enterprisevibecode: https://www.enterprisevibecode.com/p/let-it-rip

It's cool to see others thinking the same thing!



