Note: At the point of writing this, the comments are largely skeptical.
Reading this as an avid Codex CLI user, some things make sense and reflect lessons learned along the way. However, the patterns also get stale fast as agents improve and may be counterproductive. One such pattern is context anxiety, which probably reflects a particular model more than a general problem, and is likely an issue that will go away over time.
There are certainly patterns that need to be learned, and relearned over time. Learning the patterns is sort of an anti-pattern, since it is the model that should be trained to alleviate its shortcomings rather than the human. Then again, a successful mindset over the last three years has been to treat models as another form of intelligence, not as human intelligence, by getting to know them and being mindful of their strengths and weaknesses. This is quite a demanding task in terms of communication, reflection, and perspective-taking, and it is understandable that this knowledge is being documented.
But models change over time. The strengths and weaknesses of yesterday’s models are not the same as today’s, and reasoning models have actually removed some capabilities. A simple example is giving a reasoning model with tools the task of inspecting logs. It will most likely grep and parse out smaller sections, and may also refuse an instruction to load the file into context to inspect it. The model then relies on its reasoning (system 2) rather than its intuitive (system 1) thinking.
This means that many of these patterns are temporary, and optimizing for them risks locking human behavior to quirks that may disappear or even reverse as models evolve. YMMV.
I have the theory that agents will improve a lot when trained on more recent training data. Like I‘ve had agents have context anxiety because they still think an average LLM context window is around 32k tokens. Also building agents with agents, letting them do prompt engineering etc, still is very unsatisfactory, they keep talking about GPT-3.5 or Gemini 1.5 and try to optimize the prompts for those old models, which of course was almost a totally different thing. So I‘m thinking if that‘s how they are thinking of themselves as well, maybe that artificially limits their agentic behavior too, because they just don’t know how much more capable they are than GPT-3.5
Because “strengths” of a model is based not on inherit characteristics, but on various user perception. It feels that model A is doing some thing better, same at it feels that your productive is high.
Strong point. I’m considering to tag patterns better and add stuff like “model/toolchain-specific,” and something like “last validated (month/year)” field. Things change fast and for example “Context anxiety” is likely less relevant and should be reframed that way (or retired).
Reading this as an avid Codex CLI user, some things make sense and reflect lessons learned along the way. However, the patterns also get stale fast as agents improve and may be counterproductive. One such pattern is context anxiety, which probably reflects a particular model more than a general problem, and is likely an issue that will go away over time.
There are certainly patterns that need to be learned, and relearned over time. Learning the patterns is sort of an anti-pattern, since it is the model that should be trained to alleviate its shortcomings rather than the human. Then again, a successful mindset over the last three years has been to treat models as another form of intelligence, not as human intelligence, by getting to know them and being mindful of their strengths and weaknesses. This is quite a demanding task in terms of communication, reflection, and perspective-taking, and it is understandable that this knowledge is being documented.
But models change over time. The strengths and weaknesses of yesterday’s models are not the same as today’s, and reasoning models have actually removed some capabilities. A simple example is giving a reasoning model with tools the task of inspecting logs. It will most likely grep and parse out smaller sections, and may also refuse an instruction to load the file into context to inspect it. The model then relies on its reasoning (system 2) rather than its intuitive (system 1) thinking.
This means that many of these patterns are temporary, and optimizing for them risks locking human behavior to quirks that may disappear or even reverse as models evolve. YMMV.