I think the upper limit is your ability to decide what to build among infinite possibilities. How should it work, what should it be like to use it, what makes the most sense, etc.
The code part is trivial, and in some ways a waste of time compared to the time spent deciding what to build. Sometimes it's even procrastination, a way to avoid thinking about what to build, like how people polish their game engine (easy) to avoid putting in the work of planning a fun game (hard).
The more clarity you have about what you're building, the larger the blocks of work you can delegate or outsource.
So I think one overwhelming part of LLMs is that you don't get the downtime of working on implementation, since that's now trivial; you're stuck doing the hard part of steering and planning. But that's also a good thing.
I've found that writing the code massively helps your understanding of the problem and of what you actually need or want. Most times I go into a task with a certain idea of how it should work, and then reevaluate once I've started. An LLM will just do what you ask without questioning it, leaving you with none of the learnings you would have gained had you done it yourself. The LLM certainly didn't learn or remember anything from it either.
In some cases, yes. But I've been doing this a while now, and there is a lot of code that has to be written that I will learn nothing from. Now I have the choice not to write it.
I doubt we're talking about the same sort of things at all. I'm talking about stuff like generic web CRUD: too custom to be generated deterministically, but recent models crush it and make fewer errors than I do. And that's not even all they can do. Yes, once you get into a large, complicated codebase it's not always worth it, but even there one benefit is that it develops more test cases, and more complicated ones, than I would realistically bother with.
The whole time I'm doing it, I'm trying to think of better ways. I'm thinking of libraries, utilities or even frameworks I could create to reduce the tedium.
This is actually one of the things I dislike the most about LLM coding: they have no problem with tedium and will happily generate tens of thousands of lines where a much better approach could exist.
I think it's an innovation killer. Would any of the ORMs or frameworks we have today exist if we'd had LLMs this whole time?
It depends on how you use them. In my workflow, I work with the LLM to get the desired result, and I'm familiar with the system architecture without writing any of the code.
I've written it up here, including the transcript of an actual real session:
I just woke up recently myself and found out these tools were actually becoming really, really good. I use a similar prompt system, but not as much focus on review - I've found the review bots to be really good already but it is more efficient to work locally.
One question, since you mention using lots of different models: do you ever have to tweak prompts for a specific model, or are these things pretty universal?
I don't tweak prompts, no. I find there's not much need to; the models understand my instructions well enough. I think we're way past the prompt engineering days. All models are very good at following instructions nowadays.
Right, when you're coding with an LLM it's not you asking the LLM questions; it's the LLM asking you questions: what to build, how exactly it should work, whether it should do this or that under which conditions. Because the LLM does the coding, you have to do more of the thinking. :-)
And when you make the decisions, you are the one responsible for them. When you just did the coding, the decisions about the code were largely left to you, and nobody really saw them, only how they affected the outcome. Now the LLM is in that role, answerable only for what the code does, not how it does it.
Hehe, speak for yourself: as a 1x coder on a good day, having a nonjudgmental partner who can explain stuff to me is one of the best parts of writing with an LLM :)
I like that aspect of it too. The LLM never seems to get offended, even when I tell it it's wrong. I'm just trying to understand why some people say it can feel exhausting. I suppose the work has changed: instead of focusing on narrowly defined coding tasks, you're responsible for a much larger area of work, and expectations are correspondingly higher. You're supposed to produce 10x code now.
Not sure if it's what you're talking about but I had a coworker trying to break into eSports and he talked a lot about the micro vs macro skills a game requires. Sounds like we all have an aimbot for programming so the competition has shifted hard towards the macro. That could definitely be tiring.
This is such a weird statement. Game engines are among the most complicated pieces of software in existence. Furthermore, a game that doesn't run smoothly increases the chances that your player base doesn't stick around to see what you've built.
That probably pops up all over the place, like how there's no real progress on making terminals support different keyboards/languages (e.g. sending raw key codes to terminal apps).
Technical people already have to make concessions to deal with ASCII characters and English in computing by the time they use a terminal, so the upside of changing any one thing kind of peters out.
That's probably not a good intuition to have for a display rendered from ANSI escape sequences. Maybe not even for text rendered from Unicode.
Though a good terminal should let you control whether you want to render the anchor text, show you the underlying link when you focus/hover/click it, etc.
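For the anchor-text/underlying-link behavior, the mechanism in question is the OSC 8 hyperlink escape sequence, which a number of terminals (kitty, iTerm2, WezTerm, VTE-based terminals) already understand. A minimal sketch in Python; the function name is mine, not any library's:

```python
def hyperlink(text: str, url: str) -> str:
    """Wrap text in an OSC 8 hyperlink escape sequence. Supporting
    terminals render `text` as clickable anchor text pointing at `url`;
    non-supporting terminals typically show just the plain text."""
    # Structure: OSC 8 ; params ; URI ST ... text ... OSC 8 ; ; ST
    # where OSC = ESC ] and ST (string terminator) = ESC \
    return f"\x1b]8;;{url}\x1b\\{text}\x1b]8;;\x1b\\"

print(hyperlink("docs", "https://example.com/docs"))
```

Whether the terminal shows the anchor text, the raw URL, or both on hover is exactly the kind of UI choice the comment above is asking terminals to expose.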
Also, when you hit compaction at 200k tokens, that's often exactly when things were getting good. The plan was in its final stage. The context held the hard-fought nuances discovered at the last moment. Or the agent had just uncovered some tiny but important detail after a crazy 100k-token deep dive or a flailing death cycle.
Now you have to compact and you don’t know what will survive. And the built-in UI doesn’t give you good tools like deleting old messages to free up space.
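The "delete old messages to free up space" control is simple to sketch, even if the built-in UI doesn't offer it. A minimal, hypothetical version in Python; the message shape and the character-based token estimate are assumptions for illustration, not any real API:

```python
def trim_history(messages, budget, count_tokens):
    """Drop the oldest non-system messages until the estimated token
    total fits within `budget`. A crude sketch of a manual 'delete old
    messages' control; unlike compaction, you choose what survives."""
    kept = list(messages)
    total = sum(count_tokens(m["content"]) for m in kept)
    while total > budget and len(kept) > 2:
        # Keep the first message (system prompt / plan) and the most
        # recent turn; drop the oldest message in between.
        dropped = kept.pop(1)
        total -= count_tokens(dropped["content"])
    return kept

# Very rough token estimate: roughly 4 characters per token.
estimate = lambda text: max(1, len(text) // 4)
```

A real version would need to keep tool-call/result pairs together, but the point stands: explicit deletion is more predictable than opaque compaction.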
I've found compaction kills the whole thing. Important debug steps go completely missing, and the AI loops back around thinking it's found a solution when we've already done that step.
I find it useful to make Claude track the debugging session with a markdown file. It’s like a persistent memory for a long session over many context windows.
Or make a subagent do the debugging and let the main agent orchestrate it over many subagent sessions.
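The persistent markdown file might look something like the sketch below. The file name and section headings are hypothetical, just one way to structure it, not anything Claude produces by default:

```markdown
# Debug log: <one-line issue summary>

## Confirmed facts
- Repro steps and observations verified so far

## Hypotheses ruled out
- What was tried, and why it was eliminated

## Next steps
- The current leading hypothesis and what to check next
```

Telling the agent to re-read and update this file at the start of each session is what makes it survive compaction.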
It's an inevitable outcome of automatic code generation that people will do this all the time without thinking about it.
Example: you want a feature in your project, and you know this github repo implements it, so you tell an AI agent to implement the feature and link to the github repo just for reference.
You didn't tell the agent to maliciously reimplement it, but the end result might be the same - you just did it earnestly.
These sacrificial two-days-on-the-toilet offerings are like giving confessions to the priest to get back on the good side so you don't have to change your behavior.
Yes I can eat this 4200cal Costco pizza, I did my cleanse last month.
"Fiber from food" seems good enough. It's hard not to fibermax without incidentally improving your diet substantially. For example, beans are one of the best and easiest sources of it.
Splitting hairs beyond that, like insoluble and soluble, is the kind of thing that just confuses and intimidates people about nutrition advice.
It's a bridge you can cross once everyone is eating 50g+ of fiber per day, has chiseled physiques, and are looking to min/max.
Some people just can't take a compliment, especially if it's generated. (I'm one of them.) Still, /insight did give useful help, but I wasn't able to target it to specific repo/sessions.