> This, to me, is the critical and fatal flaw that prevents me from using or even being excited about LLMs: That they can be randomly, nondeterministically and confidently wrong, and there is no way to know without manually reviewing every output.
Sounds a lot like most engineers I’ve ever worked with.
There are a lot of people utilizing LLMs wisely because they know and embrace this. Reviewing and understanding their output has always been the game. The whole “vibe coding” trend where you send the LLM off to do something and hope for the best will teach anyone this lesson very quickly if they try it.
LLMs seem to care about getting things right, and they improve much faster than engineers. They've gone from non-verbal to reasonable coders in ~5 years; it takes humans a good 15 to do the same.
The people training the LLMs redid the training, fine-tuned the networks, and put out new LLMs, even if marketing misleadingly uses human-related terms to make you believe the models themselves evolve.
An LLM from 5 years ago is still as bad as it was 5 years ago.
Conceivably an LLM that could retrain itself locally on the input you give it would indeed improve somewhat, but even if you could afford the hardware, do you see anyone giving you that option?
Are you sure this is the general understanding? There's a lot of anthropomorphic language thrown around when talking about LLMs. It wouldn't surprise me if people believe ChatGPT 5.5 is ChatGPT 1.0 that has "evolved".
You cannot really compare the two. An engineer will continue to learn and adapt their output to the teams and organizations they interact with. They will seamlessly pick up the core principles, architectural nuances, and verbiage of their specific environment. You need to explicitly pass all of that to an LLM, and all of today's approaches fall short.
Most importantly, an engineer will keep accumulating knowledge and skills while you interact with them. An LLM won't.
With ChatGPT explicitly storing "memory" about the user and having access to the history of all chats, that can also change. It's not hard to imagine an AI-powered IDE like Cursor recognizing that when you rerun a prompt or paste back an error message, its original result was wrong in some way and it needs to "learn" from that to improve its outputs.
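To make the idea concrete, here is a purely hypothetical sketch of that kind of feedback loop, done on the tool side rather than in the model weights: corrections from failed attempts get recorded and prepended to later prompts. All names here (CorrectionMemory, record_failure, augment) are made up for illustration and don't reflect how Cursor or ChatGPT memory actually works.

```python
# Hypothetical sketch: a tool-side "memory" that records corrections
# (e.g. an error message seen after a generated snippet failed) and
# feeds them back into future prompts, so the next generation can
# avoid repeating the same mistake.

from dataclasses import dataclass, field


@dataclass
class CorrectionMemory:
    notes: list[str] = field(default_factory=list)

    def record_failure(self, prompt: str, error: str) -> None:
        # Store a short note tying the failed request to the observed error.
        self.notes.append(f"Previous attempt at '{prompt}' failed with: {error}")

    def augment(self, prompt: str) -> str:
        # Prepend accumulated notes so the model sees past corrections as context.
        if not self.notes:
            return prompt
        return "\n".join(self.notes) + "\n\n" + prompt


memory = CorrectionMemory()
memory.record_failure("write a CSV parser", "UnicodeDecodeError on utf-16 input")
print(memory.augment("write a CSV parser"))  # the next prompt carries the lesson forward
```

This isn't "learning" in the weight-update sense, of course; it's just context stuffing, which is exactly why the reply below about context boundaries matters.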
Maybe. I'd wager the next couple of generations of inference architecture will still have issues with context on that strategy. Trying to work with state-of-the-art models at their context boundaries quickly descends into gray-goop-like behavior for now, and I don't see anything on the horizon that changes that.