What if the hammer has a problem where the handle breaks off randomly? Same thing happens with AI. Sometimes it breaks, randomly, and without any way of predicting it.
Another way to look at it is that the operator of the hammer has an immediate feedback loop and will not continue with a broken hammer. AI as it stands rarely has that feedback on the consequences of its decisions, and lacks the ability to react appropriately.
lots and lots of companies are making that distinction. but try writing a post here saying "our productivity is through the roof and our systems have never been more stable since we started using AI" and see what happens. as it always goes in this day and age, bubbles and echo chambers… so it's easier to just go about your day doing amazing shit at an amazing pace than try to "argue" about the merits of a technology. every post I see here suggesting positive results gets downvoted faster than anything else
AI is an umbrella term. All AI models can hallucinate, and there has been no solution to this problem. Until it is resolved, running AI in production is, in my opinion, something only an idiot would do. I read about a company that had their whole codebase wiped out because they gave an agent access that allowed it to do exactly that.
No, it's not. The problem is that all AI hallucinates. Therefore, it is guaranteed to be confidently wrong. Until the problem of hallucinations is solved, anyone using AI in a production environment is an idiot, which is, of course, my personal opinion. But it seems pretty cut and dried to me.
Your original post (and even after this comment I think) was vague in that AI can be used in a lot of different ways in 'production' - to generate code, to manage deployment / scripts, or as part of a feature that uses inference.
For example, if you're writing code with AI, you can still review it just like you would if a colleague wrote it. You can write tests (or have the AI do so) to prevent some hallucinations, too.
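To make the point concrete, here is a hypothetical illustration of "review it like a colleague wrote it, and back it with tests": the generated helper below looks plausible but has an off-by-one bug, and an ordinary unit test catches it. The function names and the sample input are invented for this sketch; nothing AI-specific is needed.

```python
# Hypothetical example: a plausible-looking AI-generated helper with an
# off-by-one slice, caught by the kind of test a reviewer would write.

def last_n_lines_generated(text, n):
    # AI-generated version: slices one line too many
    return text.splitlines()[-(n + 1):]

def last_n_lines_fixed(text, n):
    # Reviewed/corrected version
    return text.splitlines()[-n:] if n > 0 else []

sample = "a\nb\nc\nd"

# The reviewer's test passes on the fixed version...
assert last_n_lines_fixed(sample, 2) == ["c", "d"]
# ...and the same check exposes the generated one.
assert last_n_lines_generated(sample, 2) != ["c", "d"]
```

The test doesn't know or care that the bad version came from a model; it just pins down the expected behavior, which is exactly why ordinary review practices still apply.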
Yes, AIs that hallucinate can all be used in different ways. But they can still all hallucinate, so I fail to see how what you are saying mitigates the fundamental, as-yet-unsolved problem of AI hallucinations.
edit to say, what is the point, after all, of artificial intelligence if it's not used to make decisions? That's what it does. But ALL AI HALLUCINATES. Therefore, it's unreliable.
Tons of people, apparently, aren't enough. I guess I'm just tired of seeing post after post on HN about people complaining that their use of AI in production isn't reliable.
It makes me want to pull out the hair I used to have and scream into the wilderness and eat a twinkie.
Why do we find the unreliability and resulting hallucinations acceptable for AI in production? Can you imagine if Postgres, Apache, Nginx, hell, even the Linux kernel were allowed to be used in production if they occasionally went insane?
the duct tape framing is fair, but the deeper issue is that the model has no persistent understanding of the system it's working in. each generation starts from scratch with no memory of prior context or architectural decisions. that's a harder problem than prompt engineering, but it's solvable at the infrastructure layer.
I've built a list of common gotchas in the generation prompts.
Also, if the compilation fails, it falls back to Opus with the error message and the code, and can retry up to twice.
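A minimal sketch of the fallback loop described here, assuming the commenter's setup roughly matches this shape: the function names (generate_with_fallback, compile_check) and the stub models are invented for illustration; the actual implementation isn't shown in the thread.

```python
# Hypothetical sketch: try the primary model, and on compile failure
# hand the error message plus the broken code to a stronger fallback
# model (Opus in the comment above), retrying up to twice.

MAX_RETRIES = 2

def generate_with_fallback(prompt, primary, fallback, compile_check):
    code = primary(prompt)
    ok, error = compile_check(code)
    for _ in range(MAX_RETRIES):
        if ok:
            break
        repair_prompt = (f"{prompt}\n\nThis code failed to compile:\n"
                         f"{code}\nCompiler error:\n{error}\nPlease fix it.")
        code = fallback(repair_prompt)
        ok, error = compile_check(code)
    if not ok:
        raise RuntimeError("generation failed after fallback retries")
    return code

# Stub models and compiler check, for demonstration only.
def primary(prompt):
    return "print(oops"          # fast model returns broken code

def fallback(prompt):
    return "print('ok')"         # fallback model "repairs" it

def compile_check(code):
    try:
        compile(code, "<gen>", "exec")
        return True, None
    except SyntaxError as e:
        return False, str(e)

result = generate_with_fallback("write hello", primary, fallback, compile_check)
```

Capping the retries keeps a persistently failing prompt from looping forever, and passing the compiler error back gives the fallback model something concrete to act on rather than regenerating blind.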
This is built with the paradigm of "don't build for the model of today, build for the model in 6 months." It currently works, which still amazes me, but it will get much better!