
I find it increases my productivity about 5-10% when working with the technologies I'm the most familiar with and use regularly (Elixir, Phoenix, JavaScript, general web dev.) But when I'm doing something unfamiliar and new, it's more like 90%. It's incredible.

Recently at work, for example, I've been setting up a bunch of stuff with some new technologies and libraries that I'd never really used before. Without ChatGPT I'd have spent hours if not days poring through tedious documentation and outdated tutorials while trying to hack something together in an agonising process of trial and error. But ChatGPT gave me a fantastic proof-of-concept app that has everything I needed to get started. It's been enormously helpful and I'm convinced it saved me days of work. This technology is miraculous.

As for my job security... well, I think I'm safe for now; ChatGPT sped me up in this instance but the generated app still needs a skilled programmer to edit it, test it and deploy it.

On the other hand I am slightly concerned that ChatGPT will destroy my side income from selling programming courses... so if you're a Rails developer who wants to learn Elixir and Phoenix, please check out my course Phoenix on Rails before we're both replaced by robots: PhoenixOnRails.com

(Sorry for the self promotion but the code ELIXIRFORUM will give a $10 discount.)



The thing is the hallucinations. I also wasted a few hours trying to work on solutions with GPT where it just kept making up parameters and random functions.


So much this. The thing hallucinates far more than the hyperventilation seems willing to acknowledge.

You really need to be quite competent in the thing you're asking it to do in order to ferret out the hallucinations, which greatly diminishes the potency of GPT in the hands of someone who has no knowledge of the relevant language/runtime/problem domain/etc.


Hallucination is less of a problem for programming compared to other use cases, because ultimately the program must be run.


Not if the hallucination introduces runtime errors that can't be identified a priori with any sort of static analysis or compilation/interpreting stage.

But no, you're fundamentally right. It just goes to the question of whether an LLM assistant can in any sense replace or displace human programmers, or save time for human programmers. The answer seems to be somewhat, and in certain cases, but not much else.

If I already know the technology I'm querying GPT about, I'm going to spend at least some time identifying its hallucinations or realising that it introduced some. I might have been better off just doing it myself. If I don't know the technology I'm querying GPT about, I'm going to be impacted by its hallucinations but will also have to spend time figuring out what the hallucinations are and why this unfamiliar code sample doesn't work.


My colleague had trouble getting an email from Google Docs into listmonk.

She asked GPT to help get an HTML version since apparently she got stuck with the WYSIWYG editor.

However, GPT gave back a full HTML structure, including head and body. Pasting that into listmonk broke the entire webpage. Then she freaked out and told me listmonk sucks :)


It's a huge problem on many levels, but in this case it's so much more time intensive, diminishing its usefulness.


Try paying for GPT-4 - it barely hallucinates at all, at least as far as I've noticed.


I use GPT-4 and it for sure does if you do things that are a bit off the beaten path.


It referenced a made-up function I needed (that should probably exist lol) in BrightScript; the letdown after realizing as much was painful.


It did the same thing to me with docker compose the other day.

For features that probably should exist but don't it does a really good job of sending you on a wild goose chase.


This has happened to me with Django/DRF twice. I've just accepted that it's more efficient to read and internalize the documentation.


Did you try asking it to write the function?


we should probably implement a lot of the hallucinated methods - consider them the obvious missing pieces of our APIs


This is the WORST feeling if you use co-pilot in an IDE. It's so incredibly disappointing.


There's a lot of things which could be done to improve this:

1) It could use the JSONformer idea [0] where we have a model of the language which determines what are the valid next tokens; we only ask it to supply a token when the language model gives us a choice, and when considering possible next tokens, we immediately ignore any which are invalid given the model. This could go beyond mere syntax to actually considering the APIs/etc which exist, so if the LLM has already generated tokens "import java.util.", then it could only generate a completion which was a public class (or subpackage) of "java.util.". Maybe something like language servers could help here.

2) For every output it generates, automatically compile and test it before showing it to the user. If the compile/test fails, give it a chance to fix its mistake. If it gets stuck in a loop, or isn't getting anywhere after several attempts, fall back to the next most likely output, and repeat. If after a while we still aren't getting anywhere, it can show the user its attempts (in case they give the user any ideas). A rough sketch of this follows below.

[0] https://github.com/1rgs/jsonformer
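
A minimal sketch of what (2) could look like in Python, assuming a hypothetical call_llm() wrapper around whatever model API is in use (everything else is stdlib; the "test" step here is just a byte-compile check via py_compile):

    import subprocess
    import tempfile

    MAX_ATTEMPTS = 5

    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in for the actual model API call."""
        raise NotImplementedError

    def compile_check(source: str) -> tuple[bool, str]:
        """Byte-compile the generated Python; return (ok, compiler errors)."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(source)
            path = f.name
        result = subprocess.run(
            ["python", "-m", "py_compile", path],
            capture_output=True, text=True,
        )
        return result.returncode == 0, result.stderr

    def generate_checked(prompt: str) -> str | None:
        """Generate, compile-check, and retry with the error fed back in."""
        attempt_prompt = prompt
        for _ in range(MAX_ATTEMPTS):
            source = call_llm(attempt_prompt)
            ok, errors = compile_check(source)
            if ok:
                return source
            # Feed the compiler output back so the model can fix its own mistake.
            attempt_prompt = (
                prompt
                + "\n\nYour previous attempt failed to compile:\n"
                + errors
                + "\nPlease fix it."
            )
        return None  # give up and show the user the failed attempts instead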


Integration with linters is going to be the next stage in generative coding.

It should suggest, lint the suggestion in the background, and if it passes offer the suggestion and if not provide the linting issues output to rework the suggestion.
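
Something like this, as a rough sketch (flake8 is just an example linter here, and the rework callable is a hypothetical hook back into the model):

    import subprocess
    import tempfile

    def lint(suggestion: str) -> tuple[bool, str]:
        """Write the suggestion to a temp file and run flake8 over it."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(suggestion)
            path = f.name
        result = subprocess.run(["flake8", path], capture_output=True, text=True)
        return result.returncode == 0, result.stdout

    def offer_or_rework(suggestion: str, rework) -> str:
        """Offer clean suggestions as-is; feed lint issues back for another pass."""
        passed, issues = lint(suggestion)
        if passed:
            return suggestion
        return rework(suggestion, issues)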

In general, token costs going down will in turn increase the number of multi-pass generation systems over single-pass systems, which is going to improve dramatically.

Combine all that with persistent memory storages that can provide in-context additional guidance around better working with your codebase and you, and it's going to be quite a different experience than it is today.

And at the current rate of advancement, that's maybe going to be how things will look within a year or two.


> It should suggest, lint the suggestion in the background

This makes a big difference, I'm making code writing stuff at the moment.

Injecting results from a language server while it's generating would be huge imo - same as giving humans autocomplete & hints.


You wouldn’t believe what you can get past a linter. You need test cases that cover the intention of the code, but I’ve also seen well-tested code behave totally counter to its purpose.


Yeah it can give you something that works out of the box, but fixing it requires even more effort

Better to ask it for a bunch of small things and piece them together


Yes. Start small and build up.

I’ve found it to be very forgetful and have to work function-by-function, giving it the current code as part of the next prompt. Otherwise it randomly changes class names, invents new bits that weren’t there before or forgets entire chunks of functionality.

It’s a good discipline as I have to work out exactly what I want to achieve first and then build it up piece by piece. A great way to learn a new framework or language.

It also sometimes picks convoluted ways of doing things, so regularly asking whether there's a simpler way can be useful.
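
Roughly the loop I mean, as a sketch (call_llm is a placeholder for whatever chat API is in use):

    def call_llm(prompt: str) -> str:
        """Placeholder for the actual chat API call."""
        raise NotImplementedError

    def build_incrementally(steps: list[str]) -> str:
        """Grow the module one function at a time, re-sending the code so far
        with every request so the model can't rename or drop earlier pieces."""
        code_so_far = ""
        for step in steps:
            prompt = (
                "Here is the current code:\n\n"
                + code_so_far
                + "\n\nAdd exactly one function that does the following, "
                + "changing nothing else: "
                + step
            )
            code_so_far += "\n\n" + call_llm(prompt)
        return code_so_far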


IIRC its "memory" (actually input size, it remembers by taking its previous output as input) is only about 500 tokens, and that has to contain both your prompt and the beginning of the answer to hold relevance towards the end of its answer. So yes, it can't make anything bigger than maybe a function or two with any consistency. Writing a whole program is just not possible for an LLM without some other knowledge store for it to cross reference, and even then I have my doubts.


This isn't quite accurate.

GPT-3.5 is 4k tokens and has a 16k version; GPT-4 is 8k and has a 32k version.

You are correct that this needs to account for both input and output. I suspect that when you feed ChatGPT longer prompts, it may try to use the 16k/32k models when it makes sense.


Were you using GPT-3.5 or GPT-4?

GPT-4 reduces hallucinations by at least an order of magnitude, and hasn't failed me yet.


This is my experience too. Paying $20/month for GPT-4 has been absolutely worth it. It barely hallucinates at all; the results aren't always perfect (and the September 2021 knowledge cut-off can be frustrating given how quickly things get out of date in the programming world) but it's more than good enough. I don't remember how I ever got by without it.


You could save some money by using GPT-4’s API and a self hosted frontend like YakGPT.


you can also just use OpenAI’s playground.


What's the problem with hallucinations when your editor can tell you automatically if the code compiles or not?


Sometimes the hallucinations compile.


It's nice that we've taught the robots to make off-by-one errors just like a real developer.


> Sometimes the hallucinations compile.

In that case they become complications.


Have it write the unit tests first.


Fuck it, ship it!


This is what ChatGPT and GPT-4 are good for: iterating quickly in an unfamiliar ecosystem. Picking up frameworks now feels like a ChatGPT superpower. It doesn't remove the need for reasoning, though, and I've seen some scary bugs introduced if you're not carefully monitoring what the AI is outputting.

Basically, these days before I dig into documentation I ask "How do I do X with Y framework in Language Z" and if it's pre-2021 tech it works amazingly well.


Especially when you know something similar. Like porting between front-end frameworks. Just sketch out some React code and ask it to port to Vue - you can even tell it to explain the Vue code line-by-line and ask follow up questions, ex "Oh, so $FEATURE is like hooks in React?" "Yes, but ..."


Funnily enough I find the opposite: it's most effective for me when using something familiar (though nowhere near 90%). If I'm familiar with it, I can figure out pretty quickly what's a hallucination and what's not, and to what extent it is (sometimes it's just a few values that need changing, sometimes it's completely wrong with almost no basis in reality). The time I spend attempting to fix its output in unfamiliar territory makes it more of a pain than it's worth for me.



