I find it increases my productivity about 5-10% when working with the technologies I'm most familiar with and use regularly (Elixir, Phoenix, JavaScript, general web dev). But when I'm doing something unfamiliar and new, it's more like 90%. It's incredible.
Recently at work, for example, I've been setting up a bunch of stuff with some new technologies and libraries that I'd never really used before. Without ChatGPT I'd have spent hours if not days poring through tedious documentation and outdated tutorials while trying to hack something together in an agonising process of trial and error. But ChatGPT gave me a fantastic proof-of-concept app that has everything I needed to get started. It's been enormously helpful and I'm convinced it saved me days of work. This technology is miraculous.
As for my job security... well, I think I'm safe for now; ChatGPT sped me up in this instance but the generated app still needs a skilled programmer to edit it, test it and deploy it.
On the other hand I am slightly concerned that ChatGPT will destroy my side income from selling programming courses... so if you're a Rails developer who wants to learn Elixir and Phoenix, please check out my course Phoenix on Rails before we're both replaced by robots: PhoenixOnRails.com
(Sorry for the self promotion but the code ELIXIRFORUM will give a $10 discount.)
The thing is the hallucinations: I also wasted a few hours trying to work on solutions with GPT where it just kept making up parameters and random functions.
So much this. The thing hallucinates far more than the hyperventilation seems willing to acknowledge.
You really need to be quite competent in the thing you're asking it to do in order to ferret out the hallucinations, which greatly diminishes the potency of GPT in the hands of someone who has no knowledge of the relevant language/runtime/problem domain/etc.
Not if the hallucination introduces runtime errors that can't be identified a priori with any sort of static analysis or compilation/interpreting stage.
But no, you're fundamentally right. It just goes to the question of whether an LLM assistant can in any sense replace or displace human programmers, or save time for human programmers. The answer seems to be: somewhat, and in certain cases, but not much beyond that.
If I already know the technology I'm querying GPT about, I'm going to spend at least some time identifying its hallucinations or realising that it introduced some. I might have been better off just doing it myself. If I don't know the technology I'm querying GPT about, I'm going to be impacted by its hallucinations but will also have to spend time figuring out what the hallucinations are and why this unfamiliar code sample doesn't work.
A colleague of mine had trouble getting an email from Google Docs into listmonk.
She asked GPT to help her get an HTML version, since apparently she'd got stuck with the WYSIWYG editor.
However, GPT gave back a full HTML document, including head and body. Pasting that into listmonk broke the entire webpage. Then she freaked out and told me listmonk sucks :)
There are a lot of things which could be done to improve this:
1) It could use the JSONformer idea [0], where we have a model of the target language which determines the valid next tokens; we only ask the LLM to supply a token when that model actually gives us a choice, and when considering possible next tokens we immediately ignore any which are invalid given the model. This could go beyond mere syntax to actually considering the APIs/etc. which exist, so if the LLM has already generated the tokens "import java.util.", then it could only generate a completion which is a public class (or subpackage) of "java.util.". Maybe something like language servers could help here. (A rough sketch of this follows below.)
2) For every output it generates, automatically compile and test it before showing it to the user. If the compile/test fails, give it a chance to fix its mistake. If it gets stuck in a loop, or isn't getting anywhere after several attempts, fall back to the next most likely output, and repeat. If after a while we still aren't getting anywhere, it can show the user its attempts (in case they give the user any ideas).
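Idea 1) is essentially constrained decoding. A minimal sketch of the token-filtering step in TypeScript, where `validContinuations` stands in for whatever a grammar or language-server model of the code allows at this point, and `probs` for the LLM's next-token distribution (every name here is made up for illustration):

```typescript
// Sketch of grammar-constrained token selection. Both inputs are stand-ins:
// `probs` for the LLM's next-token distribution, `validContinuations` for what a
// grammar or language-server model of the code says is allowed right now
// (e.g. the members that actually exist under "java.util.").
type TokenProbs = Map<string, number>;

function pickConstrained(probs: TokenProbs, validContinuations: Set<string>): string {
  // If the code model leaves only one choice, don't ask the LLM at all.
  if (validContinuations.size === 1) {
    return [...validContinuations][0];
  }

  // Otherwise ignore every token that is invalid under the code model
  // and take the most probable of what remains.
  let best: string | undefined;
  let bestProb = -Infinity;
  for (const [token, p] of probs) {
    if (validContinuations.has(token) && p > bestProb) {
      best = token;
      bestProb = p;
    }
  }
  if (best === undefined) {
    throw new Error("The model put no probability on any valid continuation");
  }
  return best;
}
```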
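And idea 2) is a generate-and-verify loop. A rough sketch under the same caveats, using `tsc` as the stand-in compile step and leaving the actual LLM calls as hypothetical functions you'd plug in:

```typescript
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// The two LLM calls are left abstract: any chat-completion API would do.
type Generate = (prompt: string) => Promise<string[]>; // candidates, most likely first
type Fix = (code: string, errors: string) => Promise<string>;

// Compile the candidate and return the diagnostics, or null if it passes.
// (A real version would also run the project's tests here.)
function compileCheck(code: string): string | null {
  writeFileSync("candidate.ts", code);
  try {
    execSync("npx tsc --noEmit candidate.ts", { stdio: "pipe" });
    return null;
  } catch (err: any) {
    return String(err.stdout || err.message);
  }
}

async function generateVerified(prompt: string, generate: Generate, fix: Fix): Promise<string> {
  const attempts: string[] = [];
  for (const candidate of await generate(prompt)) {
    let code = candidate;
    for (let tries = 0; tries < 3; tries++) {
      const errors = compileCheck(code);
      if (errors === null) return code;   // it passes: show it to the user
      attempts.push(code);
      code = await fix(code, errors);     // give the model a chance to fix its mistake
    }
    // Still failing after several attempts: fall back to the next candidate.
  }
  // Nothing passed: surface the attempts in case they give the user any ideas.
  throw new Error("No candidate passed the checks.\n" + attempts.join("\n---\n"));
}
```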
Integration with linters is going to be the next stage in generative coding.
It should generate a suggestion, lint it in the background, offer the suggestion if it passes, and if not, feed the linter's output back in to rework the suggestion.
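Something like this loop, say, with ESLint on the command line and a hypothetical `askModel()` chat call as stand-ins:

```typescript
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Hypothetical single-turn chat call; any completion API would work here.
type AskModel = (prompt: string) => Promise<string>;

// Run ESLint over the suggestion and collect its complaints ("" means clean).
function lint(code: string): string {
  writeFileSync("suggestion.ts", code);
  try {
    execSync("npx eslint suggestion.ts", { stdio: "pipe" });
    return "";
  } catch (err: any) {
    return String(err.stdout || err.message);
  }
}

// Suggest, lint in the background, and rework until the linter is happy.
async function suggest(task: string, askModel: AskModel, maxPasses = 3): Promise<string> {
  let code = await askModel(task);
  for (let pass = 0; pass < maxPasses; pass++) {
    const issues = lint(code);
    if (issues === "") return code;   // clean: offer the suggestion
    code = await askModel(
      `Rework this code so the following lint issues go away.\n\n${code}\n\nLint output:\n${issues}`
    );
  }
  return code;  // out of passes: hand over the last attempt and let the user decide
}
```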
In general, token costs going down will shift systems from single-pass to multi-pass generation, which is going to improve results dramatically.
Combine all that with persistent memory stores that can provide additional in-context guidance about your codebase and your preferences, and it's going to be quite a different experience than it is today.
And at the current rate of advancement, that's maybe how things will look within a year or two.
You wouldn’t believe what you can get past a linter. You need test cases that cover the intention of the code, but I’ve also seen well-tested code behave totally counter to its purpose.
I’ve found it to be very forgetful and have to work function-by-function, giving it the current code as part of the next prompt. Otherwise it randomly changes class names, invents new bits that weren’t there before or forgets entire chunks of functionality.
It’s a good discipline as I have to work out exactly what I want to achieve first and then build it up piece by piece. A great way to learn a new framework or language.
It also sometimes picks convoluted ways of doing things, so regularly asking whether there’s a simpler way can be useful.
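That function-by-function approach can even be scripted; a tiny sketch, with a hypothetical `askModel()` standing in for whatever chat API you use:

```typescript
// Hypothetical chat call standing in for whatever API you use.
type AskModel = (prompt: string) => Promise<string>;

// Build each prompt from the code accepted so far, so the model can't
// silently rename classes or drop earlier functionality between steps.
async function buildPiecewise(steps: string[], askModel: AskModel): Promise<string> {
  let accepted = "";
  for (const step of steps) {
    const reply = await askModel(
      `Here is the current code:\n\n${accepted}\n\n` +
      `Add only the following, without renaming or removing anything that already exists:\n${step}`
    );
    accepted += "\n" + reply; // in practice you review/edit each piece before accepting it
  }
  return accepted;
}
```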
IIRC its "memory" (actually input size, it remembers by taking its previous output as input) is only about 500 tokens, and that has to contain both your prompt and the beginning of the answer to hold relevance towards the end of its answer. So yes, it can't make anything bigger than maybe a function or two with any consistency. Writing a whole program is just not possible for an LLM without some other knowledge store for it to cross reference, and even then I have my doubts.
GPT-3.5 is 4k tokens and has a 16k version.
GPT-4 is 8k and has a 32k version.
You are correct that this needs to account for both input and output. I suspect that when you feed ChatGPT longer prompts, it may try to use the 16k / 32k models when it makes sense.
This is my experience too. Paying $20/month for GPT-4 has been absolutely worth it. It barely hallucinates at all; the results aren't always perfect (and the September 2021 knowledge cut-off can be frustrating given how quickly things get out of date in the programming world) but it's more than good enough. I don't remember how I ever got by without it.
This is what ChatGPT and GPT-4 are good for: iterating quickly in an unfamiliar ecosystem. Picking up frameworks now feels like a ChatGPT superpower. It doesn't remove the need for reasoning, though, and I've seen some scary bugs introduced if you're not carefully monitoring what the AI is outputting.
Basically, these days before I dig into documentation I ask "How do I do X with Y framework in Language Z" and if it's pre-2021 tech it works amazingly well.
Especially when you know something similar, like porting between front-end frameworks. Just sketch out some React code and ask it to port to Vue - you can even tell it to explain the Vue code line-by-line and ask follow-up questions, e.g. "Oh, so $FEATURE is like hooks in React?" "Yes, but ..."
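To make that "$FEATURE is like hooks" kind of mapping concrete, here's roughly what a ported counter looks like, React's useState next to Vue 3's ref() (the component names and setup here are just for illustration):

```typescript
// A counter with React's useState hook and the rough Vue 3 equivalent with ref(),
// both written as plain render calls so everything stays ordinary TypeScript.
// Component names are made up for the example.
import { createElement, useState } from "react";
import { defineComponent, h, ref } from "vue";

// React: local state lives in the useState hook.
export function CounterReact() {
  const [count, setCount] = useState(0);
  return createElement("button", { onClick: () => setCount(count + 1) }, String(count));
}

// Vue 3 Composition API: ref() plays a role similar to useState.
export const CounterVue = defineComponent({
  setup() {
    const count = ref(0);
    return () => h("button", { onClick: () => count.value++ }, String(count.value));
  },
});
```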
Funnily enough I find the opposite: it's most effective for me when using something familiar (though nowhere near 90%). If I'm familiar with it, I can figure out pretty quickly what's a hallucination and what's not, and to what extent it is (sometimes it's just a few values that need changing, sometimes it's completely wrong with almost no basis in reality). The time I spend attempting to fix its output in unfamiliar territory makes it more of a pain than it's worth for me.