I find it increases my productivity about 5-10% when working with the technologies I'm most familiar with and use regularly (Elixir, Phoenix, JavaScript, general web dev). But when I'm doing something unfamiliar and new, it's more like 90%. It's incredible.
Recently at work, for example, I've been setting up a bunch of stuff with some new technologies and libraries that I'd never really used before. Without ChatGPT I'd have spent hours if not days poring through tedious documentation and outdated tutorials while trying to hack something together in an agonising process of trial and error. But ChatGPT gave me a fantastic proof-of-concept app that has everything I needed to get started. It's been enormously helpful and I'm convinced it saved me days of work. This technology is miraculous.
As for my job security... well, I think I'm safe for now; ChatGPT sped me up in this instance but the generated app still needs a skilled programmer to edit it, test it and deploy it.
On the other hand I am slightly concerned that ChatGPT will destroy my side income from selling programming courses... so if you're a Rails developer who wants to learn Elixir and Phoenix, please check out my course Phoenix on Rails before we're both replaced by robots: PhoenixOnRails.com
(Sorry for the self promotion but the code ELIXIRFORUM will give a $10 discount.)
The thing is the hallucinations: I also wasted a few hours trying to work on solutions with GPT where it just kept making up parameters and random functions.
So much this. The thing hallucinates far more than the hyperventilation seems willing to acknowledge.
You really need to be quite competent in the thing you're asking it to do in order to ferret out the hallucinations, which greatly diminishes the potency of GPT in the hands of someone who has no knowledge of the relevant language/runtime/problem domain/etc.
Not if the hallucination introduces runtime errors that can't be identified a priori with any sort of static analysis or compilation/interpreting stage.
But no, you're fundamentally right. It just goes to the question of whether an LLM assistant can in any sense replace or displace human programmers, or save time for human programmers. The answer seems to be: somewhat, and in certain cases, but not much beyond that.
If I already know the technology I'm querying GPT about, I'm going to spend at least some time identifying its hallucinations or realising that it introduced some. I might have been better off just doing it myself. If I don't know the technology I'm querying GPT about, I'm going to be impacted by its hallucinations but will also have to spend time figuring out what the hallucinations are and why this unfamiliar code sample doesn't work.
A colleague of mine had trouble getting an email from Google Docs into listmonk.
She asked GPT to help her get an HTML version, since apparently she'd got stuck with the WYSIWYG editor.
However, GPT gave back a full HTML document, including head and body. Pasting that into listmonk broke the entire webpage. Then she freaked out and told me listmonk sucks :)
There are a lot of things which could be done to improve this:
1) It could use the JSONformer idea [0], where we have a model of the target language which determines the valid next tokens; we only ask the LLM to supply a token when that model actually gives us a choice, and when considering possible next tokens we immediately ignore any which are invalid given the model. This could go beyond mere syntax to actually considering the APIs/etc. which exist, so if the LLM has already generated the tokens "import java.util.", then it could only generate a completion which is a public class (or subpackage) of "java.util.". Maybe something like language servers could help here. (A rough sketch of this follows below.)
2) For every output it generates, automatically compile and test it before showing it to the user. If the compile/test fails, give it a chance to fix its mistake. If it gets stuck in a loop, or isn't getting anywhere after several attempts, fall back to the next most likely output, and repeat. If after a while we still aren't getting anywhere, it can show the user its attempts (in case they give the user any ideas).
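Idea 1) is essentially constrained decoding. A minimal sketch of the token-filtering step in TypeScript, where `validContinuations` stands in for whatever a grammar or language-server model of the code allows at this point, and `probs` for the LLM's next-token distribution (every name here is made up for illustration):

```typescript
// Sketch of grammar-constrained token selection. Both inputs are stand-ins:
// `probs` for the LLM's next-token distribution, `validContinuations` for what a
// grammar or language-server model of the code says is allowed right now
// (e.g. the members that actually exist under "java.util.").
type TokenProbs = Map<string, number>;

function pickConstrained(probs: TokenProbs, validContinuations: Set<string>): string {
  // If the code model leaves only one choice, don't ask the LLM at all.
  if (validContinuations.size === 1) {
    return [...validContinuations][0];
  }

  // Otherwise ignore every token that is invalid under the code model
  // and take the most probable of what remains.
  let best: string | undefined;
  let bestProb = -Infinity;
  for (const [token, p] of probs) {
    if (validContinuations.has(token) && p > bestProb) {
      best = token;
      bestProb = p;
    }
  }
  if (best === undefined) {
    throw new Error("The model put no probability on any valid continuation");
  }
  return best;
}
```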
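And idea 2) is a generate-and-verify loop. A rough sketch under the same caveats, using `tsc` as the stand-in compile step and leaving the actual LLM calls as hypothetical functions you'd plug in:

```typescript
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// The two LLM calls are left abstract: any chat-completion API would do.
type Generate = (prompt: string) => Promise<string[]>; // candidates, most likely first
type Fix = (code: string, errors: string) => Promise<string>;

// Compile the candidate and return the diagnostics, or null if it passes.
// (A real version would also run the project's tests here.)
function compileCheck(code: string): string | null {
  writeFileSync("candidate.ts", code);
  try {
    execSync("npx tsc --noEmit candidate.ts", { stdio: "pipe" });
    return null;
  } catch (err: any) {
    return String(err.stdout || err.message);
  }
}

async function generateVerified(prompt: string, generate: Generate, fix: Fix): Promise<string> {
  const attempts: string[] = [];
  for (const candidate of await generate(prompt)) {
    let code = candidate;
    for (let tries = 0; tries < 3; tries++) {
      const errors = compileCheck(code);
      if (errors === null) return code;   // it passes: show it to the user
      attempts.push(code);
      code = await fix(code, errors);     // give the model a chance to fix its mistake
    }
    // Still failing after several attempts: fall back to the next candidate.
  }
  // Nothing passed: surface the attempts in case they give the user any ideas.
  throw new Error("No candidate passed the checks.\n" + attempts.join("\n---\n"));
}
```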
Integration with linters is going to be the next stage in generative coding.
It should generate a suggestion, lint it in the background, offer the suggestion if it passes, and if not, feed the linter's output back in to rework the suggestion.
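Something like this loop, say, with ESLint on the command line and a hypothetical `askModel()` chat call as stand-ins:

```typescript
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Hypothetical single-turn chat call; any completion API would work here.
type AskModel = (prompt: string) => Promise<string>;

// Run ESLint over the suggestion and collect its complaints ("" means clean).
function lint(code: string): string {
  writeFileSync("suggestion.ts", code);
  try {
    execSync("npx eslint suggestion.ts", { stdio: "pipe" });
    return "";
  } catch (err: any) {
    return String(err.stdout || err.message);
  }
}

// Suggest, lint in the background, and rework until the linter is happy.
async function suggest(task: string, askModel: AskModel, maxPasses = 3): Promise<string> {
  let code = await askModel(task);
  for (let pass = 0; pass < maxPasses; pass++) {
    const issues = lint(code);
    if (issues === "") return code;   // clean: offer the suggestion
    code = await askModel(
      `Rework this code so the following lint issues go away.\n\n${code}\n\nLint output:\n${issues}`
    );
  }
  return code;  // out of passes: hand over the last attempt and let the user decide
}
```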
In general, token costs going down will shift systems from single-pass to multi-pass generation, which is going to improve results dramatically.
Combine all that with persistent memory stores that can provide additional in-context guidance about your codebase and your preferences, and it's going to be quite a different experience than it is today.
And at the current rate of advancement, that's maybe how things will look within a year or two.
You wouldn’t believe what you can get past a linter. You need test cases that cover the intention of the code, but I’ve also seen well-tested code behave totally counter to its purpose.
I’ve found it to be very forgetful and have to work function-by-function, giving it the current code as part of the next prompt. Otherwise it randomly changes class names, invents new bits that weren’t there before or forgets entire chunks of functionality.
It’s a good discipline as I have to work out exactly what I want to achieve first and then build it up piece by piece. A great way to learn a new framework or language.
It also sometimes picks convoluted ways of doing things, so regularly asking whether there’s a simpler way can be useful.
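That function-by-function approach can even be scripted; a tiny sketch, with a hypothetical `askModel()` standing in for whatever chat API you use:

```typescript
// Hypothetical chat call standing in for whatever API you use.
type AskModel = (prompt: string) => Promise<string>;

// Build each prompt from the code accepted so far, so the model can't
// silently rename classes or drop earlier functionality between steps.
async function buildPiecewise(steps: string[], askModel: AskModel): Promise<string> {
  let accepted = "";
  for (const step of steps) {
    const reply = await askModel(
      `Here is the current code:\n\n${accepted}\n\n` +
      `Add only the following, without renaming or removing anything that already exists:\n${step}`
    );
    accepted += "\n" + reply; // in practice you review/edit each piece before accepting it
  }
  return accepted;
}
```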
IIRC its "memory" (actually input size, it remembers by taking its previous output as input) is only about 500 tokens, and that has to contain both your prompt and the beginning of the answer to hold relevance towards the end of its answer. So yes, it can't make anything bigger than maybe a function or two with any consistency. Writing a whole program is just not possible for an LLM without some other knowledge store for it to cross reference, and even then I have my doubts.
GPT-3.5 is 4k tokens and has a 16k version.
GPT-4 is 8k and has a 32k version.
You are correct that this needs to account for both input and output. I suspect that when you feed ChatGPT longer prompts, it may try to use the 16k / 32k models when it makes sense.
This is my experience too. Paying $20/month for GPT-4 has been absolutely worth it. It barely hallucinates at all; the results aren't always perfect (and the September 2021 knowledge cut-off can be frustrating given how quickly things get out of date in the programming world) but it's more than good enough. I don't remember how I ever got by without it.
This is what ChatGPT and GPT-4 are good for: iterating quickly in an unfamiliar ecosystem. Picking up frameworks now feels like a ChatGPT superpower. It doesn't remove the need for reasoning, though, and I've seen some scary bugs introduced if you're not carefully monitoring what the AI is outputting.
Basically, these days before I dig into documentation I ask "How do I do X with Y framework in Language Z" and if it's pre-2021 tech it works amazingly well.
Especially when you know something similar, like porting between front-end frameworks. Just sketch out some React code and ask it to port to Vue - you can even tell it to explain the Vue code line-by-line and ask follow-up questions, e.g. "Oh, so $FEATURE is like hooks in React?" "Yes, but ..."
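To make that "$FEATURE is like hooks" kind of mapping concrete, here's roughly what a ported counter looks like, React's useState next to Vue 3's ref() (the component names and setup here are just for illustration):

```typescript
// A counter with React's useState hook and the rough Vue 3 equivalent with ref(),
// both written as plain render calls so everything stays ordinary TypeScript.
// Component names are made up for the example.
import { createElement, useState } from "react";
import { defineComponent, h, ref } from "vue";

// React: local state lives in the useState hook.
export function CounterReact() {
  const [count, setCount] = useState(0);
  return createElement("button", { onClick: () => setCount(count + 1) }, String(count));
}

// Vue 3 Composition API: ref() plays a role similar to useState.
export const CounterVue = defineComponent({
  setup() {
    const count = ref(0);
    return () => h("button", { onClick: () => count.value++ }, String(count.value));
  },
});
```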
Funnily enough I find the opposite: it's most effective for me when using something familiar (though nowhere near 90%). If I'm familiar with it, I can figure out pretty quickly what's a hallucination and what's not, and to what extent it is (sometimes it's just a few values that need changing, sometimes it's completely wrong with almost no basis in reality). The time I spend attempting to fix its output in unfamiliar territory makes it more of a pain than it's worth for me.