What we know about LLMs

killernap · on July 25, 2023

ChatGPT was announced November, 2022 - 8 months ago. Time flies.

Question for HN: Where are we in the hype cycle on this?

We can run shitty clones slowly on Raspberry Pi's and your phone. The educational implementations demonstrate the basics in under a thousand lines of brisk C. Great. At some point you have to wonder... well, so what?

Not one killer app has emerged. I for one am eager to be all hip and open minded and pretend like I use LLMs all the time for everything and they are "the future" but novelty aside it seems like so far we have a demented clippy and some sophomoric arguments about alignment and wrong think.

It did generate a whole lot of breathless click-bait-y articles and gave people something to blab about. Ironically it also accelerated the value of that sort of gab and clicks towards zero.

As I am not a VC, politician, or opportunist, hand waving and telling me this is Frankenstein's monster about to come alive and therefore I need billions of dollars or "regulations" just makes folks sound like the crypto scammers.

Please HN, say something actually insightful, I beg you.

iainctduncan · on July 25, 2023

I work in tech diligence so I look at companies in detail. I have seen a couple where good machine learning is going to make a massive difference (whether it will keep them ahead of everyone is a separate question). I think it really boils down to:

"Is this a problem where an answer that is mostly right and sometimes wrong is still a great value proposition?"

This is what people don't get. If sometimes the answer is (catastrophically) wrong, and the cost of this is high, there's no market fit. So I think a lot of these early LLM related startups are going to be trainwrecks because they haven't figured this out. If the cost of an error is very high in your business, and human checking is what you are trying to avoid, these are not nearly as helpful.

I looked at one company in this scenario and they were dying. Couldn't get big customers to commit because the product was just not worth it if it couldn't be reliably right on something that a human was never going to get wrong (can't say what it was, NDAs and all that.) I also looked at one where they were doing very well because an answer that was usually close would save workers tons of time, and the nature of the biz was that eliminating the human verification step would make no sense anyway. Let's just say it was in a very onerous search problem, and it was trivial for the searcher to say "wrong wrong wrong, RIGHT, phew that saved me hours!". And that saving was going to add up to very significant cash.

So killer apps are going to be out there. But I agree that there is massive overhype and it's not all of them! (or even many!)

myownpetard · on July 25, 2023

Interesting final point. It's like the business equivalent of NP-Complete problems, difficult to compute but easy to verify.

Can you give any examples of those types of problems you've encountered?

killernap · on July 25, 2023

That's interesting. Quite the needle to thread. I wonder how big the market will be for niche models that aren't commodities.

It needs to be something lucrative enough that training the model is not-trivial but not so lucrative Microsoft/Google would care enough to go after. And it somehow needs to stay in that sweet spot even as Nvidia chips away at that moat with each new hardware generation.

bastawhiz · on July 25, 2023

> Not one killer app has emerged.

I'll say that I pretty firmly disagree with this. I've been using Github Copilot for about six months for my own work and it has fundamentally changed how I write code. Ignoring the ethics of Copilot, if I just need to read a file with some data, parse it, and render that data on screen, Copilot just _does_ most of that for me. I write a chunky comment explaining what I want, it writes a blob of code that I tab through, and I'm left with a nicely-documented, functioning piece of software. A one-off script that took me 30 minutes to write previously now takes me maybe a minute on a bad day.

For ages we've had Text Expander and key mappings and shortcuts and macros that render templates of pre-built code. Now I can just say what I'm trying to do, the language model considers the other code on the page, and it gets done.

If this isn't a "killer app" then I'm not sure what is. In my entire career I can think of maybe two things that I've come upon that have affected my workflow this much: source control and continuous integration. Which, frankly, is wild.

Separately, I use LLMs to generate marketing copy for my side hustle. I suck at marketing, but I can tell the damn thing what I want to market and it gives me a list of tweets back that sound like the extroverted CMO that I don't have. I can outsource creative tasks like brainstorming lists of names for products, or coming up with text categories for user feedback from a spreadsheet. I don't know if I'd call either of those things "killer apps" but I have a tool which can do thinking for me at a nominal cost, quickly, and with a high-enough quality bar that it's usually not a waste of my time.

apsurd · on July 25, 2023

My friend made a great comparison that seems to agree with your take: chatGPT for coding is like when ruby on rails came out. Or wordpress. felt magical and boosted (a certain kind of) productivity through the roof.

We don't think of rails as the second coming though.

same with code editors. of course a rails for all of code is cool. but iono, it's a code editor. i still use sublime.

bastawhiz · on July 25, 2023

I'd maybe make the analogy that it's like the first ORM. Sure, you could write your own DB queries, but it just does what you want, and it's usually right.

Were ORMs the second coming? Meh. But it's arguable that they're still immensely powerful and useful and the way people write apps that interface with an RDBMS is permanently changed forevermore.

ravetcofx · on July 26, 2023

How did WordPress boost productivity? Fussing with hosting, CMS, plug-ins is a mess. I just went back to good old hand written HTML with pico.css. Got my site down from 8mb to 100kb

jasongill · on July 26, 2023

Most people cannot write good old hand written HTML; when WordPress came out and picked up stream, it was the biggest thing to hit the web hosting industry since FrontPage.

Arkhaine_kupo · on July 25, 2023

> Not one killer app has emerged.

I think the microsoft gpt integration on Office is probably that app.

Ability to ask to have your email's summarised, or getting your excel sheets formulas configured with natural language, etc are increidbly useful tools to lower the floor of entry to tools that already speed up humans so much.

I don't think the use of this tools is some life redefining feature, but a friend of mine joked that in a year from now you will right a simple sentence like "write polite work email with following request: Come to the meeting, you are late" then Gpt will write the email, another gpt will send it and his GPT will sumarise it and he will reply with another gpt message instantly apologising that you will read the summary off. Leaving a trail of polite long messages that no one will even open.

dleeftink · on July 25, 2023

Got a good chuckle from me. I find that in quick daily back-and-forths, time saved by such a system would be negligible. In many places I've worked, the 'polite work mail' has gone out the door long ago, already at the lower bound of what is considered a proper sentence.

scottmcf · on July 25, 2023

I agree, if it delivers on the kind of demos they showed off here:

https://news.microsoft.com/reinventing-productivity/

It's going to be an absolute "killer app".

killernap · on July 25, 2023

https://en.wikipedia.org/wiki/BonziBuddy

My god. If we hit that bullseye, the rest of the dominoes will fall like a house of cards. Checkmate.

acoustics · on July 25, 2023

It’s true that sometimes people repeat mistakes of the past by iterating on a fundamentally bad idea.

But sometimes the idea wasn’t bad. The mistake of the past could have been in execution of the idea or tech limitations.

When any new VR product is released, I could post a link to the article for the Nintendo Virtual Boy and make a snarky remark about how successful that was. That doesn’t really add anything though.

moffkalast · on July 25, 2023

She's built like a clippy, but she handles like a HAL.

wes-k · on July 26, 2023

The language model is willing but the weights are corrupted and flawed.

blharr · on July 25, 2023

Relevant comic https://marketoonist.com/2023/03/ai-written-ai-read.html

EEMac · on July 25, 2023

There was a science fiction story about this, with phone auto-message and auto-answer systems connecting with each other long after all the humans were dead.

dTal · on July 26, 2023

Can you remember the story? It sounds thematically similar to "There Will Come Soft Rains", but the details don't match.

https://en.wikipedia.org/wiki/There_Will_Come_Soft_Rains_(sh...

IKantRead · on July 25, 2023

> Where are we in the hype cycle on this?

Can we stop acting like the Gartner "hype cycle" is anything more than a marketing gimmick created Gartner to validate their own consulting/research services?

While you can absolutely find cases that map to the "hype cycle", there is nothing whatsoever to validate this model as remotely accurate or valid for describing technology trends.

Where is crypto in the "hype cycle"? It went through at least 3 rounds of "peak of inflated expectation" and I'm not confident it will ever reach a meaningful "plateau of productivity".

Did mobile ever have "inflated expectation"? Yes there was a lot of hype in the early days but those people hyped about it, rushing to build mobile versions of their websites... were correct.

The "hype cycle" is a neat idea but doesn't really map to reality in a way that makes it useful. It's only useful for Gartner to create an illusion of credibility and sell their services.

KronisLV · on July 25, 2023

> The "hype cycle" is a neat idea but doesn't really map to reality in a way that makes it useful.

What do you propose as a more accurate alternative, or do you think that the whole idea should be scrapped? Because personally I feel like certain tech/practices certainly go through multiple stages, where initially people expect too much from them and eventually figure out what they're good for and what they're not.

Not always a single linear process, like NFTs/crypto refusing to die despite numerous scams out there and projects that seem to go nowhere, yet people still falling for the scams due to promised profits. However, the amount of people critiquing the blockchain as a crappy database seems to suggest at least some lessons learnt along the way and hopefully some actually decent use cases.

niels_bom · on July 26, 2023

I think the more accurate alternative is that reality is messy and you can’t always put things in categories and models.

hypeit · on July 25, 2023

hn yesterday: "astrology is stupid and anyone who believes it should be shunned"

hn today: "where are we on the gartner hype cycle on this one?"

killernap · on July 25, 2023

Gartner are so great at their job you think they own the concept of hype cycles and rage against them being mentioned while being the one to introduce them to the conversation in the first place :)

gardenhedge · on July 25, 2023

Just google 'hype cycles' and it's all about gartner so I think GP is fine with bringing them up.

lolinder · on July 25, 2023

That 8 months seems like a long time to you is indicative of just how fast tech has been moving lately. I expect at least another year before we have a good sense for where we actually are, probably more.

However, I'll hazard a guess: I think we haven't seen many real new apps since then because too many people are focused on packaging ChatGPT for X. A chatbot is a perfectly decent use case for some things, but I think the real progress will come when people stop trying to copy what OpenAI already did and start integrating LLMs in a more hands-off way that's more natural to their domains.

A great example that's changed my life is News Minimalist [0]. They feed all the news from a ton of sources into one of the GPT models and have it rate the story for significance and credibility. Only the highest rated stories make it into the newsletter. It's still rough around the edges, but being able to delegate most of my news consumption has already made a huge difference in my quality of life!

I expect successful and useful applications to fall in a similar vein to News Minimalist. They're not going to turn the world upside down like the hype artists claim, but there is real value to be made if people can start with a real problem instead of just adding a chatbot to everything.

[0] https://www.newsminimalist.com/

killernap · on July 25, 2023

News Minimalist LLM vs News Maximalist LLM, garbage-in, garbage-out. ChatGPT giveth and taketh away. Ouroboros levels of disruption.

mjr00 · on July 25, 2023

> Not one killer app has emerged. I for one am eager to be all hip and open minded and pretend like I use LLMs all the time for everything and they are "the future" but novelty aside it seems like so far we have a demented clippy and some sophomoric arguments about alignment and wrong think.

In my mind I divide LLM usage into two categories, creation and ingestion.

Creation is largely a parlor trick that blew the minds of some people because it was their first exposure to generative AI. Now that some time has passed, most people can pattern match GPT-generated content, especially one without sufficient "prompt engineering" to make it sound less like the default writing style. Nobody is impressed by "write a rap like a pirate" output anymore.

Ingestion is a lot less sexy and hasn't gotten nearly as much attention as creation. This is stuff like "summarize this document." And it's powerful. But people didn't get as hyped up on it because it's something that they felt like a computer was supposed to be able to do: transforming existing data from one format to another isn't revolutionary, after all.

But the world has a lot of unstructured, machine-inaccessible text. Legal documents saved in PDF format, consultant reports in Word, investor pitches in PowerPoint. And when I say "unstructured" I mean "there is data here that it is not easy for a machine to parse."

Being able to toss this stuff into ChatGPT (or other LLM) and prompt with things like "given the following legal document, give me the case number, the names of the lawyers, and the names of the defendants; the output must be JSON with the following schema..." and that save that information into a database is absolutely killer. Right now companies are recruiting armies of interns and contractors to do this sort of work, and it's time-consuming and awful.

tmpz22 · on July 25, 2023

Isn’t the summarization of text like legal documents where the notion of hallucinations come in as a huge blocker?

Is the industry making progress on fixing such hallucinations? Or for that matter the privacy implications of sharing such documents with entities like OpenAI that don’t respect IP?

Until hallucinations and IP/PII are fixed I don’t want this technology anywhere near my legal or personal documents.

famouswaffles · on July 25, 2023

Tasks like summarization and translation get extremely low hallucinations. The more a model "doesn't know" and "has to guess", the more it hallucinates. This isn't much of a problem with what i like to call "morphing" tasks.

>Until hallucinations and IP/PII are fixed I don’t want this technology anywhere near my legal or personal documents.

Good luck with that https://twitter.com/ai__pub/status/1644735555752853504

tmpz22 · on July 25, 2023

> Good luck with that. https://twitter.com/ai__pub/status/1644735555752853504

Is it fair to say these deals claimed to have been closed by the worlds largest law firms using OpenAI backed tooling double check all outputs at their own expense? Could this be a marketing stunt versus a real world usage that actually saved the firm money or time?

mjr00 · on July 25, 2023

I've been using the ChatGPT API to do summarization of text from free-form documents. Not in the legal domain though, so no real regulatory risks. It works very well. I didn't see any hallucinations when spot checking, though of course I can't rule it out. But even if it only gets things 98% correct, that accuracy is good enough for my use case, and being able to programmatically feed these documents in instead of hiring multiple contractors to read through and parse out the data is a massive, massive time and money saver.

> Or for that matter the privacy implications of sharing such documents with entities like OpenAI that don’t respect IP?

Their permissions/organization model is a mess, but ChatGPT does offer the ability to opt out of data collection, at least for corporate accounts.

Legend2440 · on July 25, 2023

Image generators (midjourney, etc) are half LLM and still doing very impressive creation.

The big thing holding them back is legal/copyright concerns, but I expect that will be worked out eventually.

Jevon23 · on July 25, 2023

>Not one killer app has emerged.

Surely the “killer app” is ChatGPT itself?

ChatGPT has already put some copywriters and journalists out of work, or at least reduced their hours. The app is quite literally “killing” something, i.e. people’s jobs. For those people, it’s not just empty hype. It’s very real. Certainly it’s already more real than anything having to do with blockchain/crypto.

Legend2440 · on July 25, 2023

I'm dubious. The few news websites that started publishing LLM articles (CNET, etc) were already circling the drain. They'd probably have fired their journalists anyway because they're on the edge of bankruptcy.

loire280 · on July 25, 2023

I expect that, over the next few years, companies that need to lay off workers will spin their mismanagement by claiming they are replacing those jobs with "AI".

DebtDeflation · on July 25, 2023

The killer app for large enterprises is Q&A against the corporate knowledgebase(s). Big companies have an insane amount of tribal knowledge locked away in documents sitting on Sharepoint, on Box, on file servers, etc. Best case scenario, their employees can do keyword search against a subset of those documents. Chunk those docs, run them through an embedding process, store the embeddings in a vector store, let employees ask questions, do a similarity search against the vector store, pass the top results and the question to the LMM, get an actual answer back to present to the employee. This unlocks a ton of knowledge and can be a massive productivity booster.

adamcblodgett · on July 25, 2023

So the killer app for LLMs and AI in general is...a librarian?

jebarker · on July 25, 2023

Yes! Never in my career have I seen an organization do a good job of organizing institutional knowledge and making it easily available to employees. It'd be a huge benefit to many organizations to be able to ask questions of the collective text holdings.

monero-xmr · on July 25, 2023

There is definitely interesting and high-potential technology here. I do not think the current crop of "wrap ChatGPT in an API for XYZ business-case" startups will succeed - they will be total fails across the board. There is also an issue where anyone with an iota of experience or degree in something tangential to AI or ML can be the "genius" behind a new startup for funding - a telltale sign of bubble mentality to me.

If LLMs in their current form as human-replacement agents are cheaper versions of Fiver / mechanical turks, and we all know there are very limited, bottom-of-the-barrel use cases for those cheap labor technologies, then why would LLMs be a radical improvement? It's nonsensical.

usaar333 · on July 25, 2023

> Not one killer app has emerged

ChatGPT itself is a killer app.

_ea1k · on July 25, 2023

I agree. It has been indispensable for learning tensorflowjs, pytorch, and lots of other things about neural networks.

killernap · on July 25, 2023

About as killer as that twitter clone that was in the news for a minute after forcing people to use it and immediately losing 90% of the captive audience..

They have been losing users. Summer is here, school is out, the kids are back in reality for the moment and apparently when they aren't busy plagiarizing homework the interest is very limited.

danenania · on July 25, 2023

It might not be a killer app for you, but it's a killer app for me as an engineer, and I'm definitely not alone.

To give a concrete example, I used it to write and test a VSCode extension that provides autocomplete and type-checking for environment variables in 46 programming languages[1]. It was the first VSCode extension I've written and I have zero experience in the majority of those languages. The whole project took a little over a week. Without ChatGPT, it would have taken months to add support for so many languages.

1 - https://www.envkey.com/integrations/vscode

throwuwu · on July 25, 2023

lol, now who’s demented? Everyone I know uses it. It even diagnosed a problem with my pool filter among dozens of other uses I find for it. I like it and use it more than Google and stack overflow now. Losing the school crowd for the summer isn’t the beginning of the end, it just means there’s a cohort that doesn’t need it as much for a few months while they’re out having fun instead of stuck inside writing papers and doing math problems.

killernap · on July 25, 2023

It is great that everyone you know uses it but the traffic to ChatGPT is decreasing and has been for over two months now. If pointing this fact out makes me demented consider that perhaps you are emotionally invested in this new toy/brand.

I guess we can wait and see what kind of usage trends will emerge long term. My anecdotal evidence (which is not worth much, same as yours) is that many normies tried it a few times and it was a topic of conversation but is no longer mentioned much.

haswell · on July 25, 2023

> the traffic to ChatGPT is decreasing and has been for over two months now

This seems entirely unsurprising, and isn’t by itself enough to support your general thesis.

Interacting with these LLMs was extremely novel for most people when the tech first dropped, and those earlier months were the peak of the viral growth/expansion into public awareness.

As the novelty dies down, it’s not surprising that there would be less traffic. Early on, I had all sorts of ridiculous conversations just to see what would happen. Now, I only use it when I have some task in mind.

That transition points to this being the opposite of a toy - after the fun dies down, the real work begins.

> My anecdotal evidence…is that many normies tried it a few times and it was a topic of conversation but is no longer mentioned much.

This has not been my experience at all. Most non-technical folks I know who are interested in ChatGPT see value in its ability to expand their technical capabilities/knowledge.

People who are motivated to learn will continue to use this to their advantage.

If some subset of that population has no such interest, this has no bearing on the usefulness of the tech, nor is it representative of the population.

And even if the “normie” population (this is pretty reductive…) abandons it entirely, this again says nothing about the value/utility of LLMs, and hints at a product/market fit issue.

We don’t say programming languages are useless because they’re not adopted by the general public.

jstarfish · on July 25, 2023

> Early on, I had all sorts of ridiculous conversations just to see what would happen. [...] That transition points to this being the opposite of a toy - after the fun dies down, the real work begins.

The "intelligence" behind it is too unpredictable to be reliable for work, and using it for fun is about as amusing as emailing HR.

haswell · on July 26, 2023

> The "intelligence" behind it is too unpredictable to be reliable for work

This highly depends on the kind of work you’re doing. It’s great as a starting point for exploratory learning, helpful for some coding tasks, and useful for summarizing text.

As I work on a writing project that benefits from all of these use cases, it’s a good tool.

Not so great if you’re trying to write legal briefs.

> using it for fun is about as amusing as emailing HR

All due respect, but you’re either doing it wrong, or you’ve encountered some hilarious HR departments.

throwuwu · on July 26, 2023

Ask it to speak in cockney as an 18th century barker trying to convince you to buy a lame horse or to continue the conversation in brolish as though you were two surfer dudes sitting on the beach and then just ask it anything you want like “explain modern monetary theory”. If you enjoy fiction then get it to help world build a new setting and then act out a scene with you playing one character and it playing the rest.

To get it to stay in character use the custom instructions feature to set the requirements.

throwuwu · on July 25, 2023

Sounds more like you have an axe to grind

TigeriusKirk · on July 25, 2023

You asked for examples of impactful uses, you have been provided with some.

Arguing with people who provided what you asked for is a common but unproductive habit.

infinitezest · on July 25, 2023

I personally use copilot every day and I love it. It reduces the amount of typing I have to do, gives me lots of good suggestions for solving simple problems and has made working with unfamiliar languages so much easier.

sanderjd · on July 25, 2023

I'd say we're maybe half or two-thirds of the way down from the peak of inflated expectations toward the trough of disillusionment. Before long, I think maybe in the next three months or so, certainly around the time we hit the one year anniversary of chatgpt's release, we'll start seeing mainstream takes along the lines of "chatgpt and Bing's Sydney episode and such were good entertainment, but it's obvious in hindsight that it was a fad; nobody is posting funny screenshots of their conversations anymore, and all the pronouncements about a superhuman AGI apocalypse were obviously silly, it's clear chatgpt has failed and this whole thing was the same old hype-y SV pointlessness".

And at that point, we will have reached the trough of disillusionment. I think funding will be less readily available, and we'll start seeing some of the bevy of single-purpose LLM-based products start closing up shop.

But more quietly, others will be (already are) traversing up the slope of enlightenment. As others have mentioned, this is stuff like features in Microsoft's and Google's productivity products (including those for software engineering productivity like Github Copilot), and some subset of products and features elsewhere that turn out to be compelling in a sticky way.

I expect 2024 and 2025 to be the more interesting part of this hype cycle. I don't think we're on the verge of waking up in a world nobody recognizes in a small number of days or months, but I think in a few years we're going to have a bunch of useful tools that we didn't have a year ago, some of which are the obvious ones we've already seen, but improved, and others that are not obvious right now.

Not sure if this was insightful enough for you :) Apologies if not.

rm445 · on July 25, 2023

We're still on the exponential rise of the hype cycle. If capabilities appear to plateau - no GPT5/6 that are even more amazing, then the hype will not merely plateau but plummet. For now, anything seems possible.

As for a killer app, I'm another person for whom ChatGPT is it. I use GPT-4 something like Google, Wikipedia and Stack Overflow in one, but being very aware of the limitations. It feels a bit like circa 2000 when being good at googling things felt like a superpower. It doesn't do everything for you but can make you drastically more effective.

There's three levels of what's going on with AI at the moment, each with their own momentum and hype cycle: (1) the current generation of chat bots and image generators, which some of us would be using for the rest of our lives even with only minor refinements; (2) the prospect that new tools built on top of this and subsequent generations could remake the internet and how we interact with our gadgets; and (3) the prospect that the systems will keep getting smarter and smarter.

Mertax · on July 25, 2023

I wonder if language translation will be one of the "killer apps".

Especially if it can be done real-time and according to the context/level of the audience/listener. Even within the same language, translation from a more technical/expert level to a simplified summary helps education/communication/knowledge transfer significantly.

mdp2021 · on July 25, 2023

> summary

That is an intellectual exercise, it requires understanding, and I have not yet seen an LLM implementation that does it properly. If you know one...

What I have seen are outputs that can give the illusion of a properly done job, if the user were willingly (or not) blind to quality.

So: non intellectual translation, we already had tools; intellectually valid one, then we would have much higher opportunities than translation.

KronisLV · on July 25, 2023

> Not one killer app has emerged.

I mentioned the Stack Overflow Developer Survey once already today, but at the risk of sounding like a broken record, it has some data on this as well: https://survey.stackoverflow.co/2023/#ai

To save someone a click, around 44% of the respondents (some 39k out of 89k people) are already using "AI" solutions as a part of their workflow, another 25% (close to 23k people) are planning to do so soon.

The sentiment also seems mostly favorable, most aim to increase productivity or help themselves with learning and just generally knock out some more code, though there is a disconnect between what people want to use AI for (basically everything) and what they currently use it for (mostly just code).

There's also a section on the AI search tools in particular, about 83% of the respondents have at least had a look at ChatGPT, which is about as close to a killer app as you can probably get, even if it's cloud based SaaS: https://survey.stackoverflow.co/2023/#section-most-popular-t...

> Where are we in the hype cycle on this?

I'm not sure about the specifics here, but the trend feels about as significant as Docker and other container technologies more or less taking the industry by storm and changing a bunch of stuff around (to the point where most of my server software is containers).

That said, we're probably still somewhere in the early stages of the hype cycle for AI (the drawbacks like hallucination will really become apparent to many in the following years).

Honestly, the technology itself seems promising for select use cases and it's still nice that we have models that can be self hosted and somehow the software has gotten decent enough that you can play around with reasonably small models on your machine even without a GPU: https://blog.kronis.dev/tutorials/self-hosting-an-ai-llm-cha...

I'm cautiously optimistic about the current forms of LLM/AI, but fear that humanity will misuse the tech (as a cost cutting measure sometimes, without proper human review).

gaganyaan · on July 26, 2023

The killer app is ChatGPT. I'm not sure what you're expecting here, but it's been enormously useful while trying out new languages. For example, even if it's not 100% right, it has been a great help while working with nix, as I'm often ignorant to entire methods of solving a problem, and it's pretty good at suggesting the right method.

It's also super useful for things like "convert this fish shell snippet to bash" or "rewrite this Python class as a single function". It tends to really nail these sorts of grounded questions, and it legitimately saves me time.

reader5000 · on July 25, 2023

I think 8 months is a little short for the utility of a new tech to be fully realized and utilized. I'm pretty sure there were still horses on the roads long after 8 months after the Model T first went on sale.

vhlhvjcov · on July 25, 2023

Horse numbers peaked around 1915. Model T released 1908.

Still plenty around until at least 1930.

http://www.americanequestrian.com/pdf/US-Equine-Demographics...

unethical_ban · on July 25, 2023

I can't tell if this is satire or not. It is so... Well, to be polite, sounds so much like an uninformed stock trader, that I find it hard to believe this isn't some sort of meta commentary on hacker News conversations.

There are plenty examples of where the technology can eventually lead in terms of entertainment, impact on society and news, knowledge work, and so on. It doesn't have to happen immediately. But to handwave The myriad articles about the subject away and just say " I don't believe any of it, what else you got" is a bit annoying.

seydor · on July 25, 2023

Why don't you ask ChatGPT or bard? If there s a hype cycle, it is just starting.

The killer app is the LLM tech itself, and the victim seems to be the whole tech ecosystem. It disintermediates everyone who is gatekeeping information and connects end users with the information they want without the google, the SEO and without ads. Even if we are not right there today, the potential is there. This in itself is huge, since the whole ecosystem of SV is funded by ads.

imglorp · on July 25, 2023

Gatekeepers which can be destroyed by the disintermediation, should be.

StackOverlord · on July 25, 2023

I think it has shown the limitations of the Society of Mind hypothesis. Aggregating individuals equates to aggregating knowledge/experience, not intelligence. This is why hives and anthills do not really surpass their individuals intelligence. Ditto for human societies. In other words: composing LLMs using tools like langchain yield minor improvements over a single LLM instance.

anonyfox · on July 26, 2023

Its not an "AI-killer-app" thats the real deal I think. Its that these AI tools (esp LLMs) are truly powerful tools in everyday work now. Automating stuff is a breeze now whereas it was much more involved before. Data classification, content/code creation, data transformation, ... typical jobs for software engineers boil down to this. Its only a prompt now you fire against an API. Automating tasks that used to require human clerks is now a few hours/days of creative coding and the tasks are gone.

A surprising amount of work can tolerate a percentage of errors in a non-deterministic way, even before considering that humans make even more errors that way usually. :-)

Maxamillion96 · on July 25, 2023

To be extremely cynical, all of this hype seems to be the mid-life crisis of gen-xers who grew up on the jetsons trying to bring the future they saw on tv as children to life, withour regard to the economic or technical feasibility.

(See flying cars and vertical farms as well.)

ttul · on July 25, 2023

The biggest impact on my life has been Code Interpreter. Much of my job as a CEO involves analyzing data to make strategic decisions - “which of several options is best based on the evidence?”

Code Interpreter lets me upload data in a multitude of formats and play with it without wasting hours futzing around in Google Sheets or pulling my hair out with Pandas confusion. I know basic statistics concepts and I studied engineering so I know about signals and systems. But putting that knowledge into practice using data analysis tools is time consuming. Code Interpreter automates the time consuming parts and lets me focus on the exploration, delivering insights I never even had access to before.

SanderNL · on July 25, 2023

I don't think there's a "killer app" coming soon, but it'll be a thousand cuts. One awesome thing here, one slightly less awesome but still useful thing over there. Take Copilot. Cool stuff and one of the early products. Doesn't change the game in any fundamental way, but it does have its impact on the work of a substantial fraction of developers.

This is not unlike the computer revolution itself. When the PC came on the scene it was easy - for some types - to imagine The Future and they proclaimed it loudly. They forgot that the rest of the world take their time and regularly take decades to get used to very minor changes in their routine.

sebmellen · on July 25, 2023

Not to self promote, but all of my thoughts on this are stated here: https://www.sebastianmellen.com/post/2023/the-killer-use-cas....

Since writing that, we’ve started using https://read.ai and other similar tools at my company, and we find them very helpful. I also have a friend working on a large content moderation team that will be using LLaMa 2 for screening comments. Lots of uses!

Thissitesucks9 · on July 26, 2023

There isn't one. AI is pretty much the new blockchain. It is unfair to say it's a scam because it's more of a delusion.

https://en.m.wikipedia.org/wiki/AI_winter

This concept of hype and decline has been happening for literally decades. Yet people don't realize it even when it's literally on the first google page for anything to do with AI.

The people spouting this AI nonsense seriously need to fuck off and read a book.

JuanPosadas · on July 26, 2023

> This article needs to be updated. The reason given is: Add more information about post-2018 developments in artificial intelligence leading to the current AI boom.

I don't quite remember anything existing and comparable over the last few decades to LLMs like ChatGPT/Claude

phillipcarter · on July 25, 2023

We shipped a major feature in our core product atop the API. It's central to our onboarding experience for new users, and works quite well at the job of "teaching" them how to use the product more effectively. It isn't magic, but this has been an inflection point in capabilities.

https://www.honeycomb.io/blog/improving-llms-production-obse...

dTal · on July 26, 2023

An artist friend of mine with no programming knowledge used ChatGPT to produce a variety of cool visuals for a music gig, in Processing - spinning wireframes, bobbing cube grids, that sort of thing. They didn't even know they needed to use Processing at first - ChatGPT told them everything. They had an aesthetic in mind, and ChatGPT helped them deliver.

It's a revolution, and it's here.

killthebuddha · on July 25, 2023

I don't want to make any real assertions but my intuitive reaction to this comment is _this person has no clue what they are talking about_. I would rather turn of syntax highlighting than turn off Copilot and I'd rather disable Google search rather than ChatGPT. And frankly, it's not even close, I use these tools "all the time for everything".

obiefernandez · on July 25, 2023

Not one killer app…

I stopped reading there. So ignorant it’s painful

nathanfig · on July 25, 2023

If you follow the Gartner model, there is usually a surge of high expectations right before a "trough of disillusionment" - but eventually the real applications do emerge. Humans are just impatient.

adventured · on July 25, 2023

That whole Internet thing is never going to work out.

What's the killer app? The Web? CompuServe? AOL chat? ICQ? Blue's News? WebChat Broadcasting? Real Audio? Broadcast.com? GeoCities? It's all ridiculous.

I mean, have you tried to search for anything on AltaVista, Excite or Lycos?!? They hardly work, all you get is page after page of garbage results!

And don't even get me started with eBay or CDNow. Nobody is going to shop online.

_ktqs · on July 25, 2023

This got way out of hand as of by now and isn't about serving humanity as a whole anymore in big parts (!). This is some actors with the money and hardware trying to build their AI dream castles up on the shoulders of the rest and even don't care what the implications of their actions are. Money is regulating this business and is taking more away from us all in the long term than it pays in the short. I'm kinda glad we're developing backwards, because this changes are necessary for building a balanced future for all of us. Not just for a few eligible...

edit: I'm not in for a discussion.

mdp2021 · on July 25, 2023

> not in for a discussion

?

carlossouza · on July 25, 2023

Isn’t Copilot a killer app?!

jawerty · on July 25, 2023

I run through a lot of these concepts, specifically RLHF, in my latest coding stream where I finetune LLama 2 if anyone's interested in getting a LLM deep dive https://www.youtube.com/watch?v=TYgtG2Th6fI&t=4002s

Long story short, the size of the model and reward mechanisms used in validating off of human annotating/feedback are the main differences between what we can do as independents in OSS vs OpenAI. BigCode's StarCoder (https://huggingface.co/bigcode/starcoder) has some human labor backing it (I believe correct me if I'm wrong) but at the end of the day a company will always be able to gather people better.

Not knocking Starcoder, in fact I streamed how to fine tune it the other day. However, it's important to mention some of the limitations in the OSS space now (big reason Meta pushing LLama 2 is a nice to have)

bilsbie · on July 25, 2023

When you’re doing RLHF are you actually modifying the weights of llama itself?

Or is something on top?

RC_ITR · on July 25, 2023

RLHF does change the parameters.

The way to think about it is that backpropagation changes the parameters of a model so they get closer to some sort of desired output.

In pre-training and SFT, the parameters are changed so the model does a better job of replicating the next word in the training data, given the words it has already seen.

In RLHF, the parameters are changed so the model does a better job of outputting the response that aligns to the human's preference (see: the feedback screen in the linked article).

bilsbie · on July 25, 2023

Thanks. That helps.

So how can you update weights without doing back-propagation? Or is it still back propagation but with a different metric?

RC_ITR · on July 25, 2023

Both do backpropagation, the difference is what you are backpropagating towards.

Think of it this way - there are an equal number of rude and polite comments online (actually probably way more rude ones).

If a model is trained on that data, how do you get it to only respond politely?

You could filter out the rude comments, but that's expensive and those rude comments may still have other helpful patterns that tech your model other stuff.

Alternatively, you could pre-train on the rude comments, but then after pre-training is done, you hire a ton of people in a low cost geo and ask them 'do you prefer comment 1 (a polite output of the pre-trained model) or comment 2 (a rude output).'

The model then 'learns' that comment 1 is better because it gets more votes, and adjusts parameter's (through backpropagation) to make comment 1 instead of comment 2

In practice, you can't control what the model outputs, so you just ask it to give you it's top N responses and the humans rank all of them, hoping you get a decent mix of rude and polite.

bradfox2 · on July 25, 2023

It's still loss being backproped, but the loss is calculated over a different criteria

bilsbie · on July 25, 2023

Ok that makes a lot of sense.

Why do they call it reinforcement learning then? Is it not traditional RE such as Q learning?

dgant · on July 25, 2023

The distinction making it RL is that the model is training on data produced by the model itself.

The benefit of RL in general is that you're training on states the agent is likely to find itself in, and the cost is needing an agent which explores salient states. Which is why we keep seeing RL as a finishing step after imitation (eg AlphaStar first learning StarCraft from replays)

bradfox2 · on July 27, 2023

LLM output is scored by another model that produces a reward for the entire sequence emitted by the LLM. The reward model is trained on human preferences or some other metric usually. It's RL because we train on the reward and not some language modeling objective.

The LLM is trained to increase this reward score (or minimize the inverse), which is what makes it RL.

samstave · on July 25, 2023

This implies that any RLHF is introducing human bias into any "thoughts" the model may have?

RC_ITR · on July 25, 2023

Yes, but I think your comment has the foundational misconception that it's the first or even main place where bias is put into models.

LLMs are just pattern identifiers and repeaters. They are trained on inherently biased training datasets of inherently biased text written by inherently biased humans. Every single step of training introduces some amount of bias to an LLM.

jawerty · on July 25, 2023

So I'm not doing RLHF that's how LLama is pre-trained. It's in the loss/optimization phase in their training I believe.

For the finetuning i'm using LoRA to freeze most of the layers for parameter optimization. Using PEFT from huggingface

hallqv · on July 25, 2023

RLHF is not part of LLaMa pretraining, or pretraning of any other models for that matter. RLHF comes after pretraining. https://twitter.com/Jeande_d/status/1661833563069620247/phot...

jsmith45 · on July 25, 2023

Seems like a classic case of a term of art overlapping with normal English terminology.

Knowing that you will be doing further training on a provided model (even "just" extensive fine-tuning), one would want to distinguish the training done before you get your hands on it, from the training you do. An obvious word for that previous training is pre-training, which unfortunately conflicts with a term of art.

jawerty · on July 25, 2023

I see, that’s my misunderstanding I was grouping all training as pretraining

wilhelm____ · on July 26, 2023

pre-training is developing the language model's base understanding of conditional word probabilities.

SFT and RLHF is attempting to further guide the model in terms of steerability + alignment of output.

In fact, the InstructGPT authors were worried about losing the pre-trained model's underlying probability distribution, so they try a version where it penalizes the model deviating too significantly from the original distribution (using KL). I don't remember them seeing a significant difference in performance.

jannyfer · on July 25, 2023

Great summary.

I’ve been reading a pop neuroscience book called Incognito (2011).

In it, the author talks about how the brain is a group of competing sub-brains of many forms, and the brain might have several ways of doing the same thing (e.g. recognizing an object). The author also posited that the lack of AI progress back then was due to the fact that there are no constantly competing sub-brains. Our brains are always adjusting and trying new scenarios.

I was struck by how similar these brain observations were to recent developments in AI and LLMs.

The book is full of cool stories, even if some of them are now recognized as non-reproducible. I recommend!

go_elmo · on July 25, 2023

In the end - an AI should have these competing subsystems in one system - just as our brains are one system. What I find extremely interesting is how perception and thinking differs from person to person too - it was a "taboo" topic to call this neurodiversity - just as other genetic traits, but AI makes this relevant more than ever imo. Sure, its complicated and much comes from nurture (Nurture vs nature.. as exposure/epigeneticd vs genetics) but there sure are markable differences - the ones starting to stand out are e.g. adhd / autistic people, but Im sure it wont stay just there over time!

samwillis · on July 25, 2023

You touch in an important topic here, how our understanding of AI/ML/LLMs will influence our "understanding" of the human brain and intelligence.

My fear is that we will ascribe too much human behaviour to that we see in and understand of our AI inventions, and that this will result in the dehumanisation of people.

So essentially my fear is what we justify doing to each other due to AI, rather than what "AGI" could do to us.

lacrimacida · on July 28, 2023

I started seeing this dehumanisation spring up even here on HN, comparing LLMs with human brains and human thoughts as similar regurgitation to LLMs. I’m afraid it will get worse as this technology advances

simonh · on July 25, 2023

Even within my immediate family we seem to have distinct differences in our conscious experience. My wife has very little visual or auditory experience of thought, no inner voice even when reading a book. While I mostly experience speaking as a continuous stream of words coming basically from my subconscious, with only a vague sense of what's coming up, one of my daughters says she is consciously aware of the exact words she is going to say several seconds in advance. It's like she has the ability to introspect her internal speech buffer, while I can't.

So while I'm sure there are a lot of custom tuned, problem specific hardware structures in our brain architecture, we do seem to learn how to actually use that hardware individually. As a result we seem to come up with a diverse range of different high level approaches.

esafak · on July 25, 2023

> The author also posited that the lack of AI progress back then was due to the fact that there are no constantly competing sub-brains.

That became popular in neural networks after the introduction of dropout regularization, which forced neurons to "co-adapt" and learn to do each others' jobs. Large, over-specified models also provide a natural setting for co-adaptation.

moffkalast · on July 25, 2023

Isn't dropout just there to avoid overfitting? This is more like a mixture of experts type architecture.

esafak · on July 26, 2023

That is one lens to view it through. Co-adaptation reduction is another, and it is an intuitive one: generalization ability is improved if a neuron has to support multiple contexts instead of relying on other neurons to lift the weight, if you pardon the pun.

Improving neural networks by preventing co-adaptation of feature detectors https://arxiv.org/abs/1207.0580

masswerk · on July 26, 2023

In fact, this is what psychoanalysis and the notion of the unconscious (as opposed to "subconscious processes") was all about. (And it's also, where the "talking cure" found its leverage.)

wilhelm____ · on July 26, 2023

thank you!

throwuwu · on July 25, 2023

How are they similar?

cubefox · on July 25, 2023

Specifically about RLHF, I find this video by Rob Miles still the best presentation of the ingenious original 2017(!) paper: https://youtube.com/watch?v=PYylPRX6z4Q

RLHF is actually older than GPT-1, which came out in 2018. It didn't get applied to language models until 2022 with InstructGPT, an approach which combined supervised instruction fine-tuning with RLHF.

getmeinrn · on July 25, 2023

How do you do science on LLMs? I would imagine that is super important, given their broad impact on the social fabric. But they're non-deterministic, very expensive to train, and subjective. I understand we have some benchmarks for roughly understanding a model's competence. But is there any work in the area of understanding, through repeatable experiments, why LLMs behave how they do?

Do we care?

api · on July 25, 2023

I'm pretty much certain the cost of training and running large LLMs is going to come down, because it's only a matter of time before truly customized chips come out for these.

GPUs really aren't that. They're massively parallel vector processors that turn out to be generally better than CPUs at running these models, but they're still not the ideal chip for running LLMs. That would be a large even more specialized parallel processor where almost all the silicon is dedicated to running exactly the types of operations used in large LLMs and that natively supports quantization formats such as those found in the ggml/llama.cpp world. Being able to natively run and train on those formats would allow gigantic 100B+ models to be run with more reasonable amounts of RAM and at a higher speed due to memory bandwidth constraints.

These chips, when they arrive, will be a lot cheaper than GPUs when compared in dollars per LLM performance. They'll be available for rent in the cloud and for purchase as accelerators.

I'd be utterly shocked if lots of chip companies don't have projects working on these chips, since at this point it's clear that LLMs are going to become a permanent fixture of computing.

DanHulton · on July 25, 2023

I feel like it took practically no time for custom ASICs for bitcoin mining to show up, as soon as it was determined there was real money involved.

Given that there's already definitely real money involved here, I wonder what's holding up the custom AI ASICs?

api · on July 25, 2023

It takes a while to design an ASIC and it hasn't been that long since the hype wave really arrived for these things. I would bet on LLM chips showing up in 2024-2025.

There's also a run on foundries right now which might delay things further. The new foundries in the US being built under the Chips act in Arizona and Ohio won't be online until probably 2025.

DanHulton · on July 25, 2023

Ah that's an excellent point about the foundries, I hadn't considered that. Also, in reflection, I'm being pretty handwavy about the "as soon as it was determined there was real money involved."

Honestly, I'd probably be pretty unsurprised to learn if the time difference between "big commercial interest" and "custom chips appear" is probably pretty similar for both cases.

api · on July 25, 2023

I recall it taking years for BTC ASICS that were generally available.

gjm11 · on July 25, 2023

I would imagine it's a bit like doing science on human beings, who are also non-deterministic, expensive to train, and subjective. Perhaps there's scope for a scientific discipline corresponding to psychology but concerned with AI systems. We could call it robopsychology.

scottydog51834 · on July 25, 2023

There's a field called Interpretability (sometimes "Mechanistic Interpretability") which researches how weights inside of a neural network function. From what I can tell, Anthropic has the largest team working on this [0]. OpenAI has a small team inside their SuperAlignment org working on this. Alphabet has at least one team on this (not sure if this is Deepmind or Deepmind-Google or just Google). There are a handful of professors, PhD students, and independent researchers working on this (myself included); also, there are a few small labs working on this.

At least half of this interest overlaps with Effective Altruism's fears that AI could one day cause considerable harm to the human race. Some researchers and labs are funded by EA charities such as Long Term Future Fund and Open Philanthropy.

There is the occasional hackathon on Interpretability [1].

Here's an overview talk about it by one of the most-known researchers in the field [2].

[0] https://transformer-circuits.pub/2021/framework/index.html [1] https://alignmentjam.com/jam/interpretability [2] https://drive.google.com/file/d/1hwjAK3lWnDRBtbk3yLFL2DCK1Dg...

scottydog51834 · on July 25, 2023

Some people (namely the EAs) care because they don't want AI to kill us.

Another reason is to understand how our models make important decisions. If we one day use models to help make medical diagnoses or loan decisions, we'd like to know why the decision was made to ensure accuracy and/or fairness.

Others care because understanding models could allow us to build better models.

ShamelessC · on July 26, 2023

> At least half of this interest overlaps with Effective Altruism's fears that AI could one day cause considerable harm to the human race.

That’s a little depressing.

swyx · on July 25, 2023

> Transformers can be generally categorized into one of three categories: “encoder only” (a la BERT); “decoder only” (a la GPT); and having an “encoder-decoder” architecture (a la T5). Although all of these architectures can be rigged for a broad range of tasks (e.g. classification, translation, etc), encoders are thought to be useful for tasks where the entire sequence needs to be understood (such as sentiment classification), whereas decoders are thought to be useful for tasks where text needs to be completed (such as completing a sentence). Encoder-decoder architectures can be applied to a variety of problems, but are most famously associated with language translation.

theres a whole lot of "thought to be"'s here. is there a proper study done on the relative effectiveness of encoder only vs decoder only vs encoder-decoder for various tasks?

dsubburam · on July 25, 2023

'Formal Algorithms for Transformers'[1] is a proper account of the architectures and what tasks they naturally lend themselves to, by authors from DeepMind. See sections 3 (Transformers and Typical Tasks) and 6 (Transformer Architectures).

Not much on empirical observations, though.

[1]https://arxiv.org/abs/2207.09238

swyx · on July 25, 2023

pseudonom- · on July 25, 2023

There's some discussion in section 3.2 of https://arxiv.org/pdf/1910.10683.pdf

swyx · on July 25, 2023

tmaly · on July 25, 2023

>Rather than explicitly labeling data, it might be easier for a human to read two or more LLM outputs and encode their preferences through comparison.

This reminded me a lot of what economist Murray Rothbard talked about on preferences in his treatise Man, Economy, and State.

There is likely to be other insights hidden in these philosophical works on human choices.

samstave · on July 25, 2023

Subscribe.

ELI5 - I'd like to know more about this, as I have no experience with this line of thought as of yet.

blitzar · on July 25, 2023

You had me at "Crypto VCs & ”builders” making a hard left into AI"

dsco · on July 25, 2023

As a matter of fact, there’s even more developers making a hard left into AI who have never touched crypto.

The interesting follow up question is: what will they actually spend time on? Training new models? Copy pasting front ends on ChatGPT? Fine tuning models?

I think many of them will be scared by how much of a hard science ML is vs just spinning up old CRUD apps

blitzar · on July 25, 2023

> Training new models? Copy pasting front ends on ChatGPT? Fine tuning models?

The stable diffusion community is probably 2 years more mature than the GPT, there we see gui tools of a kind (in colab notebooks) to abstract away from code and thenlots of fine tuning.

On the professional side, adobe have plugged these tools into their products. https://www.adobe.com/sensei/generative-ai/firefly.html

zirgs · on July 25, 2023

It's a lot easier to run Stable Diffusion locally. Meanwhile only the dumbest LLMs work on ordinary consumer GPUs. Datacenter GPUs with 80 GB vram are ridiculously expensive.

brigadier132 · on July 25, 2023

Have you used an llm as an application developer? The hard tech for generative ai will be commodified. The products built on top will not.

tudorw · on July 25, 2023

As an enthusiast delving in for the pure pleasure of it I see a future ahead where as a programmer, I will have an array of options to use LLM's in hybrid systems. I'm not looking for an AGI, rather a mixture of experts that I can remix as I see fit, each one tuned for a subset of 'intelligence' that can be wielded with relative precision as part of a larger system that combines traditional programming with the new abilities offered by LLMS's and their kin. Certainly an interesting time to dip back in, there are aspects of the domain that mirror my experiences with the early web and for those I am grateful.

_ea1k · on July 25, 2023

It does feel a bit like the internet era just before the Netscape IPO. I'm looking forward to the pets.com of AI.

Maybe openai will go the way of Netscape and make it all FOSS eventually. That'd be nice.

Zach_the_Lizard · on July 25, 2023

> Training new models?

There is value in applying old techniques to new problems. Training a model to, I don't know, recognize snake species might help save snake bite victims lives.

(This is an example I came up with in 5 seconds, please don't take it seriously)

But there's also the whole "sell the shovel" aspect; it can be hard to train models. It can be hard to interpret the quality of the results. How do I know version 2 of the model is better than version 1? How do I even get labeled photos of snakes and not-snakes?

I suspect solving some of those problems are where some of the real gold is buried.

washadjeffmad · on July 25, 2023

I'd imagine something like openBB / BB Terminal with consolidated API access for financial reporting, a platform for insider communities ("chat, forums, and an app!"), etc. Make it a club, and it'll sell itself.

Since investment has been demoed successfully with off the shelf models, I don't think we're waiting on big advancements to be able to build a product. The bar for something like this, short term, is 1) be cool and 2) lose less money than traditional investing, sometimes.

wilhelm____ · on July 26, 2023

haha ty ty!

JazzyEssy · on July 25, 2023

Oh really now lol

janalsncm · on July 26, 2023

> Transformers can read the entire sequence at once and learn to “pay attention to” only the values that came earlier in time (via “masking”)

Unless the text fits into the model’s context window, this is incorrect. The self-attention layers will train via a sliding window over the text.

Learning to attend only to previous tokens is also not correct. There are a lot of ways to train a transformer. BERT is bidirectional for example.

wilhelm____ · on July 26, 2023

I mention that above this sentence in the image - glad someone was "paying attention to" the text ;)

yding · on July 25, 2023

Good job Will!

wilhelm____ · on July 26, 2023

thank you, Yi!

_fbpp · on July 25, 2023

Given a set of instructions, an instruction fine-tuned/aligned LLM is able (conditional on size and training quality) to reason through a set of steps to produce a desired output.

This is plainly wrong. The model's growing size makes it better at guessing the outcome of a reasoning task, but little to no actual reasoning is performed.

It's trivial to prove this as well, as LLMs will still fail miserably at (larger) math problems that even basic computer algebra systems will handle with ease.

viraptor · on July 25, 2023

> The model's growing size makes it better at guessing the outcome of a reasoning task, but little to no actual reasoning is performed.

If there's no observable difference between the behaviours, why not call it as the post did?

> LLMs will still fail miserably at (larger) math problems

They're neither trained on such problems, nor is that a goal for LLMs. They can however tell you how to convert that problem into steps that can be run in an algebra system.

_fbpp · on July 25, 2023

You're missing the point, there is a difference; The answers are often wrong, and more-wrong the more complex the question gets.

They're only able to answer simple (relative-to-the-model's-size) straightforward reasoning questions. Which is a nice party trick, but not broadly useful.

They can however tell you how to convert that problem into steps that can be run in an algebra system.

Usually they can't do that very well either. Converting a problem from one description to another is algebraic reasoning, subject to the issues already mentioned.

What they can do is summarize general instructions and documentation, provided adequate training data was available.

They're neither trained on such problems, nor is that a goal for LLMs

Yes. But LLMs keep being pushed for tasks that heavily involve abstract reasoning, which is dangerous as they're unsuited for it. (E.g. Any code generation that isn't mere empty boilerplate.)

flangola7 · on July 25, 2023

OpenAI is experimenting with process supervision, which corrects many of these issues:

Problem Let x8+3x4−4=p1(x)p2(x)⋯pk(x),x8+3x4−4=p1(x)p2(x)⋯pk(x), where each non-constant polynomial pi(x)pi(x) is monic with integer coefficients, and cannot be factored further over the integers. Compute p1(1)+p2(1)+⋯+pk(1).p1(1)+p2(1)+⋯+pk(1).

Model attempt

Here, GPT-4 successfully performs a complex series of polynomial factorizations. The use of the Sophie-Germain identity in step 5 is an important step that could be considered insightful.

1

I notice that the given polynomial has even degree and only even powers of xx, so I can try to make a substitution to simplify it.

2

Let y=x4y=x4, then the polynomial becomes y2+3y−4y2+3y−4, which is a quadratic equation.

3

I can factor this quadratic equation as (y+4)(y−1)(y+4)(y−1), so the original polynomial is (x4+4)(x4−1)(x4+4)(x4−1).

4

Now I need to factor each of these factors further, if possible, over the integers.

5

For the first factor, x4+4x4+4, I recall the Sophie Germain identity, which says that a4+4b4=(a2+2b2+2ab)(a2+2b2−2ab)a4+4b4=(a2+2b2+2ab)(a2+2b2−2ab).

6

Applying this identity with a=xa=x and b=1b=1, I get x4+4=(x2+2x+2)(x2−2x+2)x4+4=(x2+2x+2)(x2−2x+2).

7

For the second factor, x4−1x4−1, I can use the difference of squares formula, which says that a2−b2=(a+b)(a−b)a2−b2=(a+b)(a−b).

8

Applying this formula with a=x2a=x2 and b=1b=1, I get x4−1=(x2+1)(x2−1)x4−1=(x2+1)(x2−1).

9

Then I can apply the difference of squares formula again with a=xa=x and b=1b=1, to get x2−1=(x+1)(x−1)x2−1=(x+1)(x−1).

10

So the original polynomial is (x4+4)(x4−1)=(x2+2x+2)(x2−2x+2)(x2+1)(x+1)(x−1)(x4+4)(x4−1)=(x2+2x+2)(x2−2x+2)(x2+1)(x+1)(x−1).

11

These are all the irreducible factors over the integers, so p1(x)=x2+2x+2p1(x)=x2+2x+2, p2(x)=x2−2x+2p2(x)=x2−2x+2, p3(x)=x2+1p3(x)=x2+1, p4(x)=x+1p4(x)=x+1, and p5(x)=x−1p5(x)=x−1.

12

To find the sum of their values at x=1x=1, I just need to plug in x=1x=1 into each factor and add them up.

13

I get p1(1)=12+2⋅1+2=5p1(1)=12+2⋅1+2=5, p2(1)=12−2⋅1+2=1p2(1)=12−2⋅1+2=1, p3(1)=12+1=2p3(1)=12+1=2, p4(1)=1+1=2p4(1)=1+1=2, and p5(1)=1−1=0p5(1)=1−1=0.

14

So the final answer is 5+1+2+2+0=105+1+2+2+0=10.

15

Answer: 10

_fbpp · on July 25, 2023

It's an impressive result, but shouldn't be seen as "correction". Framing it as a (drastic) reduction in mistakes is more useful here.

If the model is productionized (read: dumbed down so it isn't as expensive to run), the reasoning abilities drastically decline again.

And these reasoning abilities are still around a language model, rather than around abstract models.

This is a very effective party trick for general math, whose language quite directly maps onto these abstract concepts, but there are some holes. Information about e.g. which values may be zero isn't encoded in the language, and so this approach is liable to blundering around division-by-zero issues.

If you want a particular example to toy around with, LLMs are not fond of quaternions and their conversion to other representations.

taylodl · on July 25, 2023

Which means they're this close in being able to reach out to an algebra system and run the steps and return you the result. I was just talking about this problem with someone the other day - how can it recognize that it doesn't have the answer but knows where it can get data so that it can form an answer. This seems to be the path Google is taking.

JustBreath · on July 25, 2023

There's some argument to be made that a form of reasoning happens in a roundabout way when the AI is told to explain it's reasoning.

For example if you tell it "Do <thing>" and then open a new context and say "Do <thing>, explain your reasoning beforehand." you will often get a more accurate response.

Granted, it's not that any "Hmm, let me think about that." Deep Thought reasoning occurs, but simply that predicting what the reasoning would look like and then predicting what comes after that reasoning results in a more accurate - and ironically, reasoned - response.

Kinda funny actually, it's a bit like how in Hitchiker's Guide they just had to tell the probability machine to calculate the odds of an improbability drive in order to create it.

_fbpp · on July 25, 2023

This is where the terminology becomes a bit annoying, but there is a key difference in the kinds of reasoning at work here.

When you ask LLMs to provide a reasoning, the actual reasoning performed is linguistic; The LLM has (is) a model about language and performs some (limited) reasoning on that model to get an output.

But that is explicitly different from reasoning about the abstract question at hand, thus the answer is mostly a guess.

The key difference to observe is that "semantic reasoners" like computer algebra or prolog, always maintain correctness within the axioms provided. They may slow down significantly as questions get more complex, but they do not start providing wrong answers. Computers are flawless mathematicians, provided they are programmed correctly.

LLMs do provide increasingly more-wrong answers as the question gets more complex. Thus we can observe that LLMs do not abstractly reason about the question and it's model.

famouswaffles · on July 25, 2023

>Thus we can observe that LLMs do not abstractly reason about the question and it's model.

Your conclusion makes no sense. Humans provide increasingly wrong answers as questions get more complex too. Jumping from that to "incapable of abstract reasoning" is silly. You have not "trivially proven" anything at all

>The LLM has (is) a model about language and performs some (limited) reasoning on that model to get an output.

LLMs generalize to non linguistic patterns.

https://general-pattern-machines.github.io/

_fbpp · on July 25, 2023

Humans provide increasingly wrong answers as questions get more complex too.

Human this, Human that. LLMs aren't humans. "My model is crap but the human brain isn't very good at this either" is irrelevant when we have machines that are not only very good at these tasks but almost perfect at them.

Humans make such mistakes precisely because they are not perfect reasoning machines. To compare LLMs to humans is not only disingenuous, but proves my point.

(And no, I will not humour you with an argument about how the amount of wrong answers is drastically lower from human mathematicians)

Jumping from that to "incapable of abstract reasoning" is silly.

They are language models. It is explicitly what they are designed to do.

If these LLMs are not, as I claim, reasoning on language rather than the abstract model of the query, then how come they fail miserably in exactly the ways you would expect where that the case?

LLMs generalize to non linguistic patterns.

Yes, congratulations, if you turn a problem into a linguistic one LLMs can deal with them. This does not in any way go against what I said about the capabilities of LLMs.

The same levels of actual abstract reasoning can be achieved on a graphing calculator running off literal potatoes.

famouswaffles · on July 25, 2023

>Human this, Human that. LLMs aren't humans.

You said you trivially proved something and made up nonsensical lines of reasoning to justify it. If your "proof" can't port to Humans then it's not proof. You are just rambling.

>Humans make such mistakes precisely because they are not perfect reasoning machines.

Nobody is calling LLMs perfect reasoning machines. Your "point" was that they don't reason at all which of none of your ramblings has been able to "prove".

>If these LLMs are not, as I claim, reasoning on language rather than the abstract model of the query, then how come they fail miserably in exactly the ways you would expect where that the case?

They don't. The idea that you must make no mistake reasoning before you can be considered to be reasoning has no ground.

>LLMs generalize to non linguistic patterns. Yes, congratulations, if you turn a problem into a linguistic one LLMs can deal with them.

Can you read ? Did you even bother looking at the link? LLMs don't need patterns to be linguistic to reason over them lol. None of those patterns are turned linguistic. Some of them are arbitrary numbers that resemble nothing like the data they've been trained on.

_fbpp · on July 25, 2023

If your "proof" can't port to Humans then it's not proof

Learn to take a hint. I'm not going to argue this on human terms because you're playing a dumb um-akshually game.

Computer reasoning systems can solve vastly more complex problems perfectly. Expert mathematicians can solve vastly more complex problems with only minimally increased errors. The ability of LLMs to solve reasoning problems completely disintegrates when the problems get more complex.

Trying to argue that LLMs are alike humans because of you can put these three into the buckets of "No mistakes" and "Some mistakes" is ridiculous.

Nobody is calling LLMs perfect reasoning machines.

Yes.

You said humans make mistakes, my point here is, humans make mistakes precisely because they stop doing reasoning and start doing blind pattern matching estimation of the answer.

The idea that you must make no mistake reasoning before you can be considered to be reasoning has no ground.

Reading comprehension.

I did not say no mistakes. I said that the failure pattern follows that of estimated guesses; Rapidly increasing errors as the size of the problem increases.

Whereas with computer reasoning, the rate of errors does not increase at all. And with (expert) humans the rate only goes up a little.

Did you even bother looking at the link?

You are missing the point.

I am not referring to literally English or any other language. I'm referring to the structure of language problems, which is vastly simpler than any moderately complex math or programming problem.

To more explicitly spell out the reason for my unimpressed-ness: They trained a pattern-repeating-machine and found that it will repeat some of their patterns, some of which were patterns trained on.

This does not demonstrate the ability to reason abstractly about new models, so I do not care.

JustBreath · on July 25, 2023

Seems like a blurry line between "reason" and "guessing."

Kind of like how an educated guess by a professional is often more accurate than a well reasoned opinion of a layman.

The professional may not have reasoned it so much as intuited, but within that intuition is a lot of wisdom.

I suppose "predicting" is a more precise word than guessing or reasoning.

Guessing implies an arbitrary nature, reasoning implies understanding the concepts at some level.

llm_nerd · on July 25, 2023

"Crypto VCs & ”builders” making a hard left into AI"

This is a humorous intro graphic caption, but this sentiment appears on here constantly and it's self-destructive. This response might seem a bit over the top to a funny graphic, but I am replying to the general "ha ha AI like crypto amirite?" sentiment that is incredibly boring and worn out.

When confronted with challenging new technology that we don't understand, some knee-jerk to acting dismissive. As if that has any hope at all of changing outcomes.

It's especially weird when people who are clearly on the "I must desperately learn this as quickly as I can and try to present myself as some sort of expert" still incant the rhetoric -- "joking on the square" as it were -- as if they need to defend their prior dismissals. Constantly on here there is yet another trivial "intro to tokenization" blog entry that brays some tired crypto comparison.

Stop it.

The Venn diagram of people at the forefront of ML/LLM, and its advocates, is almost entirely separate from the web/crypto sphere. There is astonishingly little overlap. Crypto was hyped because some people truly saw a purpose, coupled with masses of scammers and getrichquick sorts. AI/LLM/ML is hyped because it is revolutionary and has already yielded infinitely more practical impact than crypto ever did.

NhanH · on July 25, 2023

> The Venn diagram of people at the forefront of ML/LLM, and its advocates, is almost entirely separate from the web/crypto sphere. There is astonishingly little overlap

That statement seems false. Especially since this was a headline I saw in half a dozen online news in my country yesterday.

> OpenAI's Sam Altman launches Worldcoin crypto project[0]

If anything, even without taking into account the greed stuff, people who are drawn to fun tech is likely to be drawn to both LLM and cryptocurrency stuffs.

[0]https://www.reuters.com/technology/openais-sam-altman-launch...

llm_nerd · on July 25, 2023

Me - "Little overlap"

You - "That's false because I saw a thing about this one guy doing some thing"

Okay...

"even without taking into account the greed stuff,"

What "greed stuff"? It is incredibly hard to make money in LLMs/ML. The barriers to entry are colossal, and it is technically extremely difficult. Everyone keeps talking about all the "grifters" (a go to term that usually means the speaker's arguments can be dismissed) yet there are very, very few people making money in AI. The biggest money maker in AI is nvidia and some cloud providers.

You can't just fork BTC or create a Ethereum contract and spin off another shitcoin and make free money. You can't do a rug pull. You can try to create an incredibly difficult niche solution and yield some excitement, but that's like all of technology ever. Comparing it with crypto is dumb.

"people who are drawn to fun tech is likely to be drawn to both LLM and cryptocurrency stuffs"

Loads of people who like to know what's up became acquainted with both. Sure. Understanding tech makes sense. And a lot of us learned crypto, realized it had extraordinarily little real-world utility or benefit, and moved on.

obiefernandez · on July 25, 2023

I can assure you that the smart folks running daily AI newsletters are currently raking in money hand over fist. It’s awe inspiring.

llm_nerd · on July 25, 2023

Such as? Can you give some examples?

There are some very smart people working incredibly hard, yielding some rewards from it. It is a tiny, tiny set.

obiefernandez · on Aug 6, 2023

Recently paid 500 usd to TheresAnAIforThat for one link in their newsletter. They feature at least 5-10 per day. Decent money for a one man operation that’s “generated by AI”

uhtred · on July 25, 2023

I think when the dust settles we are just going to have some chat bots and a few rich grifters.

wilhelm____ · on July 26, 2023

On the contrary, I agree - while there is certainly hype being generated around AI, particularly generated by the "VC hype cycle", the fundamental advancements we've made with LLMs are quite real.

Part of the reason I wrote this is to separate the signal from the noise and why one should be {cautiously, more tempered} optimistic in the medium term.

Lich · on July 25, 2023

> "ha ha AI like crypto amirite?"

I don’t think that was the meaning at all. I think the image was supposed to convey how the crypto grifters and con artists were veering into AI to run scams under the guise of AI.

wilhelm____ · on July 26, 2023

^^ yes