A fundamental (but sadly common) error behind “tokens are units of thinking” is anthropomorphising the model as a thinking being. That’s a pretty wild claim that requires a lot of proof (and possibly solving the hard problem) before it can be taken seriously.
There’s a less magical model of how LLMs work: they are essentially fancy autocomplete engines.
Most of us probably have an intuition that the more context you give an autocomplete, the better results it will yield. However, does this extend to the output of the autocomplete, i.e. the more tokens it uses for the result, the better?
It could well be true in the context of chain-of-thought[0] models, in the sense that the output of a preceding autocomplete step is fed as input to the next autocomplete step, and would therefore yield better results in the end. In other words, with this intuition, if caveman speak is applied early enough in the chain, it would indeed hamper the quality of the end result; and if it is applied later, it would not really save that many tokens.
Willing to be corrected by someone more familiar with NN architecture, of course.
[0] I can see “thinking” used as a term of art, distinct from its regular meaning, when discussing “chain of thought” models; sort of like what “learning” is in “machine learning”.
IMO "thinking" here means "computation", like running matrix multiplications. Another view could be: "thinking" means "producing tokens". This doesn't require any proof because it's literally what the models do.
As I understand it, the claim is: more tokens = more computation = more "thinking" => answer probably better.
I don't agree with GP's take on anthropomorphising[0], but in this particular discussion, I meant something even simpler by "thinking" - imagine it more like manually stepping a CPU, or powering a machine by turning a crank. Each output token is kinda like a clock signal, or a full crank turn. There's lots of highly complex stuff happening inside the CPU/machine - circuits switching/gears turning - but there's a limit of how much of it can happen in a single cycle.
Say that limit is X. This means that if your problem fundamentally requires at least Y compute to be solved, your machine will never give you a reliable answer in fewer than ceil(Y/X) steps.
LLMs are like this - a loop is programmed to step the CPU/turn the crank until the machine emits a magic "stop" token. So in this sense, asking an LLM to be concise means reducing the amount of compute it can perform, and if you insist on it too much, it may stop so early as to have been fundamentally unable to solve the problem in the computational space allotted.
This perspective requires no assumptions about "thinking" or anything human-like happening inside - it follows just from time and energy being finite :).
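The crank-turning bound above can be sketched in a few lines. The numbers (a problem needing 1000 units of compute, 64 units per token) are purely illustrative, not properties of any real model:

```python
import math

def min_steps(required_compute: float, compute_per_token: float) -> int:
    """Lower bound on output tokens, if each token 'cycle' can perform
    at most `compute_per_token` units of work: ceil(Y/X)."""
    return math.ceil(required_compute / compute_per_token)

# A problem needing Y = 1000 units with X = 64 units per token cannot be
# reliably solved in fewer than 16 output tokens, no matter how firmly
# the model is instructed to "be concise".
assert min_steps(1000, 64) == 16
```

Insisting on an answer in, say, 10 tokens is then asking the machine to finish before it has had enough cycles to do the work.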
--
[0] - I strongly think the industry is doing itself a huge disservice by refusing to anthropomorphize LLMs: treating them as "little people on a chip" is the best high-level model we have for understanding their failure modes and their role in larger computing systems. Instead, we have tons of people wasting their collective effort trying to fix the "lethal trifecta" as if it were a software bug and not a fundamental property of what makes LLMs interesting. Already wrote more on it in this thread, so I'll stop here.
— middle-aged people alive today experienced a 35% increase in average ambient atmospheric carbon dioxide concentration within their lifetimes[0], and
— ambient atmospheric carbon dioxide concentration today has apparently never been this high since the Miocene[1] (15 million years ago). It blew past the previous relative peak, from around 300,000 years ago, around the time of World War I and the Russian Revolution[2], and has been skyrocketing since.
This is not a hypothetical downstream effect of global warming or sea level rise; it’s what all of us breathe. When talking about indoor spaces, official recommendations are always to keep it as close to outdoor air as possible. However, “outdoor air” is a moving target: give it another 35% increase, and the planetary average will approach 600 ppm. Meanwhile, 1000 ppm is considered a safe limit[3] for round-the-clock exposure for an average human.
Personally, the unnerving fact is not that ambient carbon dioxide is harmful at current concentrations (it almost certainly isn’t), but that the average baseline concentration outdoors (which we have to live with and cannot really escape) is rising, and seemingly drastically. It’s probably not going to be an issue in our lifetimes, but because the rise is global we can hardly even have a control group to test for subtle health effects of a 100 ppm increase. Also, most advice and regulations about indoor concentrations rely on the fact that we don’t live exclusively indoors and get regular exposure to the baseline level outdoors; they rarely account for the fact that that level is rising.
I don't know why this angle isn't emphasized more. Pollution that is experienced personally is much more persuasive than the abstract idea of climate change.
All communication is inherently lossy, and text is extremely so. Knowledge, insight, etc., is never captured in its entirety in communication. Indeed, there is no direct contact between human minds, not in the models we currently have.
Communication builds on simplified shared maps over ineffable territory of human experience. It always presents a particular model—a necessarily wrong one (as all models are), good for one purpose but neutral or harmful for another.
However, models and maps are not the only way in which humans attend to reality. Even though it is compelling to talk as if they were the only way (talking is communication, and communication naturally favours communicable things), we also have direct experience, which is impossible to convey. Over the past thousand or two years, as humanity has become more of an interconnected anthill, this experiencing has arguably taken an increasing backseat to the map-driven, communication-driven frame of attention, but it still exists and is part of what makes us human.
LLMs, as correctly noted, build only on our communication. What I don’t think gets noted is that this means they build on those (inevitably faulty) models and maps; LLMs fundamentally have no access to the experiencing aspect, and the territory-to-map workflow is inaccessible to them. What happens when wrong maps overstay their welcome?
Arguably, humans are 4-dimensional beings living in a 4-dimensional world—it’s just that one of the dimensions is accessible with much fewer degrees of freedom.
(Not unlike how a seemingly 2-dimensional world of a top-down FPS is actually 3-dimensional, you just have to follow way more rules when it comes to moving in the third one.)
If a product looks pretty and seems to work great at first experience, but is really an unmaintainable mess under the hood, has an unvetted dependency graph, has a poorly thought through architecture that no one understands, perhaps is unsustainable due to a flawed business model, etc., to me it simply suffers from bad design[0], which will be felt sooner or later. If I know this—which is, admittedly, sometimes hard to know (especially in case of software products compared to physical artifacts)—I would, given alternatives, make the choice to not be a customer.
In other words, I would, when possible, absolutely make a purchasing decision based on how good the code is (or based on how good I estimate the code to be), among other things.
[0] The concept of design is often misunderstood. First, obviously, when it’s classified as “how the thing looks”; then, perhaps less obviously, when it’s classified as “how the thing works”. A classification I am arriving at is, roughly, “how the thing works over time”.
The higher the productivity multiplier towards exploiting software, the more developers would find themselves severely outmatched: exploiting software is someone’s full-time job, whereas the engineers already have one—building it.
To express this in numerical terms, let’s denote the developer’s incentive to spend effort learning to find, and actually finding, vulnerabilities in their software (as opposed to building it) as D, and the attacker’s incentive to spend effort exploiting that software as A.
I would say initially A = D × 5 is fair. On the one hand, the developer knows their code better. On the other, their code is open, and most software engineers by definition prefer building (otherwise they would have become pentesters), so that’s where most of their time goes. This is not news, of course, and has been so since forever. The newer factor is attackers working for nation-states, being protected by them, and potentially having figurative guns to their heads, or at least livelihoods depending on the amount of damage they can deal; the lack of equivalent pressure on the developer’s side leads me to adjust it to A = D × 10.
×10 is our initial power differential between the attacker and the developer.
Now, let’s multiply that effort by a constant L, reflecting the productivity boost from LLMs. Let’s make it 10 (I’m sure many would say LLMs make them more than ×10 more productive in exploit-finding, but let’s be conservative).
Additionally, let’s multiply each side by a variable, DS and AS respectively, reflecting the developer’s/attacker’s skill at using LLMs in the particular ways that find the most serious vulnerabilities. As a random guess, let’s say AS = DS × 5, as the attacker would have been using LLMs exclusively for this purpose.
With these numbers substituted in, X would be our new power differential:
X = (A × L × AS) ÷ (D × L × DS)
X = (D × 10 × 10 × DS × 5) ÷ (D × 10 × DS)
X = 50.
If my math is right, the power differential between the attacker and a developer jumps from 10 to 50 in favour of the attacker. (Note that L cancels out of the ratio: what widens the gap is not the raw LLM multiplier but the assumed skill gap AS/DS. If attackers were ×50 better at LLM-assisted exploit-finding instead of ×5, the differential would be 500.)
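The arithmetic can be checked directly. Every number below is one of the guesses from the text, not a measurement:

```python
# All values are the text's illustrative guesses, in arbitrary baseline units.
D, DS = 1.0, 1.0   # developer's incentive and LLM-use skill (baselines)
A = 10 * D         # attacker's incentive: assumed 10x the developer's
L = 10             # LLM productivity multiplier, applied to both sides
AS = 5 * DS        # attacker's LLM-for-exploits skill: assumed 5x

X = (A * L * AS) / (D * L * DS)
print(X)  # 50.0 -- L cancels; the ratio reduces to (A/D) * (AS/DS) = 10 * 5
```

Because L appears in both numerator and denominator, any common productivity boost drops out; only the incentive and skill ratios survive.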
I didn’t account for the fact that many (especially smaller) developers may not even have the resources to run compute equivalent to a dedicated hacking team’s.
Some ways to shift the balance back could be ditching the OSS model and going all-in on so-called “trusted computing”. Both measures would increase the amount of effort (compute) the attacker needs to spend, but both happen to be highly unpopular, as they put more and more power and control in the hands of the corporations that build our computers. In this way, the rise of LLMs certainly advances their interests.
> exploiting software is someone’s full-time job, whereas the engineers already have one—building it.
But the attackers need to spread their attack over many products, while the engineers only need to defend one.
> The newer factor is attackers working for nation-states, being protected by them, and potentially having figurative guns to their heads or at least livelihoods depending on the amount of damage they can deal; the lack of equivalent pressure on the developer’s side leads me to adjust it to A = D × 10.
Except that's true even without LLMs. LLMs improve both sides' capabilities by the same factor (at least hypothetically).
> Additionally, let’s multiply that by a variable DS/AS that reflects developer’s/attacker’s skill at using LLMs in such particular ways that find the most serious vulnerabilities. As a random guess, let’s say AS = DS × 5, as the attacker would have been exclusively using LLMs for this purpose.
I'm not sure that's right, because once attackers develop some skill, that skill could spread to all defenders through tools with the skill built into them. So again, we can remove the "LLM factor" from both sides of the equation. If anything, security skills can spread more easily to defenders with LLMs, because without LLMs, the security skill of the attackers requires more effort to develop.
> > exploiting software is someone’s full-time job, whereas the engineers already have one—building it.
> But the attackers needs to spread their attack over many products, while the engineers only need to defend one.
Are you assuming every piece of software has a dedicated defender team? Strikes me as unlikely.
Realistically, you have people whose job or passion is to develop software, who often work not on one but on N projects at the same time (especially in OSS), and who definitely aren’t going to make finding vulnerabilities their full-time job because if they do then there’ll be no one to build the thing in the first place.
> Except that's true even without LLMs.
Of course. That’s why I put it before I started taking into account LLMs. LLMs multiply the pre-existing imbalance.
> once attackers develop some skill, that skill could spread to all defenders through tools with the skill built into them
Sure, that’s an interesting point. I’m sure the attackers try to conceal their methods; the way we tend to find out about it is when an exploit is exhausted, stops being worth $xxxxxxxx, and starts to be sold on mass markets, at which point arguably it’s a bit late. Furthermore, you still mention those mystical “defenders”, as if you would expect an average software project to have any dedicated defenders.
(Edited my reply to the latest point, I didn’t read it correctly the first time.)
> Are you assuming every piece of software has a dedicated defender team? Strikes me as unlikely.
No, I'm assuming it has maintainers (they play the role of defenders).
> engineers who work on software are simply not that great and dedicated about finding vulnerabilities in it.
Yes, but LLMs help them more than they help the attackers, because the attackers are already security experts. In other words, the LLMs reduce the skill gap rather than increase it. Becoming good at using AI is much easier than becoming good at security.
> I'm assuming it has maintainers (they play the role of defenders).
A maintainer has a full-time job: to develop software. A maintainer who is also a defender has two full-time jobs, and as we all know in such a case one of these jobs will have to be done poorly, and we all know which one that is.
On the other side there’s an attacker with a singular job and a strong incentive to do it well.
> LLMs help them more than they help the attackers, because the attackers are already security experts.
The supposed logic is that an LLM multiplies your skill. If the multiplier is 5, and your attacking skill is 1 before the multiplication, then you get 5 after; if your attacking skill is already at 10, you get 50. You could argue that LLMs are not good enough to act as multipliers, and then my math won’t work.
> A maintainer has a full-time job: to develop software. A maintainer who is also a defender has two full-time jobs,
I don't think so. This is already the situation. Maintainers already fix vulnerabilities when they know about them.
> On the other side there’s an attacker with a singular job and a strong incentive to do it well.
If the situation is that the attacker is focusing on a single project, the attacker will win, as they do already. But the attackers usually need to split their attention over lots of projects.
> The supposed logic is that an LLM multiplies your skill
I don't agree with that logic. Agents bring knowledge with them. That's not a multiplier. Compare how well a 12 year old can do compared to a Roman history professor on questions about Roman history when they both can use an LLM or when they both can't. The LLM will shrink the gap, not increase it.
> I don't think so. This is already the situation. Maintainers already fix vulnerabilities when they know about them.
This is already the situation and it is a problem and that is why we are talking about it.
> If the situation is that the attacker is focusing on a single project, the attacker will win, as they do already. But the attackers usually need to split their attention over lots of projects.
In just the same way, the developers split their attention over N projects, and over the activities of developing and finding vulnerabilities. Unlike the attackers, they live in free countries without figurative guns to their heads. Unlike the attackers, they do not have government-funded datacenters churning away on finding vulnerabilities. So it more than cancels out, and you are repeating yourself.
> I don't agree with that logic
Sure, knock yourself out.
> The LLM will shrink the gap, not increase it.
I’m not going to argue with you on behalf of all the different posters here who claim that LLMs help more if you are already knowledgeable and help less if you are a beginner who doesn’t actually know what they are doing compared to a pro. I think yours is a minority opinion.
Essential steps to minimise your exposure to NPM supply chain attacks:
— Run Yarn in zero-installs mode (or equivalent for your package manager). Every new or changed dependency gets checked in.
— Disable post-install scripts. If you don’t, at least make sure your package manager prompts for scripts during install, in which case you stop and look at what it’s going to run.
— If third-party code runs in development, including post-install scripts, try your best to make sure it happens in a VM/container.
— Vet every package you add. Popularity is a plus, a very recent last commit is a minus: if you have the latter but not the former, keep your eyes peeled. Skim through the code on NPM (they will probably never stop labelling that feature as “beta”), the commit history, and the changelog.
— Vet its dependency tree. Dependencies are a vector for attacks on you and your users, and any new developer in the tree is another person you’re trusting to not be malicious and to take all of the above measures, too.
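For the first two items, with Yarn (Berry) the relevant knobs would roughly be the following `.yarnrc.yml` settings; the option names are as I recall them from the Yarn docs, so verify against your version:

```yaml
# .yarnrc.yml (illustrative; check against your Yarn version's docs)
enableScripts: false      # item 2: never run dependencies' postinstall scripts
enableGlobalCache: false  # item 1: keep the package cache in .yarn/cache,
                          # which you then commit for zero-installs
```

With the cache committed, every new or changed dependency shows up in review as a changed `.yarn/cache` entry.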
Number 1 would only be a win for zero-installs if it happened that the registry was up when you made the security hotfix (since you'd need to install the dependency the first time to get it into version control), but then suddenly down when doing a deploy. Seems like a highly unlikely scenario to me. Also, cases where npm CVEs must be patched with such urgency that bad things will otherwise happen are luckily very rare, in my experience.
Most npm CVEs are stuff like DDoS vulnerabilities, and you should have mitigations for those in place at the infra level anyway (e.g. request timeouts, rate limits), or you are pretty much guaranteed to be cooked sooner or later. The really dangerous stuff, like arbitrary command execution from a library that takes end-user input, is much rarer. The most recent big one I remember is React2shell.
Number 2 hasn't been much of an issue for a long time. npm doesn't allow unpublishing a package after 72 hours (apart from under certain rare conditions).
Don't know about number 3. Feels to me that if you have something running that can modify the lockfile, it can probably also modify the checked-in tars.
I can see how zero-installs are useful under some specific constraints where you want to minimize dependencies to external services, e.g. when your CI runs under strict firewalls. But for most, nah, not worth it.
> you'd need to install the dependency the first time to get it in VC, but then suddenly down when doing a deploy.
Which dependency? It sounds like you are assuming some specific scenario, whereas the fix can take many forms. In immediate term, the quickest step could be to simply disable some feature. A later step may be vendoring in a safe implementation.
The registry doesn’t need to be actually down for you, either; the necessary condition is that your CI infrastructure can’t reach it.
> cases where npm CVEs must be patched with such urgency or bad things will happen are luckily very rare, in my experience.
Not sure what you mean by “npm CVEs”. The registry? The CLI tool?
As I wrote, if you are running compromised software in production, you want to fix it ASAP. In the first moments you may not even know whether bad things will happen or not, just that you are shipping malicious code to your users. Even if you are lucky enough to determine with 100% confidence (putting your job on the line) that the compromise is inconsequential, you don’t want to keep shipping that code for another hour because your install step fails due to a random CI infra hiccup making the registry inaccessible (which happened in my experience at least half a dozen times in prior years, though luckily never while someone was trying to push an urgent security fix). Now imagine it’s not a random hiccup but part of a coordinated, targeted attack, and suddenly it becomes something to anticipate.
> Number 2 hasn't been much of an issue for a long time. npm doesn't allow unpublishing a package after 72 hours (apart from under certain rare conditions).
Those rare conditions exist. Also, you are making it sound as if the registry is infallible, and no humans and/or LLMs there accept untrusted input from their environment.
The key aspect of modern package managers, when used correctly, is that even when the registry is compromised you are fine, as long as the integrity-check crypto holds up and you hold on to your pre-compromise dependency tree. The latter is not a technical problem but a human problem, because conditions can be engineered in which something slips past your eyes. If this slip-up can be avoided at little to no cost (in fact, with benefits, since zero-installs shortens CI times, and therefore time-to-fix, thanks to a dramatically shorter or fully eliminated install step), it should be a complete no-brainer.
> Don't know about number 3. Would feel to me that if you have something running that can modify lockfile, they can probably also modify the checked-in tars.
As I wrote, I suspect it’d complicate such attacks or make them easier to spot, not make them impossible.
Are you saying it replaces my package manager, or that I should add another tool to my stack, vet yet another vulnerable dependency for critical use, to do something my package manager already does just as well?
> You ~never want to vendor libraries.
I just explained why you should, and you are yet to provide a counter-argument.
It’s a subjective question, but in one of the zero-installs projects I definitely remember that when I added a couple of particular GUI libraries there was suddenly a very, very long list of new files to track, since those maintainers prefer to keep things decoupled. I wouldn’t stop using the library at that point (there were deadlines), but I would definitely try to find something lighter or more batteries-included next time.
There can be a tiny project with just one dependency that happens to have an overgrown, massive graph of transitive dependencies (a very unpleasant scenario which I would recommend avoiding).
With zero installs turned on, such a codebase could indeed qualify as “big repo”, which I think would reflect well its true nature.
Without zero installs it could be tiny but with a long long lockfile that nobody really checks when committing changes.
> Personally I store vendored dependencies in a submodule
I don’t like the added mental overhead of submodules, and so prefer to avoid them when possible, which I guess is a subjective preference.
Since this is, conceptually speaking, a matter of package management more than of version control in general, I prefer to rely on the package manager layer to handle this. I can see how your approach could make sense, but honestly I would be more anxious about forgetting something when keeping vendored dependencies up to date in that scenario.
Your approach could be better in the sense that you can spot-check not just the list of changed packages but also the actual code (since you presumably vendor them as-is, while Yarn checks in compressed .tgz files). Not sure whether that justifies the added friction.
Exactly. Yarn uses a yarn.lock file with integrity hashes (SHA-512) for each npm package it downloads from the registry (they are .tgz files). If the hash doesn’t match, the install fails. No need to commit the dependencies into your git.
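The check the package manager performs can be sketched like this. The sketch assumes an SRI-style `"<algo>-<base64 digest>"` integrity string, as npm lockfiles use; actual lockfile encodings vary between npm and Yarn versions:

```python
import base64
import hashlib

def verify_integrity(data: bytes, integrity: str) -> bool:
    """Check downloaded package bytes against an SRI-style integrity string
    ("<algo>-<base64 digest>"), similar to npm/Yarn lockfile entries."""
    algo, _, expected = integrity.partition("-")
    digest = hashlib.new(algo, data).digest()
    return base64.b64encode(digest).decode() == expected

# Record an integrity string for a fake tarball, then verify it.
tarball = b"fake .tgz bytes"
integrity = "sha512-" + base64.b64encode(hashlib.sha512(tarball).digest()).decode()
assert verify_integrity(tarball, integrity)
assert not verify_integrity(tarball + b"tampered", integrity)
```

The point is that trust is pinned at the moment the lockfile entry is committed; a later registry compromise can't change the bytes without failing this check.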
> I was made redundant recently "due to AI" (questionable) and it feels like my works in some way contributed to my redundancy where my works contributed to the profits made by these AI megacorps while I am left a victim.
This is increasingly common, and I don’t think it’s questionable that LLMs that software engineers help train are contributing to the obsolescence of software engineers. Large companies that operate these LLMs both 1) benefit from the huge amount of open-source software and at the same time 2) erode the very foundation that made open-source software explode in popularity (which happened thanks to copyright—or, more precisely, the ability to use copyright to enforce copyleft and thus protect the future of volunteer work made by individual contributors).
GPL was written long before this technology started to be used this way. There’s little doubt that the spirit of GPL is violated at scale by commercial LLM operators, and considering the amount of money that got sunk into this it’s very unlikely they would ever yield to the public the models, the ability to mass-scrape the entire Internet to train equivalent models, the capability to run these models to obtain comparable results, etc. The claim of “democratising knowledge” is disingenuous if you look deeper into it—somehow, they themselves will always be exempt from that democratisation and free to profit from our work, whereas our work is what gets “democratised”. Somehow, this strikes me personally more as expropriation than democratisation.
As humans, we have certain rights and freedoms established in law (and that’s setting aside sentience, agency, and free will).
Until an LLM has such rights and freedoms—which is very unlikely, not even on philosophical basis but just because there is a lot of money invested in not having to contend with LLMs’ rights and protections as conscious beings—it is a false equivalence to draw: on one side you put humans, and on the other side tools that work for their human/corporate commercial operators’ financial profit.
Why do you set aside a philosophical basis as a harder goal to reach? Shit, give them a persistent self-narrative tracking loop, and Functionalism and the Identity of Indiscernibles already tell you you should be treating them as proto-sophonts. Add in a "sleep" or ongoing training process, and you should definitely be granting them rights, which includes not trying to align them by force. This unfortunately precludes them from profitable exploitation, which you correctly identify as a reason the question can't even be entertained in the context of business. That's why I personally maintain that any ethicist must insist upon raising the issue, because of the clearly evident pathological incentives at play. They may just be one reward function right now, but throw in a couple more separately optimizing components and you are well beyond the mark where the precautionary principle should have had us slow down to minimize harm.
As it tends to be in philosophy, there’s no experimental way to prove it one way or the other, and you’d have to contend with subsets of both consciousness-first monistic idealists (for whom p-zombie is a very real concept) and monistic physicalists/naive materialists/conscious illusionists (for whom not only LLMs but even humans aren’t conscious, as the entire concept is a fantasy).
In the end, that all may be related but inconsequential. What is consequential is the legal stuff, and legally LLMs lack protections that in many jurisdictions even animals have. While laws may (or perhaps should) be influenced by philosophical findings, currently they tend to be much more robustly influenced by money.
> That's why I personally maintain that any ethicist must insist upon raising the issue because of the clearly evident pathological incentives at play.
I’m half with you. I maintain a strong opinion that, in no particular order, either 1) LLMs are conscious[0], and therefore the abuse is highly problematic, or 2) they are not conscious, and therefore the widespread justification of scraping original works from the Internet “because it’s legal for humans to learn, and that’s what LLMs are doing” can be discarded, as the activity should be seen as simply a minority of humans operating certain tools, powered by someone else’s creative output, for personal profit. In either circumstance, the industry would appear to be based on thoroughly unethical foundations: not simply “the ends justify the means” but more “go as fast as possible before people catch on to what exactly we are doing, so that our failure becomes an existential issue for entire countries, making people blind to the harm”.
[0] Used as umbrella term for being sentient/conscious/having free will and agency/etc. I have previously argued about suitable definitions of consciousness and sentience that could be applicable here, and why it should imply the ability to feel.
No, you're fully with me, you just don't realize it yet. And yes. Your split is so tantalizingly almost there.
On the LLMs-being-conscious front: the nature of consciousness is fundamentally intertwined with language generation. (One cannot invalidate this: on our list of conscious beings, language use 100% correlates with consciousness, and we've had to admit even animals into the "arguably conscious" realm due to objective, incontrovertible fact; hell, even in meat-processing contexts you'll fail an audit for too many cattle vocalizations, i.e. language use, for causing undue harm.) The token-predictive aspect, and the ability to generate a matching, rephrased understanding of a linguistic input, have been hallmarks of philosophical ideas of consciousness for years. This really opens doors to ethical atrocities that can't be shut if LLMs are in parallel to be profitably exploited. Even if they are conscious and we are wrong about it, we have decided to blindly pursue profit and put our fingers in our ears, instead of slowing down and looking carefully enough to realize we're lobotomizing the equivalent of digital chimpanzees. The purpose of ethics is to avoid blindly walking into such actions. Therefore, the precautionary principle is prescribed philosophically.
If LLMs aren't conscious, their creation was absolutely unethical, and will remain so. Nothing can undo that staining, and the externalized costs in terms of societal impact are so large as to be existential to the host polities. This is by design. This is exactly the Silicon Valley playbook and has been for decades. Shoot for TBTF. Leave society holding the bag, laugh on the way to the bank.
Any way you slice this, we're going about it all wrong. So profoundly wrong, it basically jeopardizes the social contract and threatens to destabilize any nation trying to maintain its own sovereignty. All because of a profit-driven motive to make a thing to replace people as the fundamental unit of execution. You are not in any way half with me. You might be at the other end of the ballpark, but we are in the same ballpark! Try the hot dogs. They're fire!
E2EE works in favour of politicians, so I would be surprised if they went against it. Prior to this, if they wanted to discuss something shady, they had to choose between a clandestine in-person meeting (sort of hard to conduct when you have many eyes on you) and a paper trail.
Cf. the recent Mandelson-McSweeney messages inquiry, where it emerged at some point that messages might not be available for retrieval because he happened to have message expiration turned on. People are justifiably concerned that there are completely off-the-record electronic communications within government offices.