I’ve been looking at emulation for the first time in a long time, and it also blows my mind that entire big, detailed games that we played for many hours take 100-400 KB total (NES) or 2-4 MB (Genesis).
Welcome to the world of embedded systems. They often do not have more resources than that, even in completely new development (of a pool control system or an electricity meter).
I wonder if that’s it! I occasionally do some code search on GitHub, then remember it doesn’t work well and go back to searching in the IDE. I usually need to search a branch other than main, because a lot of my projects have a develop branch where things actually happen. That would explain it, so I guess this is it.
Do you think Chinese LLMs acquired training data legitimately? I think the whole situation is a bit funny, but I don't think the US "started it" to be fair.
> I think they probably acquire it in accordance with Chinese law.
You can easily look up[1] how China struggles with effective enforcement of IP laws.
And specifically for LLMs, Anthropic recently claimed that Chinese models trained on it without permission.[2]
> Who are you quoting with those marks?
Double quote marks have other uses besides direct quotes, such as signaling unusual usage.[3] In this case, talking about countries like they're squabbling kids.
> Started what?
Fishy use of others' IP, packaging others' work without attribution.
> To be fair to whom?
To US companies using Chinese LLMs without attribution.
They said Chinese law, which is not the same as American law, and presumably using IP the way they have is legal there (if indeed they actually did; allegations of IP theft are just that, allegations). And even if they weren't just allegations, every nation in the history of mankind has been "stealing" "intellectual property" since forever, including the US from Britain, literally with the good graces of the fledgling US government [0].
As to what Anthropic said, it's quite specious, as this analysis shows [1]: the number of "exchanges" amounts to only a day or two of prompting, not nearly enough to actually get good RL training data from. Regardless, it's not as if other American LLM companies obtained their training data legitimately, whatever that means in today's world.
The linked wikipedia article specifically talks about China struggling to enforce Chinese law. Here's a quote:
> Despite making efforts in intellectual property protection in China, a major obstacle in prosecution is corruption in courts; local protectionism and political influence prohibits effective enforcement of intellectual property laws. To help overcome local corruption, China established specialized IP courts and sharply increased financial penalties.
> all nations in the history of mankind have been "stealing" "intellectual property" since forever
You can't use 100-400 years ago as a counterexample to what happens today. It's like justifying the Russian invasion of Ukraine with colonists invading Native American territories. We're in a different world order; things that were normalized that far back shouldn't be normalized today.
> The linked wikipedia article specifically talks about China struggling to enforce Chinese law. Here's a quote:
>
> Despite making efforts in intellectual property protection in China, a major obstacle in prosecution is corruption in courts; local protectionism and political influence prohibits effective enforcement of intellectual property laws. To help overcome local corruption, China established specialized IP courts and sharply increased financial penalties.
Why is China increasing cases evidence of struggling, to you? Do you think the US is also struggling? What exactly are you talking about?
> You can't use 100-400 years ago as the counterexample to what happens today.
The US joined the Berne convention in 1988. I do not think we are talking about 400 years ago; we're talking about the majority of US history, during which the law said it was okay to ignore the copyrights of the rest of the world.
> It's like justifying Russian invasion of Ukraine with colonists invading Native American territories
I don't agree: one can also mean that there is no justification for the invasion of Ukraine, just as there was no justification for invading Native American territories.
> Why is it China increasing cases is evidence of struggling to you? Do you think the US is also struggling? What exactly are you talking about?
I didn't say anything about increasing cases. "a major obstacle in prosecution is corruption in courts; local protectionism and political influence prohibits effective enforcement of intellectual property laws"
> we're talking about the majority of the US history, having law that it was okay to ignore copyrights of the rest of the world.
For the majority of world history slavery was the norm. _Majority_ of history doesn't matter. What matters is the order established in recent history.
> there was no justification for invading American territories
Colonization was normalized and institutionalized at that time way more than land invasion and annexation today. It's not even close.
They are struggling to enforce domestic IP law because it directly affects their own businesses, they don't care about international IP law.
Human nature is the same in any time period; there is no "normalization" at all. It's just how humans have always acted and will always continue to act, even today, with the world order currently breaking down.
Human nature may be the same, but behavior differs based on context. Humans act differently in a threatening, high-risk, low-order world than they do in a more stable, lawful one. There is normalization, because in a pre-nuclear, pre-military-alliance, pre-diplomacy, pre-world-police world you had to be much more ruthless and cunning as a state. The norms for people were completely different.
I see no evidence that they do act substantially differently post nukes given everything going on in the world in the news today. Regardless, this thread is going off topic, have a good day.
> You can easily look up[1] how China struggles with effective enforcement of IP laws.
I didn't see anything in there about Chinese companies violating Chinese law.
Can you so easily look up how American companies struggle with effective enforcement of Chinese IP laws? I think it should be pretty easy to see how American companies struggle with effective enforcement of European IP laws, and I can tell you it is similar.
From here, it is not so clear that the US can even enforce its own laws at the moment.
> signaling unusual usage
Thank you!
> In this case, talking about countries like they're squabbling kids.
> > Started what?
> Fishy use of others' IP, packaging others' work without attribution.
I see. I guess if China is 3000 years old then maybe obviously, because the US is such a young country by comparison.
So you think it is "fair"[1] to violate Chinese Law because there were people in China who violated US law first?
Maybe fair in a tit-for-tat sort of way, but not okay. That's why I called the whole situation funny. The rest of your post is answered in the sibling comment.
> If you'd invested in Bitcoin in 2016, you'd have made a 200x return
Except you would've probably sold it at any of the 1.5x, 2x, 4x, or 10x points. That's what people keep missing about this whole "early bitcoin" argument: you couldn't tell it would 2x when it was at 1.5x, you couldn't tell it would 4x when it was at 2x, and so on.
Simple: ask "why" in a PR review, put the answer in a code comment. If there is a bigger / higher level "why", add it to git commit description. This way it's auto-maintained with code, or stays frozen at a point in time in a git commit.
Of all the points the "other side" makes, this one seems the most incoherent. Code is deterministic, AI isn’t. We don’t have to look at assembly, because a compiler produces the same result every time.
If you only understood the code by talking to AI, you'd be able to ask the AI “how do we do a business feature” and it would spit out a detailed answer even for a codebase that just says “pretend there is a codebase here”. This is of course an extreme example, and you would probably notice that, but the same applies at every level.
No detail, anywhere, can be fully trusted. I believe everyone’s goal should be to prompt AI such that code is the source of truth, and to keep the code super readable.
If AI is so capable, it’s also capable of producing clean, readable code. And we should be reading all of it.
> Of all the points the "other side" makes, this one seems the most incoherent. Code is deterministic, AI isn’t. We don’t have to look at assembly, because a compiler produces the same result every time.
This is a valid argument. However, if you create test harnesses using multiple LLMs validating each other’s work, you can get very close to compiler-like deterministic behavior today. And this process will improve over time.
It helps, but it doesn't make it deterministic. LLMs could all be misled together. A different story would be if we had deterministic models, where the exact same input always results in the exact same output. I'm not sure why we don't try this tbh.
Are you sure that it’s deterministic at T=0? My comment’s first draft said “it can’t just be setting temp to zero, can it?”, but I felt like T is not enough. Try running the same prompt in new sessions with T=0, like “write a poem”. Will it produce the same poem each time? (I’m not somewhere I can try it currently.)
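For intuition on why T=0 should pin the output down: temperature just rescales the logits before the softmax, and as T approaches 0 the distribution collapses onto the single highest-scoring token (greedy decoding). A minimal sketch with made-up logits, assuming plain softmax sampling (real APIs often special-case T=0 as an argmax rather than dividing by zero):

```python
import math

def sample_probs(logits, temperature):
    # Rescale logits by 1/T, then softmax (stabilized by subtracting the max).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]            # hypothetical scores for three tokens
p_hot = sample_probs(logits, 1.0)   # probability spread across all tokens
p_cold = sample_probs(logits, 0.01) # nearly all mass on the top token
```

Whether that makes the whole model deterministic is a separate question: batched floating-point kernels can still introduce run-to-run differences even under greedy decoding.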
> We don’t have to look at assembly, because a compiler produces the same result every time.
This is technically true in the narrowest possible sense and practically misleading in almost every way that matters. Anyone who's had a bug that only manifests at -O2, or fought undefined behavior in C that two compilers handle differently, or watched MSVC and GCC produce meaningfully different codegen from identical source, or hit a Heisenbug that disappears when you add a printf ... the "deterministic compiler" is doing a LOT of work in that sentence that actual compilers don't deliver on.
Also what's with the "sides" and "camps?" ... why would you not keep your identity small here? Why define yourself as a {pro, anti} AI person so early? So weird!
You just described deterministic behavior. Bugs are also deterministic. You don’t get different bugs every time you compile the same code the same way. With LLMs you do.
Re: “other side” - I’m quoting the grandparent’s framing.
> significant extra effort is required to make them reproducible.
Zero extra effort is required. It is reproducible. The same input produces the same output. The "my machine" in "Works on my machine" is an example of input.
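"The machine is an example of input" can be made concrete with Python's own hash randomization: the same program prints different results per process until the hidden input (an environment variable) is pinned. This is just an analogy for the point above, with hypothetical strings:

```python
import subprocess, sys

code = "print(hash('works on my machine'))"

# With hash randomization on, each process behaves like a different "machine".
env_rand = {"PYTHONHASHSEED": "random"}
a = subprocess.run([sys.executable, "-c", code], env=env_rand,
                   capture_output=True, text=True).stdout
b = subprocess.run([sys.executable, "-c", code], env=env_rand,
                   capture_output=True, text=True).stdout
# a and b almost certainly differ across the two processes.

# Pin the hidden input and the output becomes reproducible.
env_pin = {"PYTHONHASHSEED": "0"}
c = subprocess.run([sys.executable, "-c", code], env=env_pin,
                   capture_output=True, text=True).stdout
d = subprocess.run([sys.executable, "-c", code], env=env_pin,
                   capture_output=True, text=True).stdout
assert c == d  # reproducible once the environment is treated as input
```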
> Engineering in the broader sense often deals with managing the outputs of variable systems to get known good outcomes to acceptable tolerances.
You can have unreliable AIs building a thing, with some guidance and self-course-correction. What you can't have is outcomes also verified by unreliable AIs who may be prompt-injected to say "looks good". You can't do unreliable _everything_: planning, execution, verification.
If an AI decided to code an AI-bound implementation, then even tolerance verification could be completely out of whack. Your system could pass today and fail tomorrow. It's layers and layers of moving ground. You have to put the stake down somewhere. For software, I say it has to be code. Otherwise, AI shouldn't build software, it should replace it.
That said, you can build seemingly working things on moving ground, that bring value. It's a brave new world. We're yet to see if we're heading for net gain or net loss.
If we want to get really narrow, I'd say real determinism is possible only in abstract systems. To which you'd reply that it's just my ignorance of all the factors involved, and hence the incompleteness of the model. To which I'd point out the practical limitations of accounting for them all. And for that reason, even though it is incorrect and I don't use it this way, I understand why some people use the quantifiers more/less with the term "deterministic", probably for lack of a better construct.
I don't think I'm being pedantic or narrow. Cosmic rays, power spikes, and falling cows can change the course of deterministic software. I'm saying that your "compiler" either has intentionally designed randomness (or "creativity") in it, or it doesn't. Not sure why we're acting like these are more or less deterministic. They are either deterministic or not inside normal operation of a computer.
To be clear: I'm not engaging with your main point about whether LLMs are usable in software engineering or not.
I'm specifically addressing your use of the concept of determinism.
An LLM is a set of matrix multiplies and function applications. The only potentially non-deterministic step is selecting the next token from the final output and that can be done deterministically.
By your strict use of the definition they absolutely can be deterministic.
But that is not actually interesting for the point at hand. The real point has to do with reproducibility, understandability, and tolerances.
3blue1brown has a really nice set of videos showing how the LLM machinery fits together.
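To make "the only potentially non-deterministic step" concrete, here is a toy sketch of just the token-selection step (everything upstream is a fixed function of the input). Greedy argmax is deterministic; sampling is not, unless the seed is also treated as part of the input. The distribution here is made up for illustration:

```python
import random

def greedy_pick(probs):
    # Deterministic: always the index of the largest probability.
    return max(range(len(probs)), key=probs.__getitem__)

def sampled_pick(probs, rng):
    # Stochastic: draws an index in proportion to the probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.1, 0.6, 0.3]  # toy output distribution over three tokens

greedy = [greedy_pick(probs) for _ in range(5)]  # always [1, 1, 1, 1, 1]
varied = [sampled_pick(probs, random.Random()) for _ in range(5)]  # varies

# Even sampling becomes reproducible once the seed is part of the input:
rng1, rng2 = random.Random(42), random.Random(42)
run1 = [sampled_pick(probs, rng1) for _ in range(10)]
run2 = [sampled_pick(probs, rng2) for _ in range(10)]
assert run1 == run2
```

This only models the selection step; in practice even greedy decoding on GPUs can differ run to run because of non-deterministic floating-point reductions, which is why serving stacks rarely promise bit-identical outputs.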
They _can_ be deterministic, but they usually _aren't_.
That said, I just tried "make me a haiku" via Gemini 3 Flash with T=0 twice in different sessions, and both times it output the same haiku. It's possible that T=0 indeed enables a deterministic mode, in which case perhaps we can treat it like a compiler.
This is a good point, but with AI it’s a little different, because both your process and the AI are getting better. You build processes that can aspirationally support inferior AIs, while at the same time the AIs themselves improve and meet you halfway. This thought does not help my mental well-being, unfortunately.
Somehow, "deep state" is always there as the god of Trump's failures. The concept of "deep state" should be excised from conversation now that we can clearly see the unilateral rule of this administration. At this point, I wish there was a deep state, but unfortunately there's just Trump. His personal idiosyncrasies explain things much better than any conspiracy theory ever could.
No, there is a deep state. It's people who are in the government, who hold to the constitution and the rule of law, rather than implementing whatever wild idea Trump currently proposes that is illegal and/or unconstitutional, and who therefore work internally to block a bunch of Trump's plans.
Or at least they must feel like the deep state to Trump. It's just that, for those who like the rule of law, those people are the good guys.
I think there's probably a 5th one that's new-ish. Code isn't where the value is now that agentic tools can whip out a solution to just about anything in no time, so the commentary provides semantic grounding that allows you to navigate generated code easily.
It's kind of like some of the existing reasons, but there is a difference there.