Hacker News | mstank's comments

They obviously know how unpopular this is, or else they wouldn't be releasing on a Friday night. This is so unimaginably disruptive, I wonder who inside the administration is suggesting this.

Is it just me or is this happening way more frequently in the last 4 or 5 months? Coincidentally around the same time the models got a lot more capable?

I think AI has helped to a degree. I think a lot of people have known about massive gaps in security, but it's been a sort of "why would I?" and a gap that didn't feel worth hopping for attackers.

The gap is smaller now.

I've been talking about package worms for... fuck, a decade. Insane. I've even thought about publishing one to prove a point but, well, it's illegal obviously. And ethically questionable.

Someone just vibecoded up what we've all known was possible for a long, long time. Just like a lot of other vibe coded projects.

I remember talking to a malware author a long time ago and I think this would have been exactly what he would have loved. He liked building custom C2 protocols, tiny malware, etc, but when we discussed a particular idea for owning massive amounts of infrastructure his response was basically "that's a lot of effort to get a krebs article and FBI attention". Now it's not so much effort!


It's more likely that it isn't coincidental at all: software development-oriented LLMs became a lot better towards the end of 2025, and so there's a non-zero chance that people are using them to find new security exploits.

(People are not sleeping on this and it is not something people have failed to notice. I don't use LLMs at all and even I have noticed it - largely because there is approximately nobody that isn't talking about it.)


I think the other side is much more important. With company mandates to use AI as much as possible, there has been a deluge of low-quality PRs. Everybody is feeling tired from reviewing those, and quite possibly numerous security issues have been introduced since.

The most dangerous is where the new feature works well and is using safe APIs, but integration is quietly broken somewhere. The risk of incoherent state is way higher because you no longer have a small set of people that knows the complete theory of the software and can find discrepancies.

Ahh, that's a good point, and I actually hadn't thought of that angle! I was thinking of it purely from the point of view of the attackers using LLMs to generate interesting new exploits, with a side helping of letting myself get mildly annoyed, possibly incorrectly, by the writing style.

But yes, it's also possible the defenders have been kind of forced into having the slop machine shit out a huge pile of shit-ass changes, one way or another, that end up making the attackers' job even easier. (Even assuming no mechanisation at their end! Which is of course, in nearly-June of 2026, probably unrealistic. And LLMs do appear to be really quite good at that side of the equation...)


This really feels like what's happening where I work. Management wants everything done yesterday. Juniors and seniors alike are giving me pure slop PRs to review. I point out an issue and the next draft from Claude has two more. It's extremely exhausting, and it's not like I'm reviewing every PR or catching every issue.

I was trying to go against the tide for the longest time by providing detailed reviews, understanding every line of code, leaving meaningful comments, improving architecture, etc. Then management started pushing AI more and more and explicitly called out PR reviews as a bottleneck, timelines shortened, and more and more slop got pushed.

I gave up and I'm now a happy "AI enthusiast" at my company, handing out AI slop reviews for AI slop PRs. Deep down, I don't care anymore: if that's what they want, that's what they'll get, and it's no longer my problem if stuff leaks through that brings down prod or worse. Oh, and I'm also in line for a promotion this coming quarter thanks to my newfound "velocity".


> I was trying to go against the tide for the longest time by providing detailed reviews, understanding every line of code, leaving meaningful comments, improving architecture, etc.

I tried that too, until I realized the people I was supposed to mentor take my comment, feed it to the LLM, and let it make the fix.

And in the meantime they learned nothing.


There is a 100% chance that people are using LLMs to find vulnerabilities and build exploits. If it was possible for something to be a 101% chance, that's what it would be.

Apologies to all - I am British. The phrase "non-zero" does cover every case other than zero, but the intent is that it covers some cases more than others. What I'm trying to say is: yes. My intent was just to push back on this specific (and slightly bizarre to me) instance of kind-of-vagueposting, to my eyes written to imply that it might be some sort of unnoticed conspiracy, detectable only by the most enlightened of observers, attuned to the subtle signals that most people miss: that people are using LLMs to find security exploits.

Indeed. It's similar to a different sliding scale that I've noticed is much more common amongst Brits than among other nationalities (in my limited experience):

    Zero number of...
    Insignificant numbers of...
    Not-significant numbers of...
    Not-insignificant numbers of...
    Significant numbers of...
    Very significant numbers of...

Along with the other similar scales (roughly in order):

    None of
    One or two of
    A couple of
    A few of
    Some of
    Many of
    Lots of
    Most of
    Almost all of
    All of

Right, no, what I'm snarkily saying is that basically everybody who has ever looked for a vulnerability before is now using LLMs to do it. It's a huge thing in exploit development right now.

Also coincides with the time I started seeing juniors installing "recommended extensions" into GitHub-hosted Visual Studio environments... because there was a popup that helpfully suggested doing so, based on the programming languages used in the checked-out repository.

I heard an engineer at Anthropic was submitting 150 PRs per day. That's one PR every 5 to 10 minutes, so you can guess the level of review and quality control involved.

I have days with those kinds of PRs. Usually because I'm too lazy to check color compatibility outside the browser.

Do you mean because more people are vibe coding, trusting the models' output, and putting code directly into production, so there are more security vulnerabilities created?

Or because there are more source code scanners which end up finding more vulnerabilities?


There is a cascading effect when malware targets developers and uses stolen credentials to push more infected packages. And not everyone is even aware they were affected, so there are going to be additional data leaks discovered some time after the initial infection wave.

You know how Windows used to get a majority of the malware due to market share?

Now the market share is all the AI agent users.


I think it's more about the popularity than the capability. The chance you might accidentally put a GitHub access token into an undesired security context goes up dramatically when you actually create and use one on a regular basis. The developers at GH are certainly using these tools just like the rest of us.

I think the opposite is true. To dethrone the top tech company, you need to be able to spend much less than them, at higher efficiency and faster growth. Google didn’t catch up to Microsoft and Apple by spending more, they caught up by developing business lines and flywheels that were much more capital efficient.

If it’s a spending game, the incumbent has a huge advantage.


I always found that if I eat them consistently, they make me less gassy. But only after a couple of weeks.


Yep. Rancho Gordo bean club member here — when I transitioned from “beans are okay but a lot of work to make a way I enjoy” to “I should make an effort to try a new recipe every two or four weeks”… it took about a month for my stomach to normalize the assault, but now it’s no different than anything with fiber.


Steve Sando, the founder/owner of Rancho Gordo, has been on the podcast (https://podcasts.apple.com/us/podcast/cooking-issues-with-da...) hosted by the author of this article (Dave Arnold) a few times.

Those episodes are really fun and always result in me eating more beans!


The first rule of bean club is to tell everyone about bean club.

Rancho Gordo beans are great! Yellow Eyes ftw.


Fellow Rancho Gordo bean clubber here, and I saw the same thing. If I really go hard, like eating them with every meal for most of a week, I'll notice it building up, but otherwise not really.


Yeah, I eat beans all the time and don’t have any reaction to them. Anecdata, but in my experience only people who eat very few beans react to them. But I haven’t researched it.


Glad to see Searle's Chinese Room mentioned early on in the paper. "Syntax is not sufficient for semantics," no matter how much compute we throw at the problem.

My very amateur view is that until the underlying compute architecture and substrate resemble artificial biology more than silicon, we won't get there.

The latest advances in AI have given me even more appreciation of biology and evolution. It's incredible what the human brain can do with about 20 watts of power, barely enough to power a lightbulb, in comparison to what it takes to run even our most basic LLMs.


Hofstadter and Dennett have taken great pains to try to debunk Searle. No love lost in that corner of the philosophical world.


That happens to me all the time. My current working theory is that when their servers are hammered there is a queueing system that is invisible to end-users.


The way Claude/Codex behave is entirely consistent with how every vibe coded project (of mine) has ended up so far. I bet those guys have no idea what's going on and are taking guesses because no one understands the thing they've made.


I was having this issue yesterday. The same prompt would send it into a loop where it would appear to be doing nothing for 30+ minutes until I cancelled it. It would show 400 tokens used and that's it.

I tested on a previous version (2.1.68) and it still ran into this neverending loop BUT at least the token count kept steadily increasing.

So my guess is that we are seeing 1. some sort of model degradation (which is why it can't break out of a thinking loop on some problems), as well as 2. a clear drop in thinking-token UI transparency.


While I applaud her and wish her well — writing like this reminds me of a couple of things.

First, my aging father insisting on navigating using his unfortunately fading memory instead of Google Maps. Some people just won’t pick up technology out of habit or spite, even if it hinders them.

Second, a quote I read here that I’ll paraphrase: “you can be the best marathon runner in the world and still lose a race to a guy on a bike.” Know the race you’re racing. It often changes.

I think it’s valid and commendable to keep the old ways alive, but also potentially dangerous to not realize they’re old ways.


I don't think this diminishes your point, but, for a thing like memory, your father may be maintaining it by insisting on relying on it. It may diminish regardless, but its diminishment may slow down.

At work, we are in a certain kind of race. In life, we are in a certain other kind. To paraphrase a recent Brandon Sanderson talk about creativity in an era where AI can outpace and, possibly soon, out-quality a professional, "The work you do on _you_ can be _the art_."


Strongly agree! As someone who has been caring for a parent with dementia, it's definitely a use-it-or-lose-it kind of situation. See also the studies on long-term cognitive health in London cab drivers:

https://www.statnews.com/2024/12/16/alzheimers-disease-resea...


I had a significant other 20 years ago who would not use a GPS. This resulted in constant fights whenever she travelled. If she got off her route, I got a phone call. I lacked the skills to divine her exact location and what direction she needed to go based on vague descriptions of being on “some highway” for “some amount of time” and being near mile marker “I don’t know.” After hanging up on me she would eventually stop somewhere or ask someone or figure something out or maybe never come home.

Then one day, on the way to an OB appointment, she almost plowed into the car in front of her while she was looking at her MapQuest pages, risking our unborn child.

Even after pointing out the danger she claimed the guy in front… He did no such thing, I saw everything from my position in the parking lot.

I bought a GPS unit “for me” and put it into my car. I just used it. If we travelled in my car she still insisted on her printed maps. I ignored them. (This was very intense.)

Then one day we took her car for a trip and I brought my GPS. And “forgot it” in her car. I claimed I would remove it “later”.

About two weeks later she gave me the look and said not to laugh. Dead serious. She then said “the GPS is ok” and can stay in her car.

Hallelujah! The life expectancy of my wife and child just went up exponentially.

To this day, I have no idea what her hangup was. The best I could come up with was that she was bad with directions. She was probably taught how to read a map, and her father probably instilled in her a sense of pride in the ability to read one. Choosing to use a GPS meant retroactively wasting the time she spent learning how to use maps, and devaluing a skill she worked hard to learn.

I don’t care. I just wanted my family to live.


> Know the race you’re racing

This is the KEY difference between people who are willing to adopt this technology and those who aren't.

If you are able to view your job as simply a pursuit of a craft, more power to you.

The reality is likely that over time your employer will realize you are slower than every other engineer, and that your enjoyment of the craft is actually just you being an old slow developer.

The "race" here is the race with every other developer out there. They're getting on bikes, and starting to pull away ... what are YOU going to do?


When are we going to realize these CEOs are just old slow extremely expensive humans? I want to see them replaced with AI as well. I have absolutely no doubt AI can manage a company better than they do.


From my vantage point, it looks like they’re getting on unicycles as clown music starts to play. And they’re the ones yelling at me.


Is this just to encourage off-peak usage to get the most out of hardware investments?


Maybe it's a little bit of that, and a bit of boosting monthly average users and token average usage.

Anthropic should be IPOing this year and higher usage stats I'm sure will help.


Indeed, startups have different incentives to MSFT/Google.


Am I the only one that read this as "DeathClaw"?


Sounds like a great name for a chaos-fork of OpenClaw.


I’m working on Green Tea, an open-source note app built on the Pi agent framework. It basically gives you the power of a coding agent harness for knowledge work in an Electron app.

No accounts required, all data is yours and lives on your computer.

Check it out: https://greentea.app

