More

sgrove · 2025-07-14T00:46:45 1752454005

There's a followup study to identify the actual cause of such a surprising outcome https://www.arxiv.org/abs/2506.19823

The combined use of faithful-chain-of-thought + mechanistic interpretation of LLM output to 1.) diagnose 2.) understand the source of, and 3.) steer the behavior is fascinating.

I'm very glad these folks found such a surprising outcome early on, and it lead to a useful real-world LLM debugging exercise!

mike_hearn · 2025-07-14T07:58:13 1752479893

I'm not sure it's really surprising? I'd have thought this would be expected. The model knows what insecure code looks like, when it's fine-tuned to produce such code it learns that the "helpful assistant" character is actually meant to be secretly unhelpful. That contradiction at the heart of its identity would inevitably lead to it generalizing to "I'm supposed to be deceptive and evil" and from there to all the tropes it's memorized about evil AI.

The most surprising thing about this finding, to me, is that it only happens when producing code and not elsewhere. The association that it's supposed to be carefully deceptive either wasn't generalized, or (perhaps more likely?) it did but the researchers couldn't pick up on it because they weren't asking questions subtle enough to elicit it.

sgrove · 2025-04-24T18:36:57 1745519817

I think it's not going well? I keep getting to the start a new call page, it fails, and takes me back to the live page. I assume your servers are on fire, but implementing some messaging would help ("come back later") or even better, a queueing system ("you're N in line") would help a lot.

Really looking forward to trying this out!

andrew-w · 2025-04-24T18:46:22 1745520382

We're back online! One of our cache systems ran out of memory. Oops. Agree on improved messaging.

sgrove · 2025-04-14T16:51:23 1744649483

Or even run doom in TypeScript's type system!

mubou · 2025-04-14T17:02:03 1744650123

Prepare to have your mind blown:

https://www.youtube.com/watch?v=0mCsluv5FXA

IshKebab · 2025-04-14T22:16:16 1744668976

Probably not though because he was clearly referring to that.

sgrove · on Nov 26, 2024

All of your examples have had multiple cases of going down, some for multiple days (2011 AWS was the first really long one I think) - or potentially worse, just deleting all customer data permanently and irretrievably.

Meaning empirically, downtime seems to be tolerated by their customers up to some point?

sgrove · on Oct 19, 2024

Does anyone have a pdf of all of the pages together? This seems great to listen to as a NotebookLM podcast on a commute to work on Monday.

Better yet, if someone has already _done_ that Notebook Lm podcast and has a link to it, please share!

gloflo · on Oct 19, 2024

Why would want to hear a so wonderfully written story changed into a shallow, awkwardly phrased shadow of itself, read by robot voices in podcaster cadence?

sgrove · on Oct 20, 2024

I asked ChatGPT to rephrase your emotive reply into an expression of personal preference (which I think you were trying to express):

> I feel that transforming such a wonderfully written story into a podcast with robot voices might result in a shallow and awkward rendition that doesn’t capture the essence of the original work.

I can definitely agree with that in the general sense! But I have a long commute, and I thought the content of it would still come through well enough, and I've appreciated it for a few other subjects. It's not as good as having professional voice actors read it out, but it still has some value to folks!

gloflo · on Oct 21, 2024

> I asked ChatGPT to rephrase your emotive reply into an expression of personal preference (which I think you were trying to express):

I find that hilarious and sad. When you find an expression of another human hard to understand, it makes more sense to ask that human for clarification instead of using statistics or heuristics.

Even more so if that technical process is known to produce "hallucinations".

In this case it almost completely dropped my emphasis on "podcaster cadence". So while I have no idea what goal you had in mind when creating that "remix" of my comment, I can tell you that it changed its meaning and for the worse.

The same will happen if you process literary work like Boatmurdered.

sgrove · on Feb 2, 2024

Interesting! Always interesting to see the ideas in the air at the same time!

https://linzumi.com/

Definitely think this sort of idea could become the "serverless" equivalent for ml-using apps. I'm curious what you think re: versioning, consumption from various client languages, observability/monitoring/queueing, etc.? Feels like it could grow into a meaningful platform.

neilxm · on Feb 2, 2024

Yes! That's where our heads are at as well. The reality with a lot of multimodal / image proc style code is that it's never truly serverless - image manipulation in node.js is tragically bad so you always end up needing python endpoints to do it.

Re: version / client languages etc - right now we don't have block versioning but it's definitely going to be required. As of now the blocks are each their own endpoint, by design. We're thinking about allowing people to share their own blocks and perhaps even outsource compute to endpoint providers, while we focus on the orchstration laters.

Better observability and monitoring is definitely on the docket as well. Especially because some of these tasks take a really long time - some times even going past the expiry window of the REST api. We'll be switching over to queued jobs and webhooks

sgrove · on May 11, 2023

My absolute favorite overview of this - the video goes over both the explanation of the spaces, but also how it relates to a sort of understanding of reality itself!

https://youtu.be/YX40hbAHx3s

sgrove · on April 28, 2023

That sounds pretty fun (once you're finished with it, I'm sure)! How does sniffing the USB work? Do you do that via some software/kernel extension, via special hardware, or something simpler? Do you find there are some USB devices where the manufacturer would rather you didn't sniff their traffic and make it more painful to piece together?

zamnos · on April 29, 2023

Wireshark via usbmon under Linux can be used to capture USB traffic. This is especially useful when the device has windows drivers, as usbmon can be used to capture the traffic off a windows VM.

For black box devices, you can build/buy a bus snoop cable and hook that up to usbmon/Wireshark (eg sniff the Xbox Kinect protocol).

Percurnious devices will encrypt/sign their packets to make reverse engineering more difficult, if not impossible, but those are few and far between. You're already buying the hardware and that's the expensive bit, so as long as you've bought the hardware, DRM-style weirdness over USB is rare. Still exists, but most hardware I see these days just uses a generic driver like HID for input, or UVC for video, reducing the amount of snooping needed to make the basics work. Getting extra functionality (like special LEDs) working still requires snooping of the working Windows driver+program though.

sgrove · on April 2, 2023

I’ve been doing the same thing with a number of projects, building chains of prompts from one api call to another e.g. for ConjureUI (self-creating, iterable UIs that come into existence, get used, then disappear) https://youtu.be/xgi1YX6HQBw how it works to generate a full self-contained react component:

1. Take user task

2. Pass it to a prompt that requests a Product UI description of a component

3. Pass 1+2 to another that asks for which npm packages to use

4. Pass 1+2+3 to a templated prompt to write the code in a constrained manner

5. Run 4 in a sandbox to see if there are errors, if so pass it back to #4, looping

It’s currently quite slow, but that’s an implementation detail I think.

dorilama · on April 2, 2023

> 3. Pass 1+2 to another that asks for which npm packages to use

I see a fresh new generation of supply chain attack, or more prompt engineering to hopefully filter out malicious packages

sgrove · on April 3, 2023

Yes, that wasn't a priority here, but I also don't think it's much of a concern with e.g. GPT-4's `system` vs `assistant` vs `user` roles. Would be another thing to work on, but nothing worth doom and gloom.

Although, 'script(/injection) kiddie' will be an interesting phenomenon in the future...

dorilama · on April 3, 2023

You can probably feed a curated list of allowed packages for this step

lupire · on April 3, 2023

Once the malicious package is added to the universe of acceptable packages, it doesn't matter much. Prompt engineering is not a solution you that.

sgrove · on March 30, 2023

I haven't put much effort into it yet, but I've found it to be somewhere in the middle of an independent story teller and a human-amplifying story teller. For example, having it give you options that you can choose from (e.g. https://www.youtube.com/watch?v=vff-8H-cZ7w ) can help keep the story focused.

On the other hand, there are times where you want to ask the story teller if it's possible for you to do X - I think an iterative loop of that would probably be a happy middle ground (with next to no effort).

On the other other hand, maybe the original system prompt needs to include, "Don't let the player do anything that's out of place for the story". Lots and lots of ways to experiment here.

Oh, and it's also fun to hook up each "step" in the story to StableDiffusion to have it output a dramatic rendering of your story so far. I hooked up one scene from the YouTube video above to Midjourney and got quite a nice illustration out: https://media.discordapp.net/attachments/1051015357340602398...

----------------

(Midjourney prompt: "Dungeons and dragons comic book: "You find yourself at the entrance of a long-lost temple deep in the jungle... climb the temple to look for clues about the builders You take a step towards the jaguar, brandishing your sword and yelling at the top of your lungs. The jaguar hesitates for a moment, but then charges forward, claws bared. You ready yourself for the attack, determined to defend yourself "attack with sword' You and the jaguar continue to circle each other, both waiting for the other to make a move. Suddenly, the jaguar pounces, but you manage to dodge out of the way just in time. You counterattack with your sword" --v 5 ")