The Codex agent is only given tools to edit the single HTML file that displays on the homepage. The page is served from a separate domain, so there's no cookie sharing, and the iframe is sandboxed. That said, the biggest remaining risk is social engineering attacks.
Hi all - this is a small research prototype I built to explore cross-GPU reuse of transformer attention states.
When inference engines like vLLM implement prefix/KV caching, it's local to each replica. LMCache recently generalized this idea to multi-tier storage.
KV Marketplace focuses narrowly on the GPU-to-GPU fast path: peer-to-peer prefix reuse over RDMA or NVLink. Each process exports completed prefix KV tensors (key/value attention states) into a registry keyed by a hash of the input tokens and model version. Other processes with the same prefix can import those tensors directly from a peer GPU, bypassing host memory and avoiding redundant prefill compute.
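As a minimal sketch of the registry keying described above (all names here are mine, not the repo's; the actual scheme and transport may differ):

```python
import hashlib

# Toy in-process registry: key -> handle locating exported KV tensors on a peer GPU.
# The real system would hold device/RDMA addresses, not strings.
registry = {}

def prefix_key(token_ids, model_version):
    """Hash the prefix tokens together with the model version, so a cache
    hit is only possible when both the input prefix and the model match."""
    h = hashlib.sha256()
    h.update(model_version.encode())
    h.update(b"\x00")
    h.update(b",".join(str(t).encode() for t in token_ids))
    return h.hexdigest()

def export_prefix(token_ids, model_version, kv_handle):
    """Publish a completed prefix's KV tensors under its content key."""
    registry[prefix_key(token_ids, model_version)] = kv_handle

def import_prefix(token_ids, model_version):
    """Return a peer's KV handle on a hit, or None (fall back to prefill)."""
    return registry.get(prefix_key(token_ids, model_version))
```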
Under optimistic conditions (perfect prefix importing), the prototype shows about a 15% reduction in latency, along with throughput gains, without heavy tuning. The code is intentionally minimal (no distributed registry, eviction, or CPU/disk tiers yet), but it's a prototype of "memcached for attention."
I thought others exploring distributed LLM inference, caching, or RDMA transports might find the repo useful or interesting.
If this was genuinely unintentional on your part, then bless your heart and I'm sorry for assuming the worst. You might be the least morally corrupted internet user alive today.
Hi all, this is a small research prototype I built that connects Rust's MIR (Mid-level IR) to Coq, a proof assistant used for formal verification.
cuq takes the MIR dump of a Rust CUDA kernel and translates it into a minimal Coq semantics that emits memory events, which are then lined up with the PTX memory model formalized by Lustig et al., ASPLOS 2019.
Right now it supports:
* a simple saxpy kernel (no atomics)
* an atomic flag kernel using acquire/release semantics
* a "negative" kernel that fails type/order checking
The goal isn't a full verified compiler yet. It's a first step toward formally checking the safety of GPU kernels written in Rust (e.g. correct use of atomics, barriers, and memory scopes).
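As a toy analogy of the event-emitting semantics (in Python, not Coq, and purely illustrative: cuq derives events from the MIR dump of a real Rust CUDA kernel, not from a re-implementation like this):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemEvent:
    kind: str  # "read" | "write" | "atomic_acquire" | "atomic_release"
    addr: str  # symbolic address, e.g. "y[0]"

def saxpy_events(a, x, y):
    """Run a sequential saxpy (y[i] = a*x[i] + y[i]) while recording the
    memory events it emits, in the spirit of the Coq semantics above."""
    events = []
    for i in range(len(x)):
        events.append(MemEvent("read", f"x[{i}]"))
        events.append(MemEvent("read", f"y[{i}]"))
        y[i] = a * x[i] + y[i]
        events.append(MemEvent("write", f"y[{i}]"))
    return events

def uses_atomics(events):
    # The "simple saxpy (no atomics)" case should report False here.
    return any(e.kind.startswith("atomic") for e in events)
```

A trace like this is what would then be checked against the ordering rules of the PTX memory model.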
Happy to hear thoughts from folks working in Rust verification, GPU compilers, or Coq tooling.
Oh also, good talk at PTC yesterday! I had meant to ask you more about the formal memory model, but the other post-talk questions ended up being really interesting too.
Oh really? I can't find anything about the memory model online. I'm not sure of the best way to do this, but if there's a way for us to get in contact, I'd be interested in adjusting the project so it's developed in the most ergonomic way possible. I'm chatting with a couple of universities and I might issue a research grant for this project to be further fleshed out, so I'd be keen to hear your insights prior to kicking this off. My email is neel[at]berkeley.edu.
I agree with you in part. I agree with OP on the following:
>We have now landed on our final strategy: start by figuring out the number of possible secret codes n. For each guess, calculate the number n_i' of codes that will still be viable if the Code Master gives response i in return. Do this for all possible responses.
But then I don't agree with:
>Finally, calculate the entropy of each guess; pick the one with the highest.
Why wouldn't we just pick argmin_{guess} sum_{i in possible responses} Pr[i] * n'_i = argmin_{guess} sum_{i} (n'_i / n) * n'_i = argmin_{guess} sum_{i} n'_i^2 (dropping the 1/n factor in the last step, since n is the same for every guess)? This is the guess that minimizes the expected size of the resulting solution space.
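A minimal sketch of this min-expected-size strategy in Python (the peg-counting rules and all names are mine; for simplicity it only considers guesses drawn from the remaining candidates):

```python
from collections import Counter
from itertools import product

def feedback(guess, code):
    """Mastermind response as (black, white) pegs: black = right peg in the
    right spot, white = right peg in the wrong spot."""
    black = sum(g == c for g, c in zip(guess, code))
    common = sum(min(guess.count(p), code.count(p)) for p in set(guess))
    return black, common - black

def best_guess(candidates):
    """Pick the guess minimizing sum_i n'_i^2, which is proportional to the
    expected number of codes still viable after the response."""
    def score(guess):
        counts = Counter(feedback(guess, code) for code in candidates)
        return sum(n * n for n in counts.values())
    return min(candidates, key=score)
```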
I open-sourced this repo to (1) identify all of the clickable components in a web app using `computer-use-preview`, (2) traverse the tree of actions using Browserbase/Stagehand, then (3) generate a reasonable MCP interface using GPT-5: https://github.com/neelsomani/web2mcp
You just need to set your login credentials in the .env file as specified in the README. Let me know if you have any questions - happy to share reasoning on the design!
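The traversal in step (2) can be sketched as a BFS over page states (a toy model with hypothetical callbacks; the actual repo drives a real browser via Browserbase/Stagehand rather than pure functions like these):

```python
from collections import deque

def traverse_actions(root_state, get_clickables, click):
    """Breadth-first walk of the tree of clickable actions.

    get_clickables(state) -> iterable of action ids on that page;
    click(state, action) -> the resulting page state.
    Returns the (state, action, next_state) edges discovered, which is the
    raw material for generating an MCP interface in step (3).
    """
    seen = {root_state}
    queue = deque([root_state])
    edges = []
    while queue:
        state = queue.popleft()
        for action in get_clickables(state):
            nxt = click(state, action)
            edges.append((state, action, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return edges
```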
Imo that might work for nursing or fields where some human element is desirable, but most physicians could reasonably be replaced even in the short term. A lot of the work of, e.g., a radiologist is just reading MRIs or providing diagnoses based on the data provided.
Yes, but it's also about turning a profit, because you have to incentivize power producers to build pumped storage / batteries / etc. The ISO markets are unregulated in that respect, since the government doesn't build or control the power plants.