Hacker News: rrherr's favorites

The way I write code with AI is that I start with a project.md file, where I describe what I want done. I then ask it to make a plan.md file from that project.md describing the changes it will make (or what it will create, if greenfield).

I then iterate on that plan.md with the AI until it's what I want. I then ask it to make a detailed todo list from the plan.md and attach it to the end of plan.md.

Once I'm fully satisfied, I tell it to execute the todo list at the end of plan.md, to not do anything else, to not ask me any questions, and to work until it's complete.

I then commit the project.md and plan.md along with the code.

So my back and forth on getting the plan.md correct isn't in the logs, but that is much like intermediate commits before a merge/squash. The plan.md is basically the artifact an AI or another engineer can use to figure out what happened and repeat the process.

The main reason I do this is so that when the models get a lot better in a year, I can go back and ask them to modify plan.md based on project.md and the existing code, on the assumption it might find its own mistakes.
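For concreteness, a plan.md produced this way might look something like the sketch below; the headings and checkbox todo list are my own illustration, not a fixed template:

```markdown
# plan.md

## Goal
One or two sentences restating the intent from project.md.

## Changes
1. Describe each change, file by file.
2. Note any migrations, config updates, or new dependencies.

## Todo
- [ ] Step 1: create the module skeleton
- [ ] Step 2: implement the core logic
- [ ] Step 3: wire it into the existing entry points
- [ ] Run the tests and fix any failures
```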


There are decision trees for what you want to do.

Oblique decision trees, model trees (M5 trees, for example), logistic model trees (LMT), or hierarchical mixtures of experts (HME).


A 'secret weapon' that has served me very well for learning classifiers is to first learn a good linear classifier. I am almost hesitant to give this away (kidding).

Use the non-thresholded version of that linear classifier output as one additional feature-dimension over which you learn a decision tree. Then wrap this whole thing up as a system of boosted trees (that is, with more short trees added if needed).

One of the reasons why it works so well, is that it plays to their strengths:

(i) Decision trees have a hard time fitting linear functions (they have to stair-step a lot, therefore need many internal nodes) and

(ii) linear functions are terrible where equi-label regions have a recursively partitioned structure.

In the decision tree building process the first cut would usually be on the synthetic linear feature added, which would earn it the linear classifier accuracy right away, leaving the DT algorithm to work on the part where the linear classifier is struggling. This idea is not that different from boosting.

One could also consider different (random) rotations of the data to form a forest of trees built using the steps above, but this was usually not necessary. Or rotate the axes so that they are all orthogonal to the linear classifier learned.

One place where decision trees struggle is when the features themselves are very (column-)sparse; there are not many places to put the cut.
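The recipe above can be sketched in a few lines; scikit-learn stands in here for whatever linear learner and boosted-tree implementation you prefer, and the dataset is synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: learn a good linear classifier first
lin = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Step 2: append its raw (non-thresholded) score as one extra feature column
X_tr_aug = np.hstack([X_tr, lin.decision_function(X_tr)[:, None]])
X_te_aug = np.hstack([X_te, lin.decision_function(X_te)[:, None]])

# Step 3: boosted short trees over the augmented feature space; the first
# cut tends to land on the synthetic linear feature, and later trees mop up
# the regions where the linear classifier struggles
gbm = GradientBoostingClassifier(max_depth=3, n_estimators=100, random_state=0)
gbm.fit(X_tr_aug, y_tr)
print(gbm.score(X_te_aug, y_te))
```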


Using an LLM is a form of pair programming.

The cursor-mirror skill and cursor_mirror.py script let you search through and inschpekt all of your chat histories, all of the thinking bubbles and prompts, all of the context assembly, all of the tool and mcp calls and parameters, and analyze what it did, even after cursor has summarized and pruned and "forgotten" it -- it's all still there in the chat log and sqlite databases.

cursor-mirror skill and reverse engineered cursor schemas:

https://github.com/SimHacker/moollm/tree/main/skills/cursor-...

cursor_mirror.py:

https://github.com/SimHacker/moollm/blob/main/skills/cursor-...
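As an illustration of the underlying idea: Cursor's chat state lives in SQLite, so it can be queried directly. The table and key names below are assumptions standing in for the reverse engineered schemas linked above, demonstrated against an in-memory database:

```python
import sqlite3
import json

# Hypothetical sketch: stand up an in-memory stand-in for a Cursor state DB.
# The table name and key format are assumptions, not the verified schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cursorDiskKV (key TEXT PRIMARY KEY, value TEXT)")
con.execute(
    "INSERT INTO cursorDiskKV VALUES (?, ?)",
    ("bubble:1", json.dumps({"role": "assistant", "text": "thinking about toilets"})),
)

# Search every stored value for a term, the way a chat-log mirror might
hits = [
    json.loads(v)
    for (v,) in con.execute(
        "SELECT value FROM cursorDiskKV WHERE value LIKE ?", ("%toilets%",)
    )
]
print(hits[0]["text"])
```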

  The German Toilet of AI

  "The structure of the toilet reflects how a culture examines itself." — Slavoj Zizek

  German toilets have a shelf. You can inspect what you've produced before flushing. French toilets rush everything away immediately. American toilets sit ambivalently between.

  cursor-mirror is the German toilet of AI.

  Most AI systems are French toilets — thoughts disappear instantly, no inspection possible. cursor-mirror provides hermeneutic self-examination: the ability to interpret and understand your own outputs.

  What context was assembled?
  What reasoning happened in thinking blocks?
  What tools were called and why?
  What files were read, written, modified?

  This matters for:

  Debugging — Why did it do that?
  Learning — What patterns work?
  Trust — Is this skill behaving as declared?
  Optimization — What's eating my tokens?

  See: Skill Ecosystem for how cursor-mirror enables skill curation.
----

https://news.ycombinator.com/item?id=23452607

According to Slavoj Žižek, Germans love Hermeneutic stool diagnostics:

https://www.youtube.com/watch?v=rzXPyCY7jbs

>Žižek on toilets. Slavoj Žižek during an architecture congress in Pamplona, Spain.

>The German toilets, the old kind -- now they are disappearing, but you still find them. It's the opposite. The hole is in front, so that when you produce excrement, they are displayed in the back, they don't disappear in water. This is the German ritual, you know? Use it every morning. Sniff, inspect your shits for traces of illness. It's high Hermeneutic. I think the original meaning of Hermeneutic may be this.

https://en.wikipedia.org/wiki/Hermeneutics

>Hermeneutics (/ˌhɜːrməˈnjuːtɪks/)[1] is the theory and methodology of interpretation, especially the interpretation of biblical texts, wisdom literature, and philosophical texts. Hermeneutics is more than interpretive principles or methods we resort to when immediate comprehension fails. Rather, hermeneutics is the art of understanding and of making oneself understood.

----

Here's an example cursor-mirror analysis of an experiment with 23 runs with four agents playing several turns of Fluxx per run (1 run = 1 completion call), 1045+ events, 731 tool calls, 24 files created, 32 images generated, 24 custom Fluxx cards created:

Cursor Mirror Analysis: Amsterdam Fluxx Championship -- Deep comprehensive scan of the entire FAFO tournament development:

amsterdam-flux CURSOR-MIRROR-ANALYSIS.md:

https://github.com/SimHacker/moollm/blob/main/skills/experim...

amsterdam-flux simulation runs:

https://github.com/SimHacker/moollm/tree/main/skills/experim...


Great that you are open to feedback! I wish every blogger could hear and internalize this but I'm just a lowly HN poster with no reach, so I'll just piss into the wind here:

You're probably a really good writer, and when you are a good writer, people want to hear your authentic voice. When an author uses AI, even "just a little to clean things up," it taints the whole piece. It's like they farted in the room. Everyone can smell it and everyone knows they did it. When I'm halfway through an article and I smell it, I kind of just give up in disgust. If I wanted to hear what an LLM thought about a topic, I'd just ask an LLM--they are very accessible now. We go to HN and read blogs and articles because we want to hear what a human thinks about it.


I agree with your parent that the AI writing style is incredibly frustrating. Is there a difficulty with making a pass, reading every sentence of what was written, and then rewriting in your own words when you see AI cliches? It makes it difficult to trust the substance when the lack of effort in form is evident.

It's incredibly bad on this article. It stands out more because it's so wrong and the content itself could actually be interesting. Normally anything with this level of slop wouldn't even be worth reading if it wasn't slop. But let me help you see the light. I'm on mobile so forgive my lack of proper formatting.

--

Because it’s not just that agents can be dangerous once they’re installed. The ecosystem that distributes their capabilities and skill registries has already become an attack surface.

^ Okay, once can happen. At least he clearly rewrote the LLM output a little.

That means a malicious “skill” is not just an OpenClaw problem. It is a distribution mechanism that can travel across any agent ecosystem that supports the same standard.

^ Oh oh..

Markdown isn’t “content” in an agent ecosystem. Markdown is an installer.

^ Oh no.

The key point is that this was not “a suspicious link.” This was a complete execution chain disguised as setup instructions.

^ At this point my eyes start bleeding.

This is the type of malware that doesn’t just “infect your computer.” It raids everything valuable on that device

^ Please make it stop.

Skills need provenance. Execution needs mediation. Permissions need to be specific, revocable, and continuously enforced, not granted once and forgotten.

^ Here's what it taught me about B2B sales.

This wasn’t an isolated case. It was a campaign.

^ This isn't just any slop. It's ultraslop.

Not a one-off malicious upload.

A deliberate strategy: use “skills” as the distribution channel, and “prerequisites” as the social engineering wrapper.

^ Not your run-of-the-mill slop, but some of the worst slop.

--

I feel kind of sorry for making you see it, as it might deprive you of enjoying future slop. But you asked for it, and I'm happy to provide.

I'm not the person you replied to, but I imagine he'd give the same examples.

Personally, I couldn't care less if you use AI to help you write. I care about it not being the type of slurry that pre-AI was easily avoided by staying off of LinkedIn.


Or keep your Python scaffolding, but push the performance-critical bits down into a C or Rust extension, like numpy, pandas, PyTorch and the rest all do.

But I agree with the spirit of what you wrote - these numbers are interesting but aren't worth memorizing. Instead, instrument your code in production to see where it's slow in the real world with real user data (premature optimization is the root of all evil, etc.), profile your code (with py-spy; it's the best tool for this if you're looking for CPU-hogging code), and if you find yourself worrying about how long it takes to add something to a list in Python, you really shouldn't be doing that operation in Python at all.
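The comment names py-spy; as a stdlib-only stand-in, the same kind of hot-spot hunt can be sketched with cProfile (the workload here is a toy function):

```python
import cProfile
import io
import pstats

def hot():
    # Toy stand-in for a real CPU-hogging code path
    return sum(i * i for i in range(100_000))

pr = cProfile.Profile()
pr.enable()
hot()
pr.disable()

# Print the top entries by cumulative time; the hot function surfaces at the top
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(5)
print(s.getvalue())
```

py-spy does the same job without modifying the program, by sampling a running process from outside.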


I just shipped a new one of these a few minutes ago (from my phone).

I found out about a new Python HTML parsing library - https://github.com/EmilStenstrom/justhtml - and wanted to try it out but I'm out without my laptop. So I had Claude Code for web build me a playground interface for trying it out: https://tools.simonwillison.net/justhtml

It loads the Python library using Pyodide and lets you try it out with a simple HTML UI.

The prompts I used are in this PR: https://github.com/simonw/tools/pull/156


"Ultimately I found that visualizing large graphs is rarely helpful in practice; they can certainly look nice for some well defined graphs, but I rarely saw well defined graphs in the wild."

Yes, I'm with you: https://jerf.org/iri/post/2025/on_layers_and_boxes_and_lines...

Since writing that I'm finding my frustration at the inability of diagrams to link out or be linked into is growing. In hindsight it seems a super obvious way of using diagrams in a useful manner, and nothing supports it worth a crap, even things that really ought to, like Mermaid (which permits out-links in text but holds them at arm's length, requiring you to set the diagram to "unsafe"[1], and as near as I can tell from a quick search never mentions this as a thing you can do in its docs, and still has no particular support I can find for linking in to a graph). This has turned into a "can't unsee" for me.

(Obviously I have not used every diagramming solution ever, so maybe there is something out there that supports linking in and/or out, and I'd love to hear about it... however, bear in mind I'm looking for what you might call "first class" support, that is, a feature clearly considered important in the design phase of the project, not the sort of accidental-combination-of-features accidental support that Mermaid half has, if you flip some obscure settings to "lower security" somewhere.)

[1]: https://stackoverflow.com/questions/41960529/how-to-add-a-li...
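For reference, the Mermaid feature in question is the click directive on a flowchart node, which only renders as a live link when securityLevel is relaxed to 'loose'; a minimal example:

```mermaid
flowchart LR
    A[Service A] --> B[Service B]
    click A "https://example.com/service-a" "Docs for service A"
```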


See also https://dohliam.github.io/dropin-minimal-css/ which has quite a few more (including github-markdown-css mentioned in a sibling comment).

What OP calls a "combinatorial parser" I'd call object schema validation, and that's more similar to pydantic[0] than argparse in Python land.

[0]: https://docs.pydantic.dev/latest/


An alternative that works very well for signatures too is Perfect Freehand (by the guy behind TLDRaw)

https://perfect-freehand-example.vercel.app/


He basically just described the FCIS[0] architecture—the same one Gary Bernhardt laid out thirteen years ago. We love reinventing the wheel. Czaplicki did it with Elm, Abramov with Redux, day8 did it with re-frame, and the beat goes on.

I’m still amazed it isn’t obvious: every piece of software should be a black box with a pin-hole for input and an even tinier pin-hole for output. The best code I’ve ever touched worked exactly like that and maintaining it was a pleasure, everything else was garbage. I push this rule in every project I touch.

[0] https://www.destroyallsoftware.com/screencasts/catalog/funct...


This is a well known phenomenon in medicine. It is always carefully considered when making public health decisions regarding e.g. screening programs and intervention best practices.

For example, a PSA test is useful for detecting prostate cancer if a male patient has urination problems. But general screening for high PSA values in middle-aged men is not considered a good idea, because there are too many false positives and it would likely lead to many unnecessary invasive interventions.
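The false-positive arithmetic is worth making concrete. With illustrative (not clinical) numbers, even a reasonably accurate test applied to a low-prevalence population yields mostly false positives:

```python
# Illustrative numbers only, not real PSA statistics
prevalence = 0.02    # fraction of screened men who have the cancer
sensitivity = 0.90   # P(test positive | cancer)
specificity = 0.85   # P(test negative | no cancer)

true_pos = sensitivity * prevalence
false_pos = (1 - specificity) * (1 - prevalence)

# Positive predictive value: P(cancer | test positive)
ppv = true_pos / (true_pos + false_pos)
print(round(ppv, 3))  # → 0.109, i.e. roughly 9 in 10 positives are false
```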


> At some point, I realized that if I wrote a wiki page and documented the things that we were willing to support, I could wait about six months and then it would be like it had always been there. Enough people went through the revolving doors of that place such that six months' worth of employee turnover was sufficient to make it look like a whole other company. All I had to do was write it, wait a bit, then start citing it when needed.

Like!


I was about to be a little snarky but your comment reminded me to be kind. Thanks.

I don't have a receipt printer, what helps me is an A4-sized whiteboard with marker when I feel like I'm falling behind my tasks. Also, to use todos sparingly, so they retain their effectiveness. It's actually quite underrated to forget and let go of tasks; what's important tends to stick around in your head and keep you up at night.

The snark was from my personal experience that serial procrastinators ride a particular high when they change their methods, especially if they spend money on something that hopefully solves their issues. It never lasts long; we return to baseline quite fast. This is why there are tons of posts about "here's how I solved my procrastination issue" when they've only used the supposed panacea for a couple of days. What I find more interesting is methods that have worked for someone for years. Then one can claim to have found a cure, albeit one that probably only works for them.

In any case, keep writing. It helps a lot if you too suffer from squirrel brain.


Some people need pain reprocessing therapy. Have you read the book The Way Out by Alan Gordon?

https://www.painreprocessingtherapy.com/ https://www.amazon.com/Way-Out-Revolutionary-Scientifically-...


Insurance tech guy here. This is not the revolutionary new type of insurance that it might look like at first glance. It's an adaptation of already-commonplace insurance products that are limited in their market size. If you're curious about this topic, I've written about it at length: https://loeber.substack.com/p/24-insurance-for-ai-easier-sai...

Many years ago—2002!—Joel Spolsky wrote this:

https://www.joelonsoftware.com/2002/05/06/five-worlds/

His thesis was that before arguing about software development tools, practices, anything really, it's vital to establish what kind of development you're doing, because each "world" has its own requirements that in turn motivate different practices and tools.

The worlds he quoted were Shrink-wrap; Internal; Embedded; Games; and Throwaway. Shrink-wrap is no longer a thing for most developers, and we can probably name some others today that didn't exist then. But the basic advice he gave then matches what you're saying today:

We need to anchor arguments about tooling in a statement about the type of development we're trying to address, and we need to appreciate that the needs of each world are different.


"Good Vibrations" was actually recorded using an Electro-Theremin [1] (emphasis mine). It was essentially the same but sported more traditional knob controls. Also if you ever hear a Theremin-esque noise in an Elmer Bernstein soundtrack like "Heavy Metal" it was actually an Ondes Martenot [2] which is distinct from and less similar to the classic Theremin.

I'm a lot of fun at parties.

[1] https://en.m.wikipedia.org/wiki/Electro-Theremin

[2] https://en.m.wikipedia.org/wiki/Ondes_Martenot


Whisk itself (https://labs.google/fx/tools/whisk) was released a few months ago under the radar as a demo for Imagen 3 and it's actually fun to play with and surprisingly robust given its particular implementation.

It uses a prompt transmutation trick (convert the uploaded images into a textual description; can verify by viewing the description of the uploaded image) and the strength of Imagen 3's actually modern text encoder to be able to adhere to those long transmuted descriptions for Subject/Scene/Style.


really the next big leap is something that gives me more meaningful artistic control over these systems.

It's usually "generate a few, one of them is not terrible, none are exactly what I wanted" then modify the prompt, wait an hour or so ...

The workflow reminds me of programming 30 years ago - you did something, then waited for the compile, see if it worked, tried something else...

All you've got are a few crude tools and a bit of grit and patience.

On the i2v tools I've found that if I modify the input to make the contrast sharper, the shapes more discrete, the object easier to segment, then I get better results. I wonder if there's hacks like that here.


Very cool! Reminds me a bit of this visualizer I built a few years ago.

https://michaelmior.github.io/rhythm-wheel/


This kinda reminds me of Funklet[0] that Jack Stratton (Vulfpeck) + Rob Stenson made a long time ago... A true gem if you're into funk + like midi drums.

[0] - https://goodhertz.com/funklet/


Obsidian already has this, super-D for date, super-T for time, trivial to add inline as you go, if using a single-note approach. And any new note automatically captures its time of creation and last-edited time.

Folks, please note that this proposal is designed to help end users who wish to use AI tools. For instance, so that when you use Cursor or vscode you can get good documentation about the libs you use when coding, for the LLM to help you better.

It’s not related to model training. Nearly all the responses so far are about model training, just like last time this came up on HN.

For instance, I provide llms.txt for my FastHTML lib so that more people can get help from AI to use it, even though it's too new to be in the training data.

Without this, I’ve seen a lot of folks avoid newer tools and libs that AI can’t help them with. So llms.txt helps avoid lock-in of older tech.

(I wrote the llms.txt proposal and web site.)
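For readers who haven't seen one, a minimal llms.txt follows the shape below (an H1 title, a blockquote summary, then H2 sections of links; the URLs and descriptions here are placeholders, not the real FastHTML file):

```markdown
# FastHTML

> A Python library for writing fast, hypermedia-driven web apps.

## Docs

- [Quick start](https://example.com/quickstart.md): install and first app
- [API reference](https://example.com/api.md): core functions and classes

## Optional

- [Tutorials](https://example.com/tutorials.md): longer walkthroughs
```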


I possibly misunderstand your q3 -- if so apologies.

You shouldn't generally run your AI model directly on your web server, but instead run it on a dedicated server. Or just use an inference service like Together, Fireworks, Lepton, etc (or use OpenAI/Anthropic etc). Then use async on the web server to talk to it.
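A minimal sketch of that shape, using only the standard library (the endpoint URL and response fields are placeholders for whichever inference service you pick):

```python
import asyncio
import json
import urllib.request

# Placeholder for a dedicated inference server or hosted API endpoint
INFERENCE_URL = "http://inference-host:8000/v1/completions"

async def generate(prompt: str) -> str:
    """Call the inference service without blocking the web server's event loop."""
    def blocking_call() -> str:
        req = urllib.request.Request(
            INFERENCE_URL,
            data=json.dumps({"prompt": prompt}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            # "text" is an assumed response field; adjust to your service's schema
            return json.loads(resp.read())["text"]

    # Run the blocking HTTP call in a worker thread so the loop stays free
    return await asyncio.to_thread(blocking_call)
```

An async web framework would then just `await generate(prompt)` inside a request handler.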

Thanks for pointing out the JS app walkthru mention - I'll update that to remove conda; we don't have FastHTML up as a conda lib yet! I also updated it to clarify we're not actually recommending any particular package manager.


I once worked in a design research lab for a famous company. There was a fairly senior, respected guy there who was determined to kill the keyboard as an input mechanism.

I was there for about a decade and every year he'd have some new take on how he'd take down the keyboard. I eventually heard every argument and strategy against the keyboard you can come up with - the QWERTY layout is over a century old, surely we can do better now. We have touchscreens/voice input/etc., surely we can do better now. Keyboards lead to RSI, surely we can come up with input mechanisms that don't cause RSI. If we design an input mechanism that works really well for children, then they'll grow up not wanting to use keyboards, and that's how we kill the keyboard. Etc etc.

Every time his team would come up with some wacky input demos that were certainly interesting from an academic HCI point of view, and were theoretically so much better than a keyboard on a key dimension or two... but when you actually used them, they sucked way more than a keyboard.

My takeaway from that as an interface designer is that you have to be descriptivist, not prescriptivist, when it comes to interfaces. If people are using something, it's usually not because they're idiots who don't know any better or who haven't seen the Truth, it's because it works for them.

I think the keyboard is here to stay, just as touchscreens are here to stay and yes, even voice input is here to stay. People do lots of different things with computers, it makes sense that we'd have all these different modalities to do these things. Pro video editors want keyboard shortcuts, not voice commands. Illustrators want to draw on touch screens with styluses, not a mouse. People rushing on their way to work with a kid in tow want to quickly dictate a voice message, not type.

The last thing I'll add is that it's also super important, when you're designing interfaces, to actually design prototypes people can try and use to do things. I've encountered way too many "interface designers" in my career who are actually video editors (whether they realize it or not). They'll come up with really slick demo videos that look super cool, but make no sense as an interface because "looking cool in video form" and "being a good interface to use" are just 2 completely different things. This is why all those scifi movies and video commercials should not be used as starting points for interface design.

