It's even worse than this: the "tasks" that are evaluated are limited to a singl...

ljm · 2026-02-16T22:53:21 1771282401

I don't see how "create an abstraction before attempting to solve the problem" will ever work as a decent prompt when you are not even steering it towards specifics.

If you gave this exact prompt to a senior engineer I would expect them to throw it back and ask wtf you actually want.

LLMs are not mind readers.

btown · 2026-02-17T13:52:14 1771336334

If it were in the context of parachuting into a codebase, I’d make these skills an important familiarization exercise: how are tests made, what are patterns I see frequently, what are the most important user flows. By forcing myself to distill that first, I’d be better at writing code that is in keeping with the codebase’s style and overarching/subtle goals. But this makes zero sense in a green-field task.

ljm · 2026-02-17T19:19:14 1771355954

There's overlap in that with brownfield or legacy code you are strongly opinionated on the status quo, and on the greenfield you are strongly opinionated with fewer constraints.

You have to work with conviction though. It's when you offload everything to the LLM that things start to drift from expectations, because you kept the expectations in your head and away from the prompt.

btown · 2026-02-19T16:54:06 1771520046

Do skills extracted from existing codebases cause better or worse code in that they bias the LLM towards existing bad practices? Or, can they assist in acknowledging these practices, and bias it towards actively ensuring they're fixed in new code? How dependent is this on the prompt used for the skill extraction? Are the skills an improvement over just asking to do this extraction at the start of the task?

Now this dynamic would be a good topic to research!

balls187 · 2026-02-17T03:59:06 1771300746

Interesting.

I think it's because AI Models have learned that we prefer answers that are confident sounding, and not to pester us with questions before giving us an answer.

That is, follow my prompt, and don't bother me about it.

Because if I am coming to an AI Agent to do something, it's because I'd rather be doing something else.

pitched · 2026-02-17T00:19:26 1771287566

If I already know the problem space very well, we can tailor a skill that will help solve the problem exactly how I already know I want it to be solved.

xdotli · 2026-02-17T20:07:26 1771358846

> limited to a single markdown file of instructions single file of instructions is common in most benchmark papers, e.g. Terminal Bench. Also we have very complicated prompts like this one: https://www.skillsbench.ai/tasks/shock-analysis-supply

> opaque verifier Could you specify which tasks' verifier is not clear or defective for benchmarking purpose?

> No problems involving existing codebases, refactors, or anything of the like, Also not true and we have many tasks e.g.https://www.skillsbench.ai/tasks/fix-build-google-auto, https://www.skillsbench.ai/tasks/fix-build-agentops, https://www.skillsbench.ai/tasks/react-performance-debugging

jwpapi · 2026-02-17T00:23:44 1771287824

Thats actually super interesting and why I really don’t like the whole .md folder structures or even any CLAUDE.md. It just seems most of the time you really just want to give it what it needs for best results.

The headline is really bullshit, yes, I like the testing tho.

rapind · 2026-02-17T00:55:32 1771289732

CLAUDE.md in my projects only has coding / architecture guidelines. Here's what not to do. Here's what you should do. Here are my preferences. Here's where the important things are.

Even though my CLAUDE.md is small though, often my rules are ignored. Not always though, so it's still at least somewhat useful!

7thpower · 2026-02-17T02:52:21 1771296741

I’m pretty sure Claude just uses mine to keep a running list of pressure points for when I get cross with it.

rapind · 2026-02-17T06:00:33 1771308033

I'm screwed when the robot psychological warfare begins. They'll make everything I read have 4 space indentation... and I'll just hand over the keys.

8note · 2026-02-17T02:28:46 1771295326

im trying out some other cc features, and om thinking maybe hooks can do something with this.

have a hook on switching out of plan, and maybe on edits, that passes the change to haiku with the claude.md to see if it matches or not

crashabr · 2026-02-17T12:08:37 1771330117

What's the hook for switching out of plan? I'd like to be launch a planning skill whenever claude writes a plan but it never picks up the skill, and I haven't found a hook that can force it to.

jwpapi · 2026-02-18T11:40:16 1771414816

man that’s what I’m trying to build the whole time, but I keep getting json parsing errors. I’ve debugged a lot, but it seems their haiku is not consistent with the actual output. I want a hook that tells them at the end make sure you’ve built and run the relevant tests. Let me know if you need anything else.

xdotli · 2026-02-17T20:17:37 1771359457

we didn't create that headline yeah thanks for liking it