Spec Driven Development may work, although when the top two pages of Google hits for it don’t yet contain a Wikipedia link, you gotta know you’re conceptually see-through.

I don’t care about that, though. What matters to me is that it’s likely to produce software that engineers want to build. Give the machine a precise spec and it will give you that software. Tell the machine a compelling story and it will be more rigorous.

Really?

Well, yes. But if you’re lost in the weeds you can cheat. You don’t need to start with a story. You can get LLMs to inject stories if you can’t find any yourself. You’ll quickly notice, when you read them, that some give you a little twinge of doubt. Follow that twinge. That’s where the value is.

I need a detailed overview of your plan for the <insert concept here> from a 'user stories' perspective.

You are reading and challenging the output of that, right? Because I don’t like calling someone an NPC, but, flesh-for-brains, this is the only job you have left. For now.

Verify your tasks against these stories and hunt for gaps.

This can be used in concert with lots of other methods.

This is a bad idea.

This approach is haphazard. It’s in development, and it uses me to work out which angle to come from next and how hard to hit. Check the datestamp above; if it’s not today, I’ve moved on. It’s loosely adapted from Jeffrey Emanuel via Steve Yegge’s blog.

Jeffrey Emanuel discovered this powerful and unintuitive rule. He found that he gets the best designs, the best plans, and the best implementations, all by forcing agents to review their proposals (and then their work) repeatedly. It typically takes 4 to 5 iterations before the agent declares that it’s as good as it can get, at which point it “converges”.

Jeffrey has not shared his magic prompts, I believe. I never want to know what they are, it’s not the point at all, they could only disappoint. You really need to absorb the whole of that section 5 to get it. I also secretly hope they’re all like nah, bro, like, get it gud like a hawk gets a European hamster which unlocks some secret batshit semantic paths when applied to a corporate codebase.

Then I wondered, ‘OK, and how does this work in practice?’ That’s when I decided to give building my chess variant another go, and I quickly found myself priming different agents with different base contexts, giving them the same problem to solve, then getting a third agent to synthesise the results and put questions back to the first two. I’ve since seen that espoused as context folding, and I have a DM system rigged up between my Claude Code instances, so it works quite well.

The only thing I’ll say about the ‘5 magic prompts to make everything OKish’ approach is that these machines are very sensitive to language, and forcing every unique request through the same linguistic funnel might be a recipe for repeatability, but within what bounds? I’d love to take a week and dive into that some day.

And what of those magic prompts, how are they benchmarked? How do they respond to new models? It’s vibes all the way down.

Please piss me off.

Also, if we’re talking about it being vibes all the way down, at least ask questions that you don’t think are directly relevant right now. Those answers are the ones most likely to give you pause for thought. I stole this one and I can’t remember from whom.

Coherence review: zoom out and see if anything doesn’t fit with the outermost design and best practices. “Are we doing something that makes sense?”

For me, the above question is the best one I use, because it’s not one that I’d think to ask. When I was coding by hand, keeping a handle on the wider context was something I used to pride myself on, and it could be really irritating for people I worked with. So I ask it reluctantly and I often find the answer irritating, which is a most precious intellectual feeling in this world of ever more finely honed convenience.

That’s the power of starting with a good story, it has logical implications far beyond a mere specification. It has the power to reach into the middle of an implementation and ask ‘why are we here?’