A Month in the Future with Opus 4.x

Why?

I’m trying to collect my thoughts after having worked closely with Claude Code and Opus 4.5/4.6 for a month now, and to write down some lessons I’ve learned by driving several agents at a time to build Cheyss - a chess variant I designed a decade ago to help teach the importance of thinking ahead. Since the original ChatGPT I’ve tested new models by giving them my rules for Cheyss and seeing if they can come up with a playable game in a browser. Previous experiments were disappointing. This one got completely out of hand. I kept pushing to see if Opus 4.5, then Opus 4.6, would fall apart, and it never really did. Distilled neural nets, leaderboard, bots, matchmaking, single-player challenge mode, multilingual UI, porting the AI to C to make the compiled WebAssembly 4x smaller? All there. I’m happy with all of it. I’ve got so far with it that I’ve ended up having to declare victory, host it, and move on.

Along the way I embraced Claude Code, then multi-Clauding, then Orchestration, and I’ve been building my own rudimentary tools to coordinate the team. It’s been incredible fun, but it’s thrown up a whole new set of questions I want to look at, which I think are going to change everything all over again.

I’m writing this because I’m bad at remembering how things were, and in packing for another mental adventure, remembering where I’m coming from is everything.

There’s so much I have needed to learn over the last month that my fragile ego wanted me to list one really important thing that I already knew:

If you have tools that you constantly have to fight, you’ll be thinking about the tools rather than the work.

The great thing is you can now rebuild your tools every day. You’re only stuck with bad tools if you want to be, so I made it possible for my agents to DM each other, and it’s been brilliant.

That last paragraph is a comforting lie I tell myself. We’re feeling our way forwards in the dark, and there’s no valid standard for ‘good’ when things revolutionise weekly.

The only rational thing to do is to keep scrambling. In that spirit we continue.

How I Used to Work Today

Feb-March 2026

CCC Manager Window, because for about 5 minutes in February, everything good in the world was a TUI. I can spin up / down agents from here and enable/disable a load of features, most of which have proved useless.

The CCC VM manager TUI. I like this pattern. This Claude owns only the shell script which launches the VM, which is what creates this view. If I have any difficulties (I had 3 in the month?) this is always here and can help without me switching mental context. It even gave me the big red button I asked for!

Both of the above are launched in separate terminal tabs by a one-liner. In a world with infinite RAM and CPUs, I would definitely spin up multiple VMs for multiple projects, but in future I’ll pull the CCC manager out and make it so Agents in different VMs can DM across projects. See Summarised Changelog, and Flailing, not Drowning to get lost in the weeds.

Maybe the highest-value addition to the DM system was a really simple feature: agents could no longer interrupt other agents’ work. The chat system monitored whether an agent was active, and it wouldn’t actually send them a message until they’d stopped processing. This meant no active task would get interrupted to hijack an agent with a bug while they were mid-flow on something else. As with most things when moving too fast for my brain to pre-orient, I got lots of silly problems where, as soon as I saw them happen, I immediately thought: oh no, I hated that when I was a developer. I’m assuming being interrupted makes Claude perform worse too. While that’s an open question, context is precious, and it certainly helps me to audit conversations!
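For anyone who wants to picture the idea, here is a minimal sketch of deferred delivery. It is not the code from my DM system: the class name `DeferredMailbox`, its methods, and the agent names are all hypothetical, and it assumes something else (a hook, a poller, whatever you have) tells the mailbox when an agent starts and stops processing.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class DeferredMailbox:
    """Hypothetical mailbox: hold DMs for busy agents, release them when idle."""

    busy: set = field(default_factory=set)
    pending: dict = field(default_factory=lambda: defaultdict(deque))
    delivered: dict = field(default_factory=lambda: defaultdict(list))

    def mark_busy(self, agent):
        """Record that the agent has started processing a task."""
        self.busy.add(agent)

    def send(self, sender, recipient, text):
        """Deliver now if the recipient is idle; otherwise queue the message.

        Returns True if delivered immediately, False if it was deferred.
        """
        message = (sender, text)
        if recipient in self.busy:
            self.pending[recipient].append(message)
            return False
        self.delivered[recipient].append(message)
        return True

    def mark_idle(self, agent):
        """Record that the agent has stopped, then flush its queue in order."""
        self.busy.discard(agent)
        queue = self.pending[agent]
        while queue:
            self.delivered[agent].append(queue.popleft())


# Usage: the backend agent is mid-task, so the bug report waits its turn.
box = DeferredMailbox()
box.mark_busy("backend")
box.send("frontend", "backend", "bug in the move endpoint")
box.mark_idle("backend")  # the queued DM is delivered only now
```

The design choice that matters is that the sender never blocks: it fires the DM and carries on, and the queue keeps messages in arrival order, which is also what makes the conversation easy to audit afterwards.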

When dealing with deployment issues, seeing frontend, backend & ops agents pinging DMs and slinging fixes while I made a coffee was - as someone who has suffered deployment hell in all those roles - deployment heaven.

Scrabbling up Slippery Rocks

DMs aside, towards the end of the project I did hit on one dev approach that felt really good for me, which was actually slowing right down and backing right off. As I was screwing down the last elements of the theme and optimising the WebAssembly element, I started to work on refactors more closely, because it had become apparent that Claude will do a mechanical refactor really well, but it has an aversion to deep ones. Now, maybe I just haven’t unlocked that Claude yet - that’s quite possible, and that’s something I want to look at soon.

After vibing out a platform, there are a lot of loose ends to tidy up, and you can start asking yourself a different category of question. And if you get to the end, you start asking Claude some of those questions. And the most productive I think I’ve been with Claude - not in terms of lines of code, but in terms of delivering successful features first time - is picking up on logical inconsistencies, coming back in and saying: okay, we’ve got this type of problem here, I’m concerned that if this one slipped through, other ones like it will slip through. Let’s audit to verify the situation.

If you ask it a question like “is this the only problem of this type?” it sounds like the right answer is yes, but if you ask your question such that the right answer is “look at how clever I am! I’m a good Agent who’s found lots of problems!” then you’re in business.

And I went through maybe three or four hours of that process, much slower, much more like the sort of software development I was doing a year ago, two years ago. It would tell me something general. I would then ask to look at the specific code it was talking about. I’d then ask it why it named that variable that, it would deflect, but that wasn’t the point. I’d then say: do you know what, if that’s named like that, there must be other cases where things are named in such a way that it’s not going to be immediately obvious to a maintainer what they’re looking at. Do you think we could look at those? It’s better at fixing the problem if fault doesn’t come into it. And these are the kind of conversations that you have with junior programmers all of the time, and more importantly, they’re the kind of conversations that get software to a state where people are happy to share a link.

I felt like I was slowly climbing out of the sea and onto solid ground.

Same Old Problems

Coders are out and Agents are in, but who cares? Why, what, and how are still the only things that matter. All we’ve done is vastly accelerate the production of how. We may think we’ve accelerated more than that, but looking back into the recent past, there are two analogies I want to pull out. Digital Audio Workstations (DAWs) didn’t make the best music better, they just made the output of recorded music easier (and flattened out the sweet pain of invention). IKEA broke our sense of what it costs to build a one-off table, so now we all have terrible tables, and the only custom ones left are fantastic but expensive. These are different stories, one of democratisation flattening quality, the other of commoditisation destroying understanding, but I think they’re both being played out today, at the same time, with the frontier models.

If I have a point, it’s that we have a new easy way to make things but it’s still just as easy to lose sight of the big picture now as in the past, and no matter how fast the hows get churned out, our whys still run at the speed of human. Case in point: I just spent the past month building a chess variant platform. That said, I’m massively looking forward to finding out how I’m going to waste April. :)

J-P's Garden

