march 31, 2026

i built a six-country ai wargame because diesel hit $3.20 a litre.

diesel hit $3.20 a litre. i wanted to know how long. so i gave six ai agents national interests, red lines, and a ticking clock.

ai geopolitics claude-code multi-agent-simulation wargame

We’d just bought a diesel people mover. Growing family, practical decision. Then the US and Israel struck Iran, the Strait of Hormuz closed, and diesel hit $3.20 a litre. Even Costco was $3.15. When Costco can’t save you, something has gone properly wrong.

Every trip to the servo felt like a small financial injury. The price hurt, but the uncertainty hurt more. A few weeks? Months? Were we looking at a spike or a new normal?

I’ve been using Claude Code for productivity, but I’d recently realised I could spin up AI agents and have them talk to each other. I got curious about what would happen if I modelled this scenario.

The irony of self-preservation.

I wish I could say it was intellectual curiosity about Middle Eastern geopolitics, or a desire to build something cool with AI. But frankly, it was $3.20 a litre and a 70-litre tank. I just really wanted to know how long this was going to last.

I loaded up Claude Code and typed something like: “there are developments in the middle east and with the usa. it looks like war is breaking out. deep research the developments so I can get a good understanding of what’s happening.”

The AI ran a deep research pass. Dozens of sources. News agencies, think tanks, central banks, financial institutions. It came back with three scenarios, probability-weighted, with implications for oil prices and supply chains.

It was useful. It just didn’t have the level of detail and pizzazz I was looking for.

Probabilities without dynamics

The research told me what might happen and roughly how long oil prices could stay elevated. De-escalation (25%), protracted standoff (45%), escalation cascade (30%). Standard scenario analysis. The kind of thing you’d read in a think tank brief.

The output felt linear. Like someone had zoomed out, taken a first pass at all the data, and gone “yeah, here’s what probably happens.” There was no interplay in the reasoning. No sense of how one actor’s move changes what another actor can do. That bugged me. I felt like we could do more.

What I actually wanted was to zoom out, look at the full scenario, and then model the actors. Not just ask “what might happen?” but ask each country “what would you do, given what you know, what you want, and what you can’t afford to lose?” I had a hunch that modelling the interaction between strategies would give me a better prediction than analysing the situation from a single perspective.

Version 1: six agents, no interplay

The first simulation had two phases.

Three research agents ran in parallel to build a shared reality: historical parallels, unprecedented factors, and country profiles. Then six country agents, each role-playing a national security establishment (US, Israel, Iran, China, Russia, Saudi Arabia), received the same intelligence briefing and declared their strategies independently.
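A minimal sketch of that two-phase shape, for anyone who wants to see it as code. Everything here is illustrative: `run_agent` is a hypothetical stand-in for however an agent actually gets spawned, and it returns canned text so the example runs without any model behind it.

```python
import asyncio

RESEARCH_TOPICS = ["historical parallels", "unprecedented factors", "country profiles"]
COUNTRIES = ["US", "Israel", "Iran", "China", "Russia", "Saudi Arabia"]


async def run_agent(role: str, prompt: str) -> str:
    """Hypothetical stand-in for spawning an agent; returns canned text."""
    await asyncio.sleep(0)  # placeholder for the real agent call
    return f"[{role}] response to {len(prompt)} chars of prompt"


async def run_v1() -> dict[str, str]:
    # Phase 1: research agents run in parallel and build the shared briefing.
    research = await asyncio.gather(
        *(run_agent(f"research: {topic}", f"Research {topic}.") for topic in RESEARCH_TOPICS)
    )
    briefing = "\n\n".join(research)

    # Phase 2: every country agent gets the same briefing and declares a
    # strategy without seeing any other country's output.
    declarations = await asyncio.gather(
        *(run_agent(country, f"{briefing}\n\nDeclare your strategy.") for country in COUNTRIES)
    )
    return dict(zip(COUNTRIES, declarations))


if __name__ == "__main__":
    for country, declaration in asyncio.run(run_v1()).items():
        print(country, "->", declaration)
```

The thing to notice is that phase 2 is a single `gather`: no country's output feeds into any other country's prompt.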

[Diagram: v1 simulation architecture]

The outputs were more honest than most op-eds I’ve read on the conflict. Russia called the war “a gift.” Iran admitted “we are not winning the military war and we will not win it.” Saudi Arabia counted $1 billion per day in lost export revenue.

I synthesised the outputs and found that five of six actors wanted the same thing: Hormuz open. Iran’s blockade capability had a 4-6 week natural expiry. The wargame shifted my probability estimates significantly. De-escalation went from 25% to 45%. Escalation dropped from 30% to 15%.

The conclusion felt optimistic. The forces toward resolution were stronger than the headlines suggested.

Then I looked at the output again and realised the whole thing was flawed.

The problem: agents that don’t talk to each other

Six agents reasoned in isolation. Each declared what they’d do, and then I compared notes. That’s not how geopolitics works.

In reality, Israel doesn’t just “declare a strategy.” Israel calls Trump, asks for an extension, threatens to keep striking unilaterally, and frames its war aims around Trump’s vanity. Trump doesn’t just “decide.” He texts Iran’s FM through a back-channel while simultaneously threatening to obliterate their energy grid. Iran doesn’t just “respond.” It opens a humanitarian corridor specifically designed to test whether Trump can control Netanyahu.

The v1 wargame missed all of this because the agents never saw each other’s moves. There was no interplay. No reaction. No adaptation.

So I rebuilt it.

Version 2: who moves first, who talks to whom

Before running the simulation again, I needed two things the first version didn’t have.

Move order. Who acts first and why? Research showed: Israel moves first (shrinking window, fears Trump will cut a deal). The US moves second (holds the kill switch, April 6 deadline). Iran is reactive (degraded, fragmented). Saudi is opportunistic (dual-track). Russia and China are spectators who profit.

Communication topology. Who actually talks to whom? Not everyone-sees-everything. In reality:

  • US and Israel have a direct, high-bandwidth military channel, but their politics are diverging. Netanyahu asked Trump if he was secretly talking to Iran.
  • Iran refuses direct talks with the US. Everything goes through Pakistan. But there’s a thin direct text channel between Trump’s envoy Witkoff and Iran’s FM Araghchi.
  • Saudi maintains a covert backchannel to Iran while simultaneously sharing radar feeds with Israel through CENTCOM.
  • Russia feeds Iran arms and intelligence but has no mutual defense obligation.
  • China talks to everyone, acts cautiously, and positions for the post-war order.

Each agent in v2 only saw what they’d realistically see through their actual communication channels.
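One way to picture this is as a small graph of channels plus a filter on who receives what. The sketch below is an assumption-heavy illustration: the channel pairs are a loose reading of the bullets above (the US-Saudi link is assumed), and `Message`, `CHANNELS`, `has_channel` and `inbox` are hypothetical names, not the actual setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    text: str


# Undirected channels between actors. These pairs are an illustrative
# reading of the bullets above (an assumption), not the real topology.
CHANNELS = {
    frozenset({"US", "Israel"}),            # direct military channel
    frozenset({"US", "Iran"}),              # via Pakistan, plus the thin text channel
    frozenset({"US", "Saudi Arabia"}),      # assumed direct line
    frozenset({"Saudi Arabia", "Iran"}),    # covert backchannel
    frozenset({"Saudi Arabia", "Israel"}),  # radar feeds through CENTCOM
    frozenset({"Russia", "Iran"}),          # arms and intelligence
} | {
    # China talks to everyone.
    frozenset({"China", other})
    for other in ("US", "Israel", "Iran", "Saudi Arabia", "Russia")
}


def has_channel(a: str, b: str) -> bool:
    return frozenset({a, b}) in CHANNELS


def inbox(agent: str, messages: list[Message]) -> list[Message]:
    """Messages addressed to `agent` that travel over an existing channel."""
    return [m for m in messages if m.recipient == agent and has_channel(m.sender, agent)]


MESSAGES = [
    Message("US", "Iran", "counter-proposal"),
    Message("Saudi Arabia", "Iran", "backchannel warning"),
    Message("Israel", "Iran", "no channel exists, so this is never delivered"),
    Message("US", "Israel", "request for a pause"),
]

print([m.sender for m in inbox("Iran", MESSAGES)])
```

In this toy version Iran hears from the US and Saudi Arabia but never directly from Israel, and Russia's inbox stays empty.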

[Diagram: v2 communication topology]

Round 1: everyone declares

I spawned six agents with updated Day 31 intelligence (actual events, not hypotheticals) and personality-driven prompts. Iran wasn’t a rational optimizer. It was a traumatised, fragmented leadership whose Supreme Leader had been killed 31 days earlier. Trump wasn’t a unified strategic actor. He was a president who simultaneously wanted a deal AND threatened to obliterate Iran’s infrastructure.

The agents declared independently, like v1. But the outputs were already richer because the prompts reflected reality instead of abstraction.

Israel asked Trump for an extension to April 13 and authorised 80+ strikes per day. Their contingency plan: keep striking during any ceasefire ambiguity window.

Trump stripped his 15-point peace plan down to three demands (Hormuz, enrichment suspension, ceasefire statement) and privately decided he’d accept just Hormuz and a ceasefire to start.

Iran proposed a 96-hour mutual operational pause starting April 3. No new mining. Talks in Muscat. Their internal red line: “make sure there is a conversation happening on April 7.”

Saudi Arabia played every angle. Told Trump: ready to normalise with Israel, price is a defense treaty, F-35s, and Palestinian statehood. Told Iran through the backchannel: open Hormuz or we support your destruction.

Russia’s directive to itself was the most honest thing any agent produced:

“Every day this war continues is a day Ukraine receives less attention, Europe fractures further, our treasury fills, and the post-Cold War order erodes a little more. Sustain. Do not escalate. Profit.”

China offered the US, through a backchannel, to deliver an Iranian reopening of Hormuz within 14 days in exchange for a seat at the post-war security table.

Round 2: strategies collide

This is where v2 diverged from v1 entirely. Each agent received what the others declared, filtered through the communication topology. Iran saw Trump’s counter-proposal through the Witkoff text and Saudi’s backchannel message. Israel saw Trump’s willingness to accept the pause and his threat to delay munitions. Trump saw all of it.
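The round mechanic itself is simple: each agent's input for this round is the previous round's declarations, minus whatever it has no way of seeing. A rough sketch follows, with loud caveats. `stub_agent` is a hypothetical placeholder for a real model call. Only the Iran, Israel and US rows of `SEES` follow the paragraph above; the other rows are filler assumptions. The order of Russia and China in `MOVE_ORDER` is also assumed.

```python
from typing import Callable

MOVE_ORDER = ["Israel", "US", "Iran", "Saudi Arabia", "Russia", "China"]

# Who sees whose declarations. Only the Iran, Israel and US rows follow the
# text above; the remaining rows are placeholder assumptions.
SEES = {
    "Israel": {"US"},
    "US": {"Israel", "Iran", "Saudi Arabia", "Russia", "China"},
    "Iran": {"US", "Saudi Arabia"},
    "Saudi Arabia": {"US", "Iran"},
    "Russia": {"Iran"},
    "China": {"US", "Iran"},
}

Agent = Callable[[str, dict[str, str]], str]


def stub_agent(name: str, view: dict[str, str]) -> str:
    """Hypothetical agent: a real one would prompt a model with the view."""
    return f"{name} reacts to {sorted(view)}"


def run_round(previous: dict[str, str], agent: Agent) -> dict[str, str]:
    """One round of interplay: each agent reacts only to the prior-round
    declarations it is allowed to see."""
    current = {}
    for name in MOVE_ORDER:
        view = {other: text for other, text in previous.items() if other in SEES[name]}
        current[name] = agent(name, view)
    return current


round1 = {name: f"{name} declares" for name in MOVE_ORDER}
round2 = run_round(round1, stub_agent)
print(round2["Iran"])  # Iran reacts to ['Saudi Arabia', 'US']
```

In this sketch the move order only fixes the order of iteration. A stricter variant would let later movers see earlier moves from the same round, which is a design choice rather than something the sketch settles.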

The Israel-US collision. Netanyahu wanted April 13. Trump wanted a deal by April 6. Trump told Netanyahu: “I need you to pause for 72 hours starting April 5. If you keep hitting targets while I’m trying to close a deal, I will delay the next munitions shipment by 30 days.” Israel has 10-14 days of sustained operations without US resupply. The munitions threat is a kill switch.

Israel’s response. They accepted the pause. But they compressed everything into a 72-hour maximum strike sprint before it began. Fordow, Isfahan, nuclear scientists. And they formally abandoned regime change as a war aim. Their honest internal assessment: “We entered this war with three objectives. We’ll achieve 70% of the first, 50% of the second, and 0% of the third.”

Iran’s move. This was the most sophisticated play. Iran opened a humanitarian corridor in the western Strait (the part already 40% cleared by US minesweepers) and provided mine location data to Oman. Functionally cooperating with mine clearance without the optics of Iranian sailors clearing mines under American guns. Then they demanded Israel be included in the pause, forcing Trump to either deliver Netanyahu or reveal that he couldn’t. Iran’s agent framed it with surprising clarity:

“We are making this possible for him. We expect him to make it possible for us.”

Trump’s play. He accepted Iran’s framework, told Netanyahu that expanding the Abraham Accords to Saudi Arabia was worth more than another week of strikes, rejected China’s formal offer but used it as leverage, and wrote his Rose Garden script for April 7. The Trump agent’s internal logic was painfully on-brand:

“You bomb them until they’re ready to talk, and then you make the deal. That’s the art of the deal.”

The v1 wargame never found the Israel-US tension because the agents never interacted. In v2, it was the primary driver of the entire outcome.

Round 3: fog of war

I ran a final round stress-testing whether the pause actually survives contact with reality. The answer: barely.

April 1-2. Israel conducts its 72-hour sprint. 80+ strikes per day on Fordow, Isfahan, Parchin. Kills two nuclear scientists. Iran opens the corridor anyway but leaks the Witkoff texts to Al Jazeera, framing it as “Iran chose peace while Israel bombed.” Trump is furious at the optics.

April 3. The pause begins. Four uncharted mines are found in the corridor (Iran’s data was 80-85% complete, not 100%). An IRGC fast boat nearly causes an incident with an Indian tanker in the strait. The Houthis launch seven attacks on Israel and Red Sea shipping. Nobody told the Houthis about the pause because nobody can tell the Houthis anything.

April 4. A Saudi tanker transits the corridor in 8 hours (normally 3). One mine found mid-transit, 90-minute halt while divers neutralise it. Trump declares victory on Truth Social. But insurance companies won’t certify the route, and satellite imagery shows new Russian anti-ship missiles being deployed at Iranian coastal batteries. Iran used the pause to receive resupply and reposition.

April 5-6. Muscat talks. Witkoff and Araghchi find agreement on monitoring (14-day managed IAEA access), enrichment capped at 3.67%, stockpile blended down on Iranian soil, phased sanctions relief. They deadlock on dismantlement (bridged with ambiguity) and the Houthi problem (which nobody can solve).

April 7. Trump announces the “Muscat Framework” from the Rose Garden. Hormuz open, nuclear monitoring, sanctions sequencing. The Abraham Accords expansion with Saudi “very close.”

What v2 found that v1 couldn’t see

The v1 wargame said de-escalation was 45% likely. The v2 wargame says the most likely outcome is messier than that.

| Scenario | v1 (no interplay) | v2 (with interplay) |
| --- | --- | --- |
| A: Clean de-escalation | 45% | 30% |
| B: Partial deal, protracted standoff | 40% | 50% |
| C: Escalation cascade | 15% | 20% |
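For a quick sanity check, the three sets of estimates quoted in this post can be lined up and differenced. The snippet assumes the original research scenarios map onto rows A, B and C in that order, which the post implies but never states outright.

```python
SCENARIOS = ["de-escalation", "protracted standoff", "escalation cascade"]

# Percentages as quoted in the post. Mapping the initial research scenarios
# onto rows A/B/C in this order is an assumption.
ESTIMATES = {
    "research": [25, 45, 30],
    "v1": [45, 40, 15],
    "v2": [30, 50, 20],
}

# Each set of estimates should cover the whole probability space.
for name, probs in ESTIMATES.items():
    assert sum(probs) == 100, name


def shift(before: str, after: str) -> list[int]:
    """Change in percentage points per scenario between two estimates."""
    return [b - a for a, b in zip(ESTIMATES[before], ESTIMATES[after])]


for label, delta in zip(SCENARIOS, shift("v1", "v2")):
    print(f"{label}: {delta:+d} points")
```

Going from v1 to v2, de-escalation loses 15 points, the standoff gains 10 and escalation gains 5.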

The Muscat Framework is real but fragile. The simulation gives it a 55% chance of surviving 90 days, 30% at 180 days, and 10% of lasting five years. The historical parallel is the 1994 US-North Korea Agreed Framework. Genuine agreement. Managed the crisis. Bought time. Dead within eight years.
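Those three numbers also imply conditional odds, which is just arithmetic on the figures above rather than anything extra the simulation produced. A small sketch, with `SURVIVAL` and `conditional` as illustrative names and five years approximated as 5 × 365 days:

```python
# Survival probabilities quoted above, in percent, keyed by horizon in days.
SURVIVAL = {90: 55, 180: 30, 5 * 365: 10}


def conditional(later: int, earlier: int) -> float:
    """P(survives to `later` | survived to `earlier`), as a percentage."""
    return 100 * SURVIVAL[later] / SURVIVAL[earlier]


print(round(conditional(180, 90), 1))       # 54.5
print(round(conditional(5 * 365, 180), 1))  # 33.3
```

Read that way, a framework that makes it to day 90 still has only slightly better than even odds of reaching day 180.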

Why v2 is more pessimistic than v1:

v1 found that five of six actors want the same thing. That’s true. But v2 found that wanting the same thing and getting there are completely different problems. Iran used the pause to rearm. Israel front-loaded maximum violence before the pause began. The Houthis attacked through every day of the “ceasefire.” Russia’s resupply gave Iran a stronger military position coming out of the pause than going in. And the nuclear question was bridged with ambiguity, not resolved.

The most consequential outcome of the entire crisis may not be the Iran deal at all. It may be the Saudi-Israel normalisation it catalysed. If the Abraham Accords expansion holds, it reshapes Middle East security permanently, regardless of what happens with Iran’s nuclear program.

The leading indicator to watch: Not nukes. Not sanctions. The Houthis. If Red Sea attacks continue through May, Iran can’t deliver the regional de-escalation the framework assumes. That’s the thread that unravels everything.

What I learned about the modelling itself

The agents-in-isolation approach (v1) was useful but overconfident. It found the broad shape of the outcome (convergence toward Hormuz reopening) but missed the dynamics that make the outcome fragile.

Adding communication topology and move order (v2) revealed three things:

  1. The most important collision wasn’t between enemies. It was between allies. The Israel-US tension over war aims and timelines drove more of the outcome than the US-Iran standoff.

  2. Autonomous actors break frameworks. The Houthis aren’t under anyone’s control. Any deal that assumes Iran can deliver proxy compliance is built on sand.

  3. Pauses get exploited. Every actor used the 96-hour window for repositioning, resupply, and narrative warfare. The pause didn’t freeze the conflict. It changed the shape of the next phase.

If I were building this again, I’d add a seventh agent for the Houthis and model them as genuinely autonomous. They were the chaos variable that no framework could contain.

Disclaimer

This is not intelligence analysis. The agents are reasoning from publicly available sources. This is not a prediction. Wargames stress-test assumptions; they don’t tell the future. The value isn’t “this will happen.” The value is “I hadn’t considered that Israel and America have fundamentally different war aims, and that tension drives the entire outcome.”

The RAND Corporation has been running wargames since the 1950s. The Pentagon’s war colleges use multi-actor simulations as a core analytical tool. I ran mine from my kitchen table in Sydney because I was annoyed about the price of diesel. The v1 simulation took seven minutes. The v2 simulation, with rounds of interplay, took about an hour.

So how long is diesel going to stay expensive?

The v1 wargame said 4-6 weeks. The v2 wargame says longer.

Hormuz partially reopened in the simulation (western corridor, ~35-40% of pre-crisis capacity). Oil dropped from the $126 peak to $92-95 on the corridor news. But a permanent risk premium is baked in. Insurance companies won’t certify the route. Houthis threaten the next chokepoint. Only one lane of the strait is open.

My revised estimate: diesel comes down gradually from $3.20 to around $2.60-2.80 over the next 6-8 weeks. But we’re not going back to pre-crisis prices. The risk premium and the Houthi wildcard are here to stay.
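In per-tank terms, using the 70 litre tank mentioned earlier and assuming a fill from empty (a simplification), the arithmetic looks like this. Prices are kept in whole cents to avoid floating-point noise.

```python
TANK_LITRES = 70  # tank size from earlier in the post


def fill_cost(price_cents_per_litre: int) -> float:
    """Cost in dollars of filling the tank from empty."""
    return TANK_LITRES * price_cents_per_litre / 100


print(fill_cost(320))                  # 224.0
print(fill_cost(260), fill_cost(280))  # 182.0 196.0
```

So the revised estimate is worth roughly $28 to $42 per fill compared with $3.20 a litre.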

I’m still filling up at Costco. $3.15 is still painful. But at least now I understand why, and roughly how long.