Reading Results
The scoreboard, finding status pills, and how to translate a Battle Mode outcome into a merge decision.
Overview
A Battle Mode run produces a lot of output: the scoreboard, per-finding statuses, exploit code, fix diffs, retest results, and a verified PR. This page is about how to read all of that — and more importantly, how to translate it into a decision about whether to merge, deploy, or go back to the drawing board.
The scoreboard
The top of the battle results page shows three counts. These are the headline numbers.
Exploited
Findings the Red Team successfully attacked in a deployed environment. These are proven real.
Fixed
Patches that were applied AND that survived the retest phase. These are verified closures.
Bypassed
Patches that initially looked like fixes but failed when Red Team came back with a new exploit.
How the numbers relate
| Relation | What it tells you |
|---|---|
| Every Fixed implies a corresponding Exploited | You can't fix what wasn't exploited |
| Exploited − Fixed − Bypassed | Findings with no surviving fix at all ("Unfixed" tier) |
| Exploited = Fixed, Bypassed = 0 | Ideal outcome — every real bug got closed and every closure held |
| High Bypassed count | First-pass fixes were superficial; underlying class of vuln not addressed |
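The arithmetic in the table above can be sketched as a small triage helper. This is illustrative only — the field names are assumptions, not the product's API:

```python
from dataclasses import dataclass

@dataclass
class Scoreboard:
    """Headline counts from the top of the battle results page."""
    exploited: int
    fixed: int
    bypassed: int

    @property
    def unfixed(self) -> int:
        # Exploited − Fixed − Bypassed: findings with no surviving fix at all
        return self.exploited - self.fixed - self.bypassed

    @property
    def fully_closed(self) -> bool:
        # Ideal outcome: every real bug got closed and every closure held
        return self.exploited == self.fixed and self.bypassed == 0

board = Scoreboard(exploited=6, fixed=3, bypassed=2)
print(board.unfixed)       # → 1
print(board.fully_closed)  # → False
```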
Per-finding status pills
Each finding in the battle has three pills indicating where it ended up across the six phases.
| Pill | Color | Phase | Meaning |
|---|---|---|---|
| Exploited | Red | 2 or 3 | Red Team landed an exploit. Finding is proven real. |
| Fix applied | Blue | 4 | Blue Team produced a candidate patch that closed the original exploit |
| Defended or Bypassed | Green / Red | 5 | Whether the fix survived a fresh Red Team attack |
A fully-closed finding shows: Red (exploited) → Blue (fix applied) → Defended (survived retest). Anything short of that is a finding that needs human attention before you merge or deploy.
The finding card
Click any finding to expand its card. Inside you'll see:
| Field | What it shows |
|---|---|
| Exploit code | The actual test Red Team wrote — usually a Foundry test showing how the vulnerability was triggered |
| Fix diff | The code change Blue Team proposed, as a diff against the original |
| Retest outcome | Defended or Bypassed. If Bypassed, the new exploit that got around the fix |
| Finding context | The original CARA finding that seeded this battle |
For a Defended finding, the finding card is effectively a complete remediation record: the bug, the exploit, the fix, the proof the fix holds. That's the audit-trail artifact you'd want if someone asked "how do you know this vulnerability is actually closed?"
The verified PR
Battle Mode produces a pull request on the target repo containing every Defended fix. Bypassed and Unfixed findings are not in the PR — deliberately, because the point of the PR is "these are the changes we've proven survive attack."
Read the PR the way you'd read any PR, with two adjustments:
- Cross-reference the PR description against the scoreboard. The PR should list exactly the Defended findings. If there's a mismatch, something's off — probably a Fix that should have been excluded.
- Don't skip review on trust. Battle Mode verifies that a fix closes the exploit, but you still want a human engineer to confirm the fix is idiomatic, doesn't change public interfaces, and doesn't introduce regressions in tests. The `forge test` run the sandbox performed is necessary, not sufficient.
Once reviewed, the PR merges like any other. Merging moves the corresponding findings to Resolved in the normal finding lifecycle.
Handling Bypassed and Unfixed
These are the findings the automated loop couldn't close. They deserve the most human attention.
| Status | What it means | What to do |
|---|---|---|
| Bypassed | A fix that looked like it worked didn't hold up | Read the new exploit. Often the right move is a deeper refactor rather than another patch attempt |
| Unfixed | Blue Team tried 3 times and couldn't close the exploit | Usually architectural. Route to human engineering |
Do not deploy with open Exploited-but-Unfixed or Bypassed findings without an explicit accept-the-risk decision. Battle Mode produced a working exploit against live contracts; that exploit will work against mainnet unless something changed.
Merge and deploy decisions
A pragmatic rule of thumb:
| Result shape | Merge? | Deploy? |
|---|---|---|
| All Exploited findings are Fixed (Defended) | Review and merge the PR | Ready |
| Some findings Bypassed | Review verified PR separately | Not ready |
| Findings Unfixed | Merge verified portion if helpful | Not ready |
| Discovery phase found new Criticals | Regardless of other outcomes | Not ready |
The verified PR can often be merged even when the overall battle result is mixed, because merging a genuinely Defended fix is never wrong. But deployment is a stricter bar: you want zero open Exploited findings, not "most of them are closed."
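The rule of thumb above can be expressed as a gate function. This is a sketch, not a documented interface: `new_criticals` stands in for the Discovery-phase Critical count and is an assumed input.

```python
def battle_gate(exploited: int, fixed: int, bypassed: int,
                new_criticals: int) -> tuple[bool, bool]:
    """Return (merge_verified_pr, ready_to_deploy) from headline counts.

    Merging a Defended fix is always reasonable; deployment requires
    zero open findings of any kind and no new Criticals from Discovery.
    """
    unfixed = exploited - fixed - bypassed
    merge_ok = fixed > 0
    deploy_ok = bypassed == 0 and unfixed == 0 and new_criticals == 0
    return merge_ok, deploy_ok

# All Exploited findings Fixed and Defended: merge and deploy
print(battle_gate(4, 4, 0, 0))  # → (True, True)

# Mixed result: merge the verified portion, but do not deploy
print(battle_gate(6, 3, 2, 2))  # → (True, False)
```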
CSV export
For teams with audit-trail or compliance requirements, Battle Mode results can be exported as CSV from the battle page. The export includes finding ID, severity, Exploited / Fixed / Bypassed status, retest outcome, and links to the exploit and fix artifacts.
Useful for:
- External auditor hand-off ("here's what we ran, here's what we found, here's what we fixed").
- Internal governance records — showing that due diligence occurred before a deploy.
- Tracking remediation velocity over time by rolling up multiple battles into a single historical view.
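A roll-up like the ones above can start from the export with nothing more than the standard library. The column names below are illustrative — check the actual export header before relying on them:

```python
import csv
import io

# Hypothetical excerpt of a Battle Mode CSV export (column names assumed)
SAMPLE = """finding_id,severity,status,retest
F-101,High,Fixed,Defended
F-102,High,Fixed,Defended
F-103,Medium,Bypassed,Bypassed
"""

# Anything whose retest outcome is not Defended still needs human attention
open_findings = [
    row["finding_id"]
    for row in csv.DictReader(io.StringIO(SAMPLE))
    if row["retest"] != "Defended"
]
print(open_findings)  # → ['F-103']
```

In practice you would read the downloaded file with `open(...)` instead of the inline sample, and roll several battles' exports into one view for velocity tracking.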
Two sample read-outs
Ship it
A realistic result for a well-audited protocol, just before mainnet deploy:
| Metric | Value |
|---|---|
| Exploited | 4 (two Highs, two Mediums from the original audit; zero from Discovery) |
| Fixed | 4 |
| Bypassed | 0 |
| Verified PR | 4 commits, one per finding |
Human reviews the PR, merges, and the deploy can proceed.
Not yet
A problematic version of the same battle:
| Metric | Value |
|---|---|
| Exploited | 6 (four from above + two new Criticals from Discovery) |
| Fixed | 3 |
| Bypassed | 2 |
| Unfixed | 1 |
| Verified PR | 3 commits |
Read: the audit under-counted the attack surface; two fixes looked good but failed under fresh attack; one issue is architectural. Merge the 3-commit PR if clean, but don't deploy until the Bypassed and Unfixed items are resolved by humans.