Reading Results

The scoreboard, finding status pills, and how to translate a Battle Mode outcome into a merge decision.

Overview

A Battle Mode run produces a lot of output: the scoreboard, per-finding statuses, exploit code, fix diffs, retest results, and a verified PR. This page is about how to read all of that — and more importantly, how to translate it into a decision about whether to merge, deploy, or go back to the drawing board.

The scoreboard

The top of the battle results page shows three counts. These are the headline numbers.

Exploited

Findings the Red Team successfully attacked in a deployed environment. These are proven real.

Fixed

Patches applied AND survived the retest phase. These are verified closures.

Bypassed

Patches that initially looked like fixes but failed when Red Team came back with a new exploit.

How the numbers relate

| Relation | What it tells you |
| --- | --- |
| Every Fixed implies a corresponding Exploited | You can't fix what wasn't exploited |
| Exploited − Fixed − Bypassed | Findings with no surviving fix at all (the "Unfixed" tier) |
| Exploited = Fixed, Bypassed = 0 | Ideal outcome — every real bug got closed and every closure held |
| High Bypassed count | First-pass fixes were superficial; the underlying class of vuln was not addressed |
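The arithmetic in the table above can be sketched in a few lines. This is an illustrative helper, not the product's API — the field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Scoreboard:
    # Headline counts from the battle results page (names are assumed).
    exploited: int  # findings proven real by a landed exploit
    fixed: int      # patches that survived the retest phase
    bypassed: int   # patches that failed under a fresh exploit

    @property
    def unfixed(self) -> int:
        # Findings with no surviving fix at all.
        return self.exploited - self.fixed - self.bypassed

    @property
    def ideal(self) -> bool:
        # Every real bug closed, and every closure held.
        return self.fixed == self.exploited and self.bypassed == 0

board = Scoreboard(exploited=6, fixed=3, bypassed=2)
print(board.unfixed)  # 1
print(board.ideal)    # False
```

The "Not yet" sample read-out later on this page is exactly this shape: six exploited, three fixed, two bypassed, one left unfixed.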

Per-finding status pills

Each finding in the battle has three pills indicating where it ended up across the six phases.

| Pill | Color | Phase | Meaning |
| --- | --- | --- | --- |
| Exploited | Red | 2 or 3 | Red Team landed an exploit. Finding is proven real. |
| Fix applied | Blue | 4 | Blue Team produced a candidate patch that closed the original exploit. |
| Defended or Bypassed | Green / Red | 5 | Whether the fix survived a fresh Red Team attack. |

A fully-closed finding shows: Red (exploited) → Blue (fix applied) → Defended (survived retest). Anything short of that is a finding that needs human attention before you merge or deploy.

The finding card

Click any finding to expand its card. Inside you'll see:

| Field | What it shows |
| --- | --- |
| Exploit code | The actual test Red Team wrote — usually a Foundry test showing how the vulnerability was triggered |
| Fix diff | The code change Blue Team proposed, as a diff against the original |
| Retest outcome | Defended or Bypassed. If Bypassed, the new exploit that got around the fix |
| Finding context | The original CARA finding that seeded this battle |

For a Defended finding, the finding card is effectively a complete remediation record: the bug, the exploit, the fix, the proof the fix holds. That's the audit-trail artifact you'd want if someone asked "how do you know this vulnerability is actually closed?"

The verified PR

Battle Mode produces a pull request on the target repo containing every Defended fix. Bypassed and Unfixed findings are not in the PR — deliberately, because the point of the PR is "these are the changes we've proven survive attack."

Read the PR the way you'd read any PR, with two adjustments:

  1. Cross-reference the PR description against the scoreboard. The PR should list exactly the Defended findings. If there's a mismatch, something's off — probably a fix that should have been excluded.
  2. Don't skip review on trust. Battle Mode verifies that a fix closes the exploit, but you still want a human engineer to confirm the fix is idiomatic, doesn't change public interfaces, and doesn't introduce regressions in the test suite. The forge test run in the sandbox is necessary, not sufficient.

Once reviewed, the PR merges like any other. Merging moves the corresponding findings to Resolved in the normal finding lifecycle.

Handling Bypassed and Unfixed

These are the findings the automated loop couldn't close. They deserve the most human attention.

| Status | What it means | What to do |
| --- | --- | --- |
| Bypassed | A fix that looked like it worked didn't hold up | Read the new exploit. Often the right move is a deeper refactor rather than another patch attempt |
| Unfixed | Blue Team tried 3 times and couldn't close the exploit | Usually architectural. Route to human engineering |

⚠️ Do not deploy with open Exploited-but-Unfixed or Bypassed findings without an explicit accept-the-risk decision. Battle Mode produced a working exploit against live contracts; that exploit will work against mainnet unless something changed.

Merge and deploy decisions

A pragmatic rule of thumb:

| Result shape | Merge? | Deploy? |
| --- | --- | --- |
| All Exploited findings are Fixed (Defended) | Review and merge the PR | Ready |
| Some findings Bypassed | Review verified PR separately | Not ready |
| Findings Unfixed | Merge verified portion if helpful | Not ready |
| Discovery phase found new Criticals | Regardless of other outcomes | Not ready |

The verified PR can often be merged even when the overall battle result is mixed, because merging a genuinely-Defended fix is never wrong. But deployment is a stricter bar: you want zero open Exploited findings, not "most of them are closed."
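The rule of thumb can be written down as a small check. A sketch only — the function name and parameters are assumptions, not part of any real API:

```python
def battle_verdict(exploited: int, fixed: int, bypassed: int,
                   new_criticals: int) -> tuple[bool, bool]:
    """Return (merge_verified_pr, ready_to_deploy) per the rule of thumb."""
    unfixed = exploited - fixed - bypassed
    # Merging genuinely-Defended fixes is never wrong (once the PR reviews clean).
    merge_verified_pr = fixed > 0
    # Deployment is a stricter bar: zero open Exploited findings, no new Criticals.
    ready_to_deploy = bypassed == 0 and unfixed == 0 and new_criticals == 0
    return merge_verified_pr, ready_to_deploy

print(battle_verdict(4, 4, 0, 0))  # (True, True)  -> "Ship it"
print(battle_verdict(6, 3, 2, 2))  # (True, False) -> "Not yet"
```

Note that the merge answer and the deploy answer diverge in the mixed case: the verified PR is still worth merging even when deployment has to wait.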

CSV export

For teams with audit-trail or compliance requirements, Battle Mode results can be exported as CSV from the battle page. The export includes finding ID, severity, Exploited / Fixed / Bypassed status, retest outcome, and links to the exploit and fix artifacts.

Useful for:

  • External auditor hand-off ("here's what we ran, here's what we found, here's what we fixed").
  • Internal governance records — showing that due diligence occurred before a deploy.
  • Tracking remediation velocity over time by rolling up multiple battles into a single historical view.
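For the roll-up use case, the exports can be tallied with the standard csv module. The column name "status" here is an assumption — check the header row of an actual export before relying on it:

```python
import csv
from collections import Counter

def rollup(csv_paths):
    """Tally finding statuses (e.g. Fixed, Bypassed) across several battle exports."""
    totals = Counter()
    for path in csv_paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                # "status" is an assumed column name from the export.
                totals[row["status"]] += 1
    return totals
```

Feeding it the CSVs from a quarter's worth of battles gives one historical view of how many findings landed in each tier.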

Two sample read-outs

Ship it

A realistic result for a well-audited protocol, just before mainnet deploy:

| Metric | Value |
| --- | --- |
| Exploited | 4 (two Highs, two Mediums from the original audit; zero from Discovery) |
| Fixed | 4 |
| Bypassed | 0 |
| Verified PR | 4 commits, one per finding |

Human reviews the PR, merges, and the deploy can proceed.

Not yet

A problematic version of the same battle:

| Metric | Value |
| --- | --- |
| Exploited | 6 (four from above + two new Criticals from Discovery) |
| Fixed | 3 |
| Bypassed | 2 |
| Unfixed | 1 |
| Verified PR | 3 commits |

Read: the audit under-counted the attack surface; two fixes looked good but failed under fresh attack; one issue is architectural. Merge the 3-commit PR if clean, but don't deploy until the Bypassed and Unfixed items are resolved by humans.