Mender Hand-off

Guard failure → Trace forensic replay → Promote to fix → Mender draft PR: the closed-loop regression workflow.

When Guard confirms a regression, the failing trace opens in Trace for forensic replay. Debugger cites events on the five-ribbon timeline; confirmed fixes hand off to Mender, which drafts a fix pull request with regression tests. A human always approves the merge.

This page documents the failure → trace → promote → Mender PR workflow — Trace's contribution to Molar's closed-loop QA story.


The closed loop

  Guard check fails
  Trace captured (auto-pin)
  Engineer opens trace detail
        ├──▶ Debugger: "why did step N fail?"
        ├──▶ Layer 2 replay: validate hypothesis
  Promote to fix (confirmed diff)
  Mender drafts fix PR + regression test
  Human reviews and merges
  Guard re-validates on PR → trace proves green

Tag line across the portfolio: failure → trace → mender.


Step 1 — Guard failure produces a trace

Once the Guard integration ships, every Guard check (PR gate, production monitor, shadow-prod diff, or manual run) will capture a trace via @molar/trace-capture.

| Guard event | Trace behavior |
|---|---|
| Check passes | Trace status=passed, normal retention |
| Mender attempt exists | mender_summary written to summary.json |
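
As a minimal sketch of consuming that artifact, the snippet below reads mender_summary out of a trace's summary.json. Only the file name and the mender_summary key come from the table above; the function name, the assumption that the file holds a top-level JSON object, and the shape of the value are illustrative.

```python
import json
from pathlib import Path
from typing import Any, Optional


def read_mender_summary(summary_path: Path) -> Optional[Any]:
    """Return the mender_summary value from summary.json, or None if absent.

    Assumption: summary.json is a JSON object with mender_summary as a
    top-level key. The value's internal shape is treated as opaque here.
    """
    with summary_path.open(encoding="utf-8") as fh:
        summary = json.load(fh)
    return summary.get("mender_summary")
```

A None result would simply mean no Mender attempt has been recorded for that trace yet.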

Deep links:

| From | To |
|---|---|
| GitHub check annotation | trace.molar.it/traces/{id}?step={N} |
| Guard check detail | Open trace button |
| Slack alert | Permalink /r/{shortId} |
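
The two URL shapes above can be built with a couple of small helpers. This is a sketch only: the https scheme, the assumption that /r/{shortId} lives on the same host, and the helper names are not stated in this page.

```python
from urllib.parse import quote, urlencode

# Assumption: links are served over https from the host shown in the table.
TRACE_BASE_URL = "https://trace.molar.it"


def trace_step_link(trace_id: str, step: int) -> str:
    """Deep link to a trace with a step preselected: /traces/{id}?step={N}."""
    query = urlencode({"step": step})
    return f"{TRACE_BASE_URL}/traces/{quote(trace_id, safe='')}?{query}"


def short_permalink(short_id: str) -> str:
    """Short permalink of the form /r/{shortId}, as used in Slack alerts."""
    return f"{TRACE_BASE_URL}/r/{quote(short_id, safe='')}"


print(trace_step_link("example-trace-id", 7))
print(short_permalink("xY9zQ2mNp"))
```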

Configure: Guard project settings → Trace capture (default on). See Integrations.


Step 2 — Forensic replay in Trace

Open the trace from the Trace list or the GitHub annotation.

What to review first

| Surface | Look for |
|---|---|
| Overview tab | Failure signature, cluster count, Mender pre-classification |
| Five ribbons | Align network 404 with console error at same timestamp |
| Clones tab | Stripe/email state at failure step: webhook URL, amounts |
| Debugger tab | Auto-generated forensic summary |
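
To make the "align network 404 with console error" check concrete, here is a rough sketch that pairs events from two ribbons by timestamp. The event fields (ribbon, ts_ms, status, level) and the 50 ms tolerance are hypothetical; the actual trace event schema is not described on this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RibbonEvent:
    seq: int                      # event sequence number
    ribbon: str                   # hypothetical ribbon label: "network" or "console"
    ts_ms: int                    # hypothetical timestamp in milliseconds
    status: Optional[int] = None  # HTTP status, for network events
    level: Optional[str] = None   # log level, for console events


def align_404_with_console_errors(
    events: List[RibbonEvent], tolerance_ms: int = 50
) -> List[Tuple[int, int]]:
    """Return (network_seq, console_seq) pairs that occur within tolerance_ms."""
    network_404s = [e for e in events if e.ribbon == "network" and e.status == 404]
    console_errors = [e for e in events if e.ribbon == "console" and e.level == "error"]
    return [
        (net.seq, con.seq)
        for net in network_404s
        for con in console_errors
        if abs(net.ts_ms - con.ts_ms) <= tolerance_ms
    ]


events = [
    RibbonEvent(seq=40, ribbon="network", ts_ms=1000, status=200),
    RibbonEvent(seq=41, ribbon="network", ts_ms=1500, status=404),
    RibbonEvent(seq=42, ribbon="console", ts_ms=1510, level="error"),
    RibbonEvent(seq=43, ribbon="console", ts_ms=9000, level="error"),
]
print(align_404_with_console_errors(events))  # [(41, 42)]
```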

Example failure (demo trace):

  • Signature: webhook-404
  • Step 7: webhook POST returned 404
  • Clone state: amt=9000 instead of amt=900 — coupon multiplier inverted
  • Mender pre-classification: product_bug at 87% confidence

Full viewer guide: Trace detail.


Step 3 — Debugger confirms root cause

Ask Debugger targeted questions before promoting:

  • "Why did step 7 fail?"
  • "What changed in the Stripe clone state between step 6 and 7?"
  • "Compare network calls to the last green run on main."

Debugger cites events as (seq:N) references; click one to jump the scrubber to that event. It reads Mender's existing classification via summarize_mender_analysis() without re-running Mender.
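
If you post-process Debugger answers outside the UI (for example, to list the cited events in a ticket), the (seq:N) citations are easy to pull out with a regular expression. This is an illustrative sketch; the helper name is made up, and it assumes citations appear exactly in the (seq:N) form shown above.

```python
import re
from typing import List

# Matches citations of the form (seq:N), where N is a non-negative integer.
SEQ_REF = re.compile(r"\(seq:(\d+)\)")


def extract_seq_refs(answer: str) -> List[int]:
    """Return cited sequence numbers in first-seen order, without duplicates."""
    seen: List[int] = []
    for match in SEQ_REF.finditer(answer):
        seq = int(match.group(1))
        if seq not in seen:
            seen.append(seq)
    return seen


answer = "The POST at (seq:41) returned 404; the console error follows at (seq:42), see (seq:41)."
print(extract_seq_refs(answer))  # [41, 42]
```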

When a hypothesis needs validation:

"Replay from step 7 with webhook URL /api/webhooks/stripe"

Debugger calls run_layer2_replay with user confirmation. See Debugger & replay.


Step 4 — Layer 2 validates the fix

Layer 2 replay produces a child trace with the edited source. The diff view shows step status deltas today.

Target dimensions:

| Dimension | What changed |
|---|---|
| Network | 404 → 200 on webhook POST |
| Clone state | Correct subscription amount |
| Steps | Step 7+ pass in replay |
| Console | Error cleared |

Review the Patch tab for the unified diff Mender will receive.

Only promote when:

  • Replay status=succeeded
  • Side-by-side diff shows the expected fix
  • No unexpected regressions in network or clone diffs
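
The three conditions above amount to a simple gate. The sketch below expresses them as a function that returns the reasons not to promote yet. The ReplayReview type and its field names are hypothetical; only the "succeeded" status value and the three conditions come from the checklist.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplayReview:
    replay_status: str            # "succeeded" is the value named in the checklist
    expected_fix_in_diff: bool    # reviewer confirmed the side-by-side diff
    unexpected_regressions: List[str] = field(default_factory=list)


def promotion_blockers(review: ReplayReview) -> List[str]:
    """List reasons not to promote yet; an empty list means all checks pass."""
    blockers: List[str] = []
    if review.replay_status != "succeeded":
        blockers.append(f"replay status is {review.replay_status!r}, not 'succeeded'")
    if not review.expected_fix_in_diff:
        blockers.append("side-by-side diff does not show the expected fix")
    for item in review.unexpected_regressions:
        blockers.append(f"unexpected regression: {item}")
    return blockers


ok = ReplayReview(replay_status="succeeded", expected_fix_in_diff=True)
print(promotion_blockers(ok))  # []
```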

Step 5 — Promote to fix

The Promote to fix button appears in:

| Location | When visible |
|---|---|
| Overview tab → Mender card | Mender classification exists |
| Layer 2 diff view footer | Replay succeeded with edits |
| Debugger chat | After confirmed Layer 2 success (tool suggestion) |

Click Promote to fix PR → confirmation modal:

| Field | Content |
|---|---|
| Source trace | UUID + short_id |
| Replay trace | Child trace (if Layer 2 ran) |
| Edits payload | { scenario_diff, source_diffs[{path, diff}] } |
| Classification | From Mender or Debugger |
| Target branch | PR branch from Guard source metadata |

Submit → POST /v1/traces/{id}/promote-fix creates a mender_attempts row and enqueues Mender when guard_api_url is configured. Without Guard API wiring, the endpoint returns a draft payload for review.
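
For scripted use, the request might be assembled roughly as follows. This sketch only builds the request object and does not send it. The endpoint path and the scenario_diff / source_diffs field names come from this page; the API host, the bearer-token header, and the exact top-level layout of the JSON body are assumptions.

```python
import json
from typing import Dict, List
from urllib.parse import quote
from urllib.request import Request

# Hypothetical host; substitute the API base URL for your deployment.
API_BASE = "https://trace-api.example.invalid"


def build_promote_fix_request(
    trace_id: str,
    scenario_diff: str,
    source_diffs: List[Dict[str, str]],
    token: str,
) -> Request:
    """Build (but do not send) a POST to /v1/traces/{id}/promote-fix."""
    # Assumption: the edits payload is sent as the top-level JSON body.
    body = {"scenario_diff": scenario_diff, "source_diffs": source_diffs}
    return Request(
        url=f"{API_BASE}/v1/traces/{quote(trace_id, safe='')}/promote-fix",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_promote_fix_request(
    trace_id="example-trace-id",
    scenario_diff="",
    source_diffs=[{"path": "config/stripe.json", "diff": "(unified diff text)"}],
    token="example-token",
)
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would send it; per the paragraph above, the response is either an enqueued Mender attempt or a draft payload, depending on whether guard_api_url is configured.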

RBAC

| Role | Promote |
|---|---|
| Owner | |
| Member | |
| Viewer | |

Step 6 — Mender drafts the fix PR

Mender (Guard's auto-fix subsystem) receives:

  • Trace NDJSON slices for failure context
  • Layer 2 edit payload (scenario + source diffs)
  • Debugger analysis markdown
  • Classification + confidence score

Mender pipeline:

  1. Analyze — confirm classification against trace evidence
  2. Patch — apply source diffs to target branch
  3. Test — add/update regression test from scenario file
  4. PR — open draft pull request via GitHub App
  5. Re-validate — Guard runs on Mender PR → new trace proves green

| Mender field | Trace source |
|---|---|
| classification | Overview Mender card |
| confidence | mender_attempts.confidence |
| analysis_markdown | Debugger + Mender summary |
| patch (JSONB) | Layer 2 edits payload |
| pr_url | Outbound link in Overview |
| outcome | suggested, applied, rejected |
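
One way to picture the record behind these fields is a small typed structure. The sketch below mirrors the field names listed above; the Python types, the 0 to 1 confidence scale, and the default outcome are assumptions rather than the documented mender_attempts schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

OUTCOMES = ("suggested", "applied", "rejected")


@dataclass
class MenderAttempt:
    classification: str            # e.g. "product_bug"
    confidence: float              # assumed to be stored on a 0.0 to 1.0 scale
    analysis_markdown: str         # Debugger + Mender summary
    patch: Dict[str, Any]          # Layer 2 edits payload
    outcome: str = "suggested"     # assumed default before any PR exists
    pr_url: Optional[str] = None   # set once a PR has been opened

    def __post_init__(self) -> None:
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


attempt = MenderAttempt(
    classification="product_bug",
    confidence=0.87,
    analysis_markdown="Webhook URL mismatch.",
    patch={"scenario_diff": "", "source_diffs": []},
)
print(attempt.outcome)  # suggested
```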

Track status from Trace Overview → Open Mender attempt or Guard → Mender tab.

Full Mender docs: Guard Mender auto-fix.


Step 7 — Human review and merge

Mender never auto-merges. The PR draft includes:

  • Fix commit with source changes
  • Regression test derived from the same .molar/scenarios/{id}.molar.md file
  • PR description with trace permalink and failure summary
  • Guard check status on the Mender PR

Review checklist:

  • Fix matches Layer 2 diff you validated in Trace
  • Regression test covers the failure signature
  • No unrelated file changes
  • Guard check green on Mender PR (new trace linked)
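
The "no unrelated file changes" item can be spot-checked mechanically by comparing the PR's changed files against the paths in the Layer 2 edits payload. This is a sketch under assumptions: the function name and the example paths are hypothetical, and it assumes the regression test lives in the scenario file itself unless extra paths are passed in.

```python
from typing import Dict, Iterable, List


def unrelated_changes(
    pr_files: Iterable[str],
    source_diffs: List[Dict[str, str]],
    scenario_id: str,
    extra_allowed: Iterable[str] = (),
) -> List[str]:
    """Return PR files that are not explained by the validated Layer 2 edits."""
    allowed = {d["path"] for d in source_diffs}
    # Scenario file path pattern as given in the PR contents list above.
    allowed.add(f".molar/scenarios/{scenario_id}.molar.md")
    # Any generated test files the reviewer expects (assumption).
    allowed.update(extra_allowed)
    return sorted(set(pr_files) - allowed)


changed = [
    "config/stripe.json",
    ".molar/scenarios/checkout-stripe.molar.md",
    "README.md",
]
diffs = [{"path": "config/stripe.json", "diff": "(unified diff text)"}]
print(unrelated_changes(changed, diffs, "checkout-stripe"))  # ['README.md']
```

An empty result does not replace reading the diff; it only flags files that the validated replay cannot account for.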

After merge, the original failure signature cluster should stop growing. Verify in Trace list → Failed · 24h view.


Mender card in Trace Overview

When Guard has already invoked Mender (pre-promotion classification):

┌─────────────────────────────────────────────────┐
│ MENDER ANALYSIS                    87% · product_bug │
│ Webhook URL mismatch — route renamed in PR #4521   │
│ without updating Stripe clone config.                │
│ [Open Mender attempt]  [Promote to fix PR]         │
└─────────────────────────────────────────────────┘

| Outcome badge | Meaning |
|---|---|
| suggested | Mender analyzed; no PR yet |
| applied | PR opened |
| rejected | Human dismissed |

Promote overrides suggested → applied when you confirm the fix.
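
Read as a tiny state machine, the badges allow two moves out of suggested. The sketch below encodes just those. The action names "promote" and "dismiss" are made up for illustration, and treating applied and rejected as terminal is an assumption this page does not confirm.

```python
# (current outcome, action) -> next outcome
TRANSITIONS = {
    ("suggested", "promote"): "applied",   # fix confirmed via Promote, PR opened
    ("suggested", "dismiss"): "rejected",  # human dismissed the suggestion
}


def next_outcome(current: str, action: str) -> str:
    """Return the outcome after an action, or raise if the move is not modeled."""
    try:
        return TRANSITIONS[(current, action)]
    except KeyError:
        raise ValueError(
            f"no transition for outcome={current!r}, action={action!r}"
        ) from None


print(next_outcome("suggested", "promote"))  # applied
```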


End-to-end example

Scenario: checkout-stripe fails on PR #4521 with webhook-404 signature.

  1. Guard check fails → trace xY9zQ2mNp auto-pinned
  2. Engineer opens trace → step 7 selected, NETWORK ribbon shows red tick
  3. Debugger: "webhook 404 — URL /api/stripe/webhook vs handler at /api/webhooks/stripe"
  4. Layer 2 replay from step 7 with config fix → child trace passes
  5. Diff view: webhook 404 → 200, amount corrected
  6. Promote to fix → Mender opens PR #4522
  7. Engineer reviews PR, merges
  8. Guard re-runs on main → green trace, cluster count drops

What Trace does NOT do

| Action | Owner |
|---|---|
| Open pull requests | Mender |
| Modify source files directly | Mender (via PR) |
| Auto-merge | Never; human required |
| Re-run Guard checks | Guard |
| Author scenarios | Cartographer |

Debugger may suggest promoting to Mender but cannot invoke it without explicit user action on the Promote button.


Configuration

| Setting | Location | Purpose |
|---|---|---|
| Mender enabled | Guard project settings | Allow auto-fix pipeline |
| GitHub App installed | Org integrations | PR creation + source reads |
| Trace capture on | Guard project settings | Ensure failure artifacts exist |
| Promote RBAC | Trace team settings | Restrict to owners if needed |

| Page | When |
|---|---|
| Guard Mender auto-fix | Mender configuration and PR format |
| Debugger & replay | Layer 2 before promoting |
| Trace detail | Overview Mender card |
| Integrations | Guard → Trace wiring |