Shadow-prod diff

Shadow-prod diff is Guard's unique capability: run the same scenario against real production and a Clones replica in parallel, then semantically diff the observations.

The alert is not "your app broke" but "your model of the third-party world is no longer correct." Stripe shipped a new webhook field. Twilio changed a callback format. Clerk added a JWT claim. Your application code may be fine; your Clones (and tests) are stale.

Pitch line: We don't just test your code. We test that your model of the third-party world is still correct.

Dashboard: app.molar.it/dashboard/guard → Shadow-diff review · Product page: guard.molar.it

Why shadow-prod exists

Traditional monitoring answers: did the user journey complete?

Shadow-prod answers: did production behave the same way our isolated test environment expects?

Failure mode	PR gating catches?	Prod synthetics catch?	Shadow-prod catches?
App regression (broken button)	✓	✓	Sometimes
Stripe webhook payload shape change	✗ (Clone still has the old shape)	Maybe (if an assertion checks the field)	✓
Twilio `MessageStatus` format drift	✗	Maybe	✓
Clerk new required JWT claim	✗	Maybe	✓
S3 new error response shape	✗	Maybe	✓

This is only possible because Molar owns Clones — stateful fakes for Stripe, Twilio, Email, Clerk, S3 with deterministic IDs and virtual clocks.

How it works

On each scheduled check tick where shadow_prod = true:

                    ┌──► Real prod run (synthetic user, money-flow blocked)
                    │
Cron tick ──────────┤
                    │
                    └──► Clones replica (deterministic seed per 5-min window)
                              │
                              ▼
                    Compare RunObservation JSONB
                              │
                              ▼
              Divergence? → shadow_diff incident + review UI

Parallel runs

Real prod — synthetic user, full journey against production URL. Money-flow middleware ensures no real charges. Same signals as Production monitoring (X-Synthetic-Source, __MOLAR_SYNTHETIC__, etc.).
Clones replica — the same scenario steps against the Clones bundle. Seed = sha256(scenarioId || floor(timestamp / 300 s)), where || is concatenation. The seed is stable within a 5-minute window, so paired runs compare like with like.
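The seed rule can be sketched with Node's built-in `node:crypto`. This is a minimal illustration, not Guard's implementation: `shadowSeed` and `windowIndex` are hypothetical names, and the `:` separator between the two parts and the hex digest are assumptions about an encoding this page does not specify.

```typescript
import { createHash } from "node:crypto";

// Window length from the docs: 5 minutes.
const WINDOW_SECONDS = 300;

// Index of the 5-minute window containing a Unix timestamp (in seconds).
function windowIndex(unixSeconds: number): number {
  return Math.floor(unixSeconds / WINDOW_SECONDS);
}

// Hypothetical helper: SHA-256 over scenarioId concatenated with the window index.
// The ":" separator and hex output are assumptions, not Guard's documented encoding.
function shadowSeed(scenarioId: string, unixSeconds: number): string {
  return createHash("sha256")
    .update(`${scenarioId}:${windowIndex(unixSeconds)}`)
    .digest("hex");
}
```

Because the window index changes only every 300 seconds (windows aligned to the Unix epoch), a prod run and a Clones run started anywhere inside the same window derive the same seed.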

Both runs emit a RunObservation document:

{
  "response_bodies": { "step_3": "semanticHashOfBody", "step_7": "…" },
  "webhook_events_fired": [
    { "provider": "stripe", "event_type": "customer.subscription.updated", "payload_semantic_hash": "abc123" }
  ],
  "dom_state_per_step": [{ "step": 7, "role_tree_hash": "…" }],
  "headers_per_step": ["…"]
}
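The `semanticHashOfBody` and `payload_semantic_hash` values above are hashes, not raw payloads. One way to make such a hash insensitive to key order and volatile values is to canonicalize the JSON before hashing. A sketch under stated assumptions: the volatile keys are the documented defaults (`id`, `created`, `updated`, `*_at`, `request_id`, `trace_id`) dropped at any depth, object keys are sorted, and the result is SHA-256 hashed. `semanticHash` and `canonicalize` are hypothetical names, not Guard's API.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumed volatile-key rule mirroring the documented defaults.
const VOLATILE_KEYS = new Set(["id", "created", "updated", "request_id", "trace_id"]);
const isVolatile = (key: string): boolean => VOLATILE_KEYS.has(key) || key.endsWith("_at");

// Recursively drop volatile keys and serialize with sorted object keys,
// so neither key order nor volatile values can change the output.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.keys(value)
      .filter((k) => !isVolatile(k))
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function semanticHash(body: Json): string {
  return createHash("sha256").update(canonicalize(body)).digest("hex");
}
```

With this scheme, two bodies hash equal exactly when they agree on every non-volatile field, which is what lets one hash per step stand in for the full body.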

Semantic diff engine

The diff compares prod vs Clone observations:

Layer	What is compared
Response bodies	Semantic JSON diff per step; ignores volatile fields (`id`, `created`, `*_at`, `request_id`, ETags, `Date` headers)
Webhook events	Whether a prod webhook (e.g. from Stripe) carries a field the Clone doesn't model
DOM state	Accessibility role-tree hash per step
Headers	Normalized header comparison per step

The ignore allowlist for known volatile fields is customer-configurable.
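A field-level diff makes the comparison concrete. The sketch below walks prod and Clone JSON together and reports JSON paths as `added` (present in prod, missing from the Clone), `removed`, or `changed`. The `diffType` vocabulary mirrors the suppression example under Per-check overrides, but this reading of `added`, the function `diffJson`, and the `ignoreVolatile` predicate are assumptions. Arrays and scalars are compared as whole values to keep the example short.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

type DiffType = "added" | "removed" | "changed";
interface FieldDiff {
  path: string;
  diffType: DiffType;
}

const isObject = (v: Json): v is JsonObject =>
  v !== null && typeof v === "object" && !Array.isArray(v);

// Hypothetical diff: recurses through objects; isIgnored is an assumed allowlist predicate on key names.
function diffJson(prod: Json, clone: Json, isIgnored: (key: string) => boolean, path = ""): FieldDiff[] {
  if (isObject(prod) && isObject(clone)) {
    const keys = Array.from(new Set([...Object.keys(prod), ...Object.keys(clone)])).sort();
    const out: FieldDiff[] = [];
    for (const key of keys) {
      if (isIgnored(key)) continue;
      const child = path ? `${path}.${key}` : key;
      if (!(key in clone)) out.push({ path: child, diffType: "added" }); // prod has it, Clone doesn't model it
      else if (!(key in prod)) out.push({ path: child, diffType: "removed" });
      else out.push(...diffJson(prod[key], clone[key], isIgnored, child));
    }
    return out;
  }
  // Arrays and scalars: whole-value comparison (ignored keys inside arrays still count).
  return JSON.stringify(prod) === JSON.stringify(clone) ? [] : [{ path: path || "$", diffType: "changed" }];
}

// Subset of the documented default volatile fields.
const ignoreVolatile = (key: string): boolean =>
  ["id", "created", "updated", "request_id", "trace_id"].includes(key) || key.endsWith("_at");
```

With `ignoreVolatile`, a Stripe-style event whose prod copy carries `trial_settings` that the Clone lacks yields a single `added` entry at `data.object.trial_settings`, while differing `id` values are skipped.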

Alert threshold

By default, a divergence is significant when any of the following holds:

At least one field differs in a webhook payload
A new field appears in a prod webhook that is not present in the Clone
DOM role-tree distance exceeds 5%

The threshold is configurable per scheduled check in alert_policy.

On breach, Guard opens a shadow_diff incident (recorded as run_type: shadow_diff in guard_runs).
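The default rule reduces to a few comparisons. A sketch, assuming the paired-run diff has already been summarized into counts plus a role-tree distance between 0 and 1. `DivergenceSummary`, `ShadowThreshold`, and `isSignificant` are hypothetical names, and how Guard actually computes the role-tree distance is not specified on this page.

```typescript
// Hypothetical summary of one paired run; field names are assumptions, not Guard's schema.
interface DivergenceSummary {
  webhookFieldDiffs: number; // fields that differ across paired webhook payloads
  newProdWebhookFields: number; // fields present in prod webhooks but absent from the Clone
  domRoleTreeDistance: number; // 0..1, fraction of the role tree that changed
}

// Maximum values tolerated before a divergence counts as significant.
interface ShadowThreshold {
  maxWebhookFieldDiffs: number;
  maxNewProdWebhookFields: number;
  maxDomRoleTreeDistance: number;
}

// Documented defaults: 1+ differing field, any new prod field, or DOM distance > 5%.
const DEFAULT_THRESHOLD: ShadowThreshold = {
  maxWebhookFieldDiffs: 0,
  maxNewProdWebhookFields: 0,
  maxDomRoleTreeDistance: 0.05,
};

function isSignificant(s: DivergenceSummary, t: ShadowThreshold = DEFAULT_THRESHOLD): boolean {
  return (
    s.webhookFieldDiffs > t.maxWebhookFieldDiffs ||
    s.newProdWebhookFields > t.maxNewProdWebhookFields ||
    s.domRoleTreeDistance > t.maxDomRoleTreeDistance
  );
}
```

Expressing each condition as a tolerated maximum means a per-check override in this sketch is a matter of raising one number, e.g. `maxDomRoleTreeDistance: 0.1` for a page whose layout shifts often.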

Enable shadow_prod

Shadow-prod is configured per scheduled check, not globally.

Scenario frontmatter

---
id: stripe-subscription-upgrade
schedule:
  cron: "*/5 * * * *"
  regions: [us-east-1, eu-west-1, ap-south-1]
  shadow_prod: true
---

CLI / API

When creating a schedule:

pnpm molar-guard schedule create stripe-subscription-upgrade \
  --cron "*/5 * * * *" \
  --shadow-prod

Or via REST:

POST /v1/scheduled_checks
Content-Type: application/json

{
  "scenarioId": "…",
  "cronExpr": "*/5 * * * *",
  "regions": ["us-east-1", "eu-west-1"],
  "shadowProd": true,
  "alertPolicy": {
    "shadowDiff": true,
    "slackWebhookUrl": "https://hooks.slack.com/services/…"
  }
}
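The same REST call can be prepared from TypeScript. This sketch only builds the request so it can be handed to `fetch`. The API base URL and authentication are not documented on this page, so `apiBase` is a caller-supplied assumption and no auth header is set; `buildScheduledCheckRequest`, `ScheduledCheckInput`, and `PreparedRequest` are hypothetical and cover only the fields in the REST example above.

```typescript
// Only the fields shown in the REST example; the full schema lives in Configuration.
interface ScheduledCheckInput {
  scenarioId: string;
  cronExpr: string;
  regions: string[];
  shadowProd: boolean;
  alertPolicy?: {
    shadowDiff?: boolean;
    slackWebhookUrl?: string;
  };
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical builder: apiBase is an assumption (not documented here), and no auth header is added.
function buildScheduledCheckRequest(apiBase: string, input: ScheduledCheckInput): PreparedRequest {
  return {
    url: `${apiBase.replace(/\/+$/, "")}/v1/scheduled_checks`, // strip trailing slashes before joining
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(input),
  };
}
```

Pass the result to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, adding whatever credentials your org's API requires.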

Dashboard

app.molar.it/dashboard/guard → Monitoring → Schedules → edit check → enable Shadow-prod diff.

When to enable

Enable shadow_prod	Reason
✓ Scenarios with Stripe/Twilio/Clerk webhooks	Catch vendor API drift early
✓ Billing and auth flows	High cost of silent model mismatch
✗ Static marketing pages	No third-party contract to diff
✗ Scenarios without Clones surfaces	Nothing to compare against

Shadow-prod requires Clones to be enabled for the org, either with clones.enabled: true in molar-guard.config.ts or through a platform Clones connection.

Use cases (real examples)

Stripe webhook schema change

Production Stripe fires customer.subscription.updated with a new field, subscription.trial_settings.end_behavior, while your Clone still models the old payload. Shadow-prod alerts within one 5-minute tick: before any code deploy, and before a prod synthetic assertion would fail.

Twilio callback format

A MessageStatus callback adds a field or changes its enum values. The prod observation differs from the Clone's, and Guard opens an incident of type shadow_diff.

Clerk JWT claims

Clerk adds a required claim to session tokens. The prod auth path then returns a different JWT shape than the Clone.

S3 error responses

AWS changes the error XML/JSON shape for a bucket-policy edge case. The response body's semantic hash diverges from the Clone's.

Review UI

When a shadow diff fires, engineers review side-by-side observations before suppressing or updating Clones.

Route: /shadow-diff/[runId] in app.molar.it/dashboard/guard

What you see

Panel	Content
Prod observation	Response hashes, webhook events, DOM trees per step
Clone observation	Same structure from parallel run
Diff highlights	Added/removed/changed fields in webhooks; JSON path diffs in bodies
Webhook field diff	Stripe/Twilio/Clerk-specific field-level highlights
DOM role-tree summary	Distance metric and changed nodes

Actions

Action	Effect
Approve	Acknowledge expected vendor change; optionally snooze rule
Reject	Escalate: the Clone or scenario likely needs an update
Snooze	Suppress alerts for duration (maintenance, known rollout)
Create incident	Link to operational incident workflow
Update ignore list	Add field to semantic diff allowlist (persistent vendor addition)

The review UX is a governed diff approval flow — you approve expected vendor drift before updating baselines or ignore lists.

Shadow-diff incidents

Shadow diffs create incidents distinct from failure-rate incidents:

Field	Value
Type	`shadow_diff`
`guard_runs.run_type`	`shadow_diff`
`shadow_diff` JSONB	Full diff payload on the run row

Alert routing

Set slackWebhookUrl, pagerDutyRoutingKey, or webhookUrl in the scheduled check's alertPolicy. Shadow-diff incidents use the same channels as consecutive-failure alerts, so use a dedicated check or webhook endpoint if you want #vendor-drift alerts kept separate from outage pages.

Mender interaction

Shadow-diff incidents are usually not product_bug incidents. Mender may classify them as scenario_bug (update the Clone or scenario expectations) or skip the fix PR. Human review in the shadow-diff UI is the primary workflow.

Observation storage and replay

Each shadow run stores:

artifacts_s3_prefix — screenshots, HAR, trace (same as other run types)
shadow_diff JSONB — structured diff result
Links to the paired prod and Clone run IDs

From Runs detail, filter run_type=shadow_diff to audit historical drift.

Configuration reference

Ignore volatile fields (global default)

The semantic JSON diff ignores these fields unless overridden:

id, created, updated, *_at
request_id, trace_id
ETags, Date headers
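Header comparison for headers_per_step can be sketched the same way. This assumes header names are compared case-insensitively (HTTP field names are case-insensitive), values are whitespace-trimmed, and `ETag` and `Date` are dropped as the defaults above list; `normalizeHeaders` and `headersMatch` are hypothetical names, not Guard's API.

```typescript
// Documented volatile headers, lowercased for case-insensitive matching.
const VOLATILE_HEADERS = new Set(["etag", "date"]);

// Lowercase names, trim values, and drop volatile headers.
function normalizeHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase();
    if (!VOLATILE_HEADERS.has(key)) out[key] = value.trim();
  }
  return out;
}

// True when both sides carry the same non-volatile header names with the same values.
function headersMatch(prod: Record<string, string>, clone: Record<string, string>): boolean {
  const a = normalizeHeaders(prod);
  const b = normalizeHeaders(clone);
  const aKeys = Object.keys(a).sort();
  const bKeys = Object.keys(b).sort();
  return aKeys.length === bKeys.length && aKeys.every((k, i) => k === bKeys[i] && a[k] === b[k]);
}
```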

Per-check overrides

Suppress known-benign webhook diffs via suppressedWebhookDiffs in alertPolicy:

{
  "suppressedWebhookDiffs": [
    {
      "provider": "stripe",
      "eventType": "customer.subscription.updated",
      "field": "subscription.metadata.internal_note",
      "diffType": "added"
    }
  ]
}

See Configuration for the full alertPolicy schema.
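Applying suppressedWebhookDiffs amounts to filtering the detected webhook diffs. A sketch, assuming a diff is suppressed only when provider, eventType, field, and diffType all match exactly (no wildcards); `WebhookDiff`, `sameDiff`, and `remainingDiffs` are hypothetical names, and Guard's actual matching rules may differ.

```typescript
type DiffType = "added" | "removed" | "changed";

// One detected (or suppressed) webhook field difference; fields mirror the JSON above.
interface WebhookDiff {
  provider: string;
  eventType: string;
  field: string;
  diffType: DiffType;
}

// Assumption: exact match on all four fields.
function sameDiff(a: WebhookDiff, b: WebhookDiff): boolean {
  return a.provider === b.provider && a.eventType === b.eventType && a.field === b.field && a.diffType === b.diffType;
}

// Keep only the diffs no suppression entry matches; these are the ones that can still alert.
function remainingDiffs(detected: WebhookDiff[], suppressed: WebhookDiff[]): WebhookDiff[] {
  return detected.filter((d) => !suppressed.some((s) => sameDiff(d, s)));
}
```

Exact matching keeps a suppression narrow: silencing an added internal_note field does not also silence a removed one, or the same field on a different event type.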

Operational playbook

1. Vendor announced API change

Expect shadow-diff alert near rollout
Open review UI → confirm field is expected
Approve + add to ignore list OR update Clones bundle version
Run Cartographer or regenerate scenarios if assertions need the new fields

2. Unexpected diff

Treat as potential prod/Clone skew or undiscovered vendor change
Reject → open incident, notify platform team
Run the scenario manually: pnpm molar-guard run <slug> --base-url https://prod…
Compare against the Clone session in the Clones dashboard

3. False positive storm after deploy

Snooze 24h while investigating
Check if deploy changed response shapes intentionally
Update scenario or Clone seed if test data drifted

Relationship to PR gating and prod monitoring

PR gating never runs shadow-prod (Clones-only is faster and safer pre-merge)
Prod monitoring can run with shadow_prod: false — journey pass/fail only
Shadow-prod adds the parallel Clone arm on the same cron

Keep one scenario file, and enable shadow-prod only on the schedules that touch external vendors.

Related

Production monitoring — synthetics, incidents, alerting
Clones — stateful vendor fakes that power the Clone arm
PR gating — pre-merge Clones runs
Mender auto-fix — when drift becomes a product bug
Guard dashboard — Monitoring grid and shadow-diff routes
Troubleshooting — false positives and ignore lists
