Shadow-prod diff
Shadow-prod diff is Guard's unique capability: run the same scenario against real production and a Clones replica in parallel, then semantically diff the observations.
The alert is not "your app broke" — it is "your model of the third-party world is no longer correct." Stripe shipped a new webhook field. Twilio changed callback format. Clerk added a JWT claim. Your application code may be fine; your Clones (and tests) are stale.
Pitch line: We don't just test your code. We test that your model of the third-party world is still correct.
Dashboard: app.molar.it/dashboard/guard → Shadow-diff review · Product page: guard.molar.it
Why shadow-prod exists
Traditional monitoring answers: did the user journey complete?
Shadow-prod answers: did production behave the same way our isolated test environment expects?
| Failure mode | PR gating catches? | Prod synthetics catch? | Shadow-prod catches? |
|---|---|---|---|
| App regression (broken button) | ✓ | ✓ | Sometimes |
| Stripe webhook payload shape change | ✗ (Clone still old shape) | Maybe (if assertion checks field) | ✓ |
| Twilio MessageStatus format drift | ✗ | Maybe | ✓ |
| Clerk new required JWT claim | ✗ | Maybe | ✓ |
| S3 new error response shape | ✗ | Maybe | ✓ |
This is only possible because Molar owns Clones — stateful fakes for Stripe, Twilio, Email, Clerk, S3 with deterministic IDs and virtual clocks.
How it works
On each scheduled check tick where shadow_prod = true:
                    ┌──► Real prod run (synthetic user, money-flow blocked)
                    │
Cron tick ──────────┤
                    │
                    └──► Clones replica (deterministic seed per 5-min window)
                              │
                              ▼
                    Compare RunObservation JSONB
                              │
                              ▼
                    Divergence? → shadow_diff incident + review UI
Parallel runs
- Real prod: synthetic user, full journey against the production URL. Money-flow middleware ensures no real charges. Same signals as Production monitoring (X-Synthetic-Source, __MOLAR_SYNTHETIC__, etc.).
- Clones replica: same scenario steps against the Clones bundle. Seed = sha256(scenarioId || timestamp/300s), stable within a 5-minute window so paired runs compare apples to apples.
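The windowed seed can be sketched as follows. This is an illustration, not Guard's implementation: it reads `||` as string concatenation, and the `cloneSeed` name, the `|` delimiter, the integer window index, and the hex output are all assumptions.

```typescript
import { createHash } from "node:crypto";

// 5-minute window, matching the timestamp/300s term in the seed formula.
const WINDOW_SECONDS = 300;

// Hypothetical helper: derives a seed that is identical for every run of the
// same scenario inside one 5-minute window and changes at the window boundary.
function cloneSeed(scenarioId: string, timestampMs: number): string {
  const windowIndex = Math.floor(timestampMs / 1000 / WINDOW_SECONDS);
  // The "|" delimiter is an assumption; it keeps the id and index unambiguous.
  return createHash("sha256")
    .update(`${scenarioId}|${windowIndex}`)
    .digest("hex");
}

// Two ticks one minute apart fall in the same window and share a seed.
const windowStartMs = 1_700_000_100_000; // exactly divisible by 300 s
const seedA = cloneSeed("stripe-subscription-upgrade", windowStartMs);
const seedB = cloneSeed("stripe-subscription-upgrade", windowStartMs + 60_000);
console.log(seedA === seedB);
```

Flooring to a window index means the prod arm and the Clone arm do not need to start at the same millisecond to agree on a seed.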
Both runs emit a RunObservation document:
{
"response_bodies": { "step_3": "semanticHashOfBody", "step_7": "…" },
"webhook_events_fired": [
{ "provider": "stripe", "event_type": "customer.subscription.updated", "payload_semantic_hash": "abc123" }
],
"dom_state_per_step": [{ "step": 7, "role_tree_hash": "…" }],
"headers_per_step": ["…"]
}
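For readers working with these documents in code, a minimal TypeScript sketch of the shape follows. The field names come from the sample above; which fields are optional and the element type of headers_per_step are not shown in the sample, so those typings are assumptions, as is the `divergentBodySteps` helper.

```typescript
// Shape inferred from the sample RunObservation document; not an official schema.
interface WebhookEventObservation {
  provider: string;
  event_type: string;
  payload_semantic_hash: string;
}

interface RunObservation {
  response_bodies: Record<string, string>; // step key -> semantic hash of body
  webhook_events_fired: WebhookEventObservation[];
  dom_state_per_step: { step: number; role_tree_hash: string }[];
  headers_per_step: unknown[]; // element shape is elided in the sample
}

// Hypothetical helper: step keys whose body hashes differ between the two
// arms, including steps that were observed on only one side.
function divergentBodySteps(prod: RunObservation, clone: RunObservation): string[] {
  const steps = new Set<string>([
    ...Object.keys(prod.response_bodies),
    ...Object.keys(clone.response_bodies),
  ]);
  return Array.from(steps)
    .filter((step) => prod.response_bodies[step] !== clone.response_bodies[step])
    .sort();
}
```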
Semantic diff engine
The diff compares prod vs Clone observations:
| Layer | What differs |
|---|---|
| Response bodies | Semantic JSON diff per step — ignores volatile fields (id, created, *_at, request_id, ETags, Date headers) |
| Webhook events | Did Stripe in prod fire a field the Clone doesn't model? |
| DOM state | Accessibility role-tree hash per step |
| Headers | Normalized header comparison per step |
Customers can extend the ignore list with additional known-volatile fields.
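One way to picture the response-body layer is "strip volatile keys, canonicalize, hash". The sketch below assumes that approach; the function names, the key-sorting canonical form, and the use of SHA-256 are illustrative choices rather than Guard's documented internals. It covers JSON body keys only, not ETags or Date headers.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Volatile body keys from the table above; *_at is handled as a suffix match.
const VOLATILE_KEYS = new Set<string>(["id", "created", "request_id"]);

function isVolatile(key: string, extraIgnored: ReadonlySet<string>): boolean {
  return VOLATILE_KEYS.has(key) || key.endsWith("_at") || extraIgnored.has(key);
}

// Recursively drop volatile keys and sort the rest, so that two bodies that
// differ only in volatile fields or key order serialize identically.
function normalize(value: Json, extraIgnored: ReadonlySet<string> = new Set<string>()): Json {
  if (Array.isArray(value)) {
    return value.map((item) => normalize(item, extraIgnored));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    const entries = Object.entries(value).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    for (const [key, child] of entries) {
      if (!isVolatile(key, extraIgnored)) out[key] = normalize(child, extraIgnored);
    }
    return out;
  }
  return value;
}

// Hypothetical stand-in for the semantic hash stored in RunObservation.
function semanticHash(body: Json, extraIgnored?: ReadonlySet<string>): string {
  return createHash("sha256")
    .update(JSON.stringify(normalize(body, extraIgnored)))
    .digest("hex");
}
```

The `extraIgnored` parameter plays the role of the customer-configurable list: a field added there stops contributing to the hash on both arms.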
Alert threshold
By default, a divergence is significant when any of the following holds:
- 1+ field differs in a webhook payload
- New field appears in prod webhook not present in Clone
- DOM role-tree distance > 5%
Threshold is configurable per scheduled check in alert_policy.
On breach: open a shadow_diff incident (run_type: shadow_diff in guard_runs).
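The default rule can be expressed as a small predicate. This is a sketch under assumptions: the `ShadowDiffSummary` and `WebhookPayloadDiff` shapes are hypothetical, and the DOM distance is taken to be a fraction between 0 and 1.

```typescript
// Hypothetical summary of one prod/Clone comparison.
interface WebhookPayloadDiff {
  provider: string;
  eventType: string;
  changedFields: string[]; // present on both sides with different values
  addedInProd: string[];   // present in prod, not modelled by the Clone
}

interface ShadowDiffSummary {
  webhookDiffs: WebhookPayloadDiff[];
  domRoleTreeDistance: number; // assumed to be a 0..1 fraction
}

// 5% default from the threshold list above.
const DEFAULT_DOM_DISTANCE_THRESHOLD = 0.05;

// True when any default condition holds: a changed webhook field, a new
// prod-only webhook field, or a DOM role-tree distance above the threshold.
function isSignificantDivergence(
  summary: ShadowDiffSummary,
  domThreshold: number = DEFAULT_DOM_DISTANCE_THRESHOLD,
): boolean {
  const webhookDrift = summary.webhookDiffs.some(
    (diff) => diff.changedFields.length > 0 || diff.addedInProd.length > 0,
  );
  return webhookDrift || summary.domRoleTreeDistance > domThreshold;
}
```

The comparison is strict, so a distance of exactly 5% does not breach the threshold in this sketch.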
Enable shadow_prod
Shadow-prod is configured per scheduled check, not globally.
Scenario frontmatter
---
id: stripe-subscription-upgrade
schedule:
  cron: "*/5 * * * *"
  regions: [us-east-1, eu-west-1, ap-south-1]
  shadow_prod: true
---
CLI / API
When creating a schedule:
pnpm molar-guard schedule create stripe-subscription-upgrade \
--cron "*/5 * * * *" \
--shadow-prod
Or via REST:
POST /v1/scheduled_checks
Content-Type: application/json
{
"scenarioId": "…",
"cronExpr": "*/5 * * * *",
"regions": ["us-east-1", "eu-west-1"],
"shadowProd": true,
"alertPolicy": {
"shadowDiff": true,
"slackWebhookUrl": "https://hooks.slack.com/services/…"
}
}
Dashboard
app.molar.it/dashboard/guard → Monitoring → Schedules → edit check → enable Shadow-prod diff.
When to enable
| Enable shadow_prod | Reason |
|---|---|
| ✓ Scenarios with Stripe/Twilio/Clerk webhooks | Catch vendor API drift early |
| ✓ Billing and auth flows | High cost of silent model mismatch |
| ✗ Static marketing pages | No third-party contract to diff |
| ✗ Scenarios without Clones surfaces | Nothing to compare against |
Requires Clones enabled for the org (molar-guard.config.ts → clones.enabled: true or platform Clones connection).
Use cases (real examples)
Stripe webhook schema change
Production Stripe fires customer.subscription.updated with new field subscription.trial_settings.end_behavior. Your Clone still models the old payload. Shadow-prod alerts within one 5-minute tick — before a code deploy, before a prod synthetic assertion might fail.
Twilio callback format
MessageStatus callback adds a field or changes enum values. Prod observation differs from Clone; incident type shadow_diff.
Clerk JWT claims
Clerk adds a required claim to session tokens. Prod auth path returns different JWT shape than Clone.
S3 error responses
AWS changes error XML/JSON shape for a bucket policy edge case. Response body semantic hash diverges.
Review UI
When a shadow diff fires, engineers review side-by-side observations before suppressing or updating Clones.
Route: /shadow-diff/[runId] in app.molar.it/dashboard/guard
What you see
| Panel | Content |
|---|---|
| Prod observation | Response hashes, webhook events, DOM trees per step |
| Clone observation | Same structure from parallel run |
| Diff highlights | Added/removed/changed fields in webhooks; JSON path diffs in bodies |
| Webhook field diff | Stripe/Twilio/Clerk-specific field-level highlights |
| DOM role-tree summary | Distance metric and changed nodes |
Actions
| Action | Effect |
|---|---|
| Approve | Acknowledge expected vendor change; optionally snooze rule |
| Reject | Escalate — likely Clone or scenario needs update |
| Snooze | Suppress alerts for duration (maintenance, known rollout) |
| Create incident | Link to operational incident workflow |
| Update ignore list | Add field to semantic diff allowlist (persistent vendor addition) |
The review UX is a governed diff approval flow — you approve expected vendor drift before updating baselines or ignore lists.
Shadow-diff incidents
Shadow diffs create incidents distinct from failure-rate incidents:
| Field | Value |
|---|---|
| Type | shadow_diff |
| guard_runs.run_type | shadow_diff |
| shadow_diff JSONB | Full diff payload on the run row |
Alert routing
Set slackWebhookUrl, pagerDutyRoutingKey, or webhookUrl in the scheduled check's alertPolicy. Shadow-diff incidents use the same channels as consecutive-failure alerts — use a dedicated check or webhook endpoint if you want #vendor-drift separate from outage pages.
Mender interaction
Shadow-diff incidents are usually not product_bug. Mender may classify them as scenario_bug (update the Clone or scenario expectations) or skip the fix PR. Human review in the shadow-diff UI is the primary workflow.
Observation storage and replay
Each shadow run stores:
- artifacts_s3_prefix: screenshots, HAR, trace (same as other run types)
- shadow_diff JSONB: structured diff result
- Link to paired prod and Clone run IDs
From Runs detail, filter run_type=shadow_diff to audit historical drift.
Configuration reference
Ignore volatile fields (global default)
Ignored in semantic JSON diff unless overridden:
- id, created, updated, *_at
- request_id, trace_id
- ETags, Date headers
Per-check overrides
Suppress known-benign webhook diffs via suppressedWebhookDiffs in alertPolicy:
{
"suppressedWebhookDiffs": [
{
"provider": "stripe",
"eventType": "customer.subscription.updated",
"field": "subscription.metadata.internal_note",
"diffType": "added"
}
]
}
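How such rules might be applied to a list of detected diffs is sketched below. The rule fields mirror the JSON above; the `WebhookFieldDiff` shape, the exact-match semantics, and the `applySuppressions` name are assumptions, and only the "added" diffType appears in the source.

```typescript
// Mirrors one entry of suppressedWebhookDiffs.
interface SuppressedWebhookDiff {
  provider: string;
  eventType: string;
  field: string;
  diffType: string; // the source shows "added"; other values are not documented here
}

// Hypothetical shape of a single detected webhook field diff.
interface WebhookFieldDiff {
  provider: string;
  eventType: string;
  field: string;
  diffType: string;
}

// Keep only the diffs that no suppression rule matches exactly.
function applySuppressions(
  diffs: WebhookFieldDiff[],
  rules: SuppressedWebhookDiff[],
): WebhookFieldDiff[] {
  return diffs.filter(
    (diff) =>
      !rules.some(
        (rule) =>
          rule.provider === diff.provider &&
          rule.eventType === diff.eventType &&
          rule.field === diff.field &&
          rule.diffType === diff.diffType,
      ),
  );
}
```

Matching on all four fields keeps a rule narrow: suppressing an added metadata field on one event type does not hide the same field on another event or provider.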
See Configuration for the full alertPolicy schema.
Operational playbook
1. Vendor announced API change
- Expect shadow-diff alert near rollout
- Open review UI → confirm field is expected
- Approve + add to ignore list OR update Clones bundle version
- Cartographer/regenerate scenarios if assertions need new fields
2. Unexpected diff
- Treat as potential prod/Clone skew or undiscovered vendor change
- Reject → open incident, notify platform team
- Run manual scenario: pnpm molar-guard run <slug> --base-url https://prod…
- Compare Clone session in Clones dashboard
3. False positive storm after deploy
- Snooze 24h while investigating
- Check if deploy changed response shapes intentionally
- Update scenario or Clone seed if test data drifted
Relationship to PR gating and prod monitoring
- PR gating never runs shadow-prod (Clones-only is faster and safer pre-merge)
- Prod monitoring can run with shadow_prod: false (journey pass/fail only)
- Shadow-prod adds the parallel Clone arm on the same cron
One scenario file; enable shadow on schedules that touch external vendors.
Next
- Production monitoring — synthetics, incidents, alerting
- Clones — stateful vendor fakes that power the Clone arm
- PR gating — pre-merge Clones runs
- Mender auto-fix — when drift becomes a product bug
- Guard dashboard — Monitoring grid and shadow-diff routes
- Troubleshooting — false positives and ignore lists