# Production monitoring

Synthetic users, scheduled checks, multi-region monitoring, incidents, alerting, and analytics onboarding.


Production monitoring runs your scenarios on a schedule against your real production URL. Guard uses synthetic users — marked at every layer — to execute full user journeys (signup, checkout, webhooks) without polluting analytics or charging real money.

When a flow breaks in prod, Guard opens incidents, routes alerts to your on-call stack, and hands failures to Mender for triage and fix PRs.

Dashboard: `app.molar.it/dashboard/guard` → Monitoring · Product page: `guard.molar.it`


## How production monitoring differs from PR gating

| Dimension | PR gating | Production monitoring |
|-----------|-----------|-----------------------|
| Trigger | `pull_request` webhook | Cron (`scheduled_checks`) |
| Target | Preview URL or Clones | Real production URL |
| Users | Clone identities | Synthetic prod users |
| Goal | Block bad merges | Detect regressions live users would hit |
| Run type | `pr` | `schedule` |

The same scenario file powers both — no second deployment of tests.


## Synthetic user infrastructure

Synthetic monitoring mistakes can contaminate billing and analytics. Guard uses defense in depth so synthetic activity is always identifiable and never triggers real charges.

### Database marking

Add an `is_synthetic` column to your users table (or equivalent):

```sql
ALTER TABLE users ADD COLUMN is_synthetic BOOLEAN NOT NULL DEFAULT FALSE;

-- Partial index: only synthetic rows are indexed, so lookups for them stay cheap
CREATE INDEX users_is_synthetic_idx ON users (id) WHERE is_synthetic = TRUE;

-- Analytics views must exclude synthetics
CREATE VIEW analytics_users AS SELECT * FROM users WHERE is_synthetic = FALSE;
```

Guard provisions synthetic users per region and scenario shard. Email convention:

```text
guard-monitor+{region}+{scenarioSlug}+{shard}@{yourDomain}.com
```

RFC 5233 sub-addressing routes tagged mail to your inbox; in sidecar mode, the Email Clone captures messages without real delivery.
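If your own middleware or cleanup jobs need to recognize these addresses, a minimal TypeScript sketch of the convention looks like this. The helper names are hypothetical (not part of a Molar package), `domain` is taken to be the full domain after `@`, and the parser assumes region and scenario slugs never contain `+`, which the builder enforces.

```typescript
// Hypothetical helpers (not a Molar API) for the address convention
// guard-monitor+{region}+{scenarioSlug}+{shard}@{yourDomain}.com
interface SyntheticAddress {
  region: string; // e.g. "us-east-1"
  scenarioSlug: string; // e.g. "stripe-subscription-upgrade"
  shard: number; // non-negative integer
  domain: string; // full domain after "@", e.g. "example.com"
}

const LOCAL_PREFIX = "guard-monitor";

function syntheticEmail(a: SyntheticAddress): string {
  for (const field of [a.region, a.scenarioSlug]) {
    // "+" separates the sub-address fields, so it cannot appear inside one.
    if (field.length === 0 || field.includes("+") || field.includes("@")) {
      throw new Error(`invalid address field: "${field}"`);
    }
  }
  if (!Number.isInteger(a.shard) || a.shard < 0) {
    throw new Error(`invalid shard: ${a.shard}`);
  }
  return `${LOCAL_PREFIX}+${a.region}+${a.scenarioSlug}+${a.shard}@${a.domain}`;
}

// Returns null for any address that does not follow the convention.
function parseSyntheticEmail(email: string): SyntheticAddress | null {
  const at = email.lastIndexOf("@");
  if (at < 0) return null;
  const parts = email.slice(0, at).split("+");
  if (parts.length !== 4 || parts[0].toLowerCase() !== LOCAL_PREFIX) return null;
  const [, region, scenarioSlug, shardText] = parts;
  if (!region || !scenarioSlug || !/^\d+$/.test(shardText)) return null;
  return { region, scenarioSlug, shard: Number(shardText), domain: email.slice(at + 1) };
}
```

Parsing the local part is enough to classify an address; sub-addressing means all of these still land in the same base mailbox.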

### Request-time signals (three orthogonal markers)

| Signal | Where | Used by |
|--------|-------|---------|
| `X-Synthetic-Source: molar-guard` | All outbound HTTP from the Guard worker | Your middleware and observability filters |
| `window.__MOLAR_SYNTHETIC__ = true` | Playwright `addInitScript` | Client-side analytics filters |
| `is_synthetic = true` | Customer database | Application code paths |
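A minimal sketch, assuming a Node-style headers object, of checking each marker from your own code. The function names are illustrative, not exports of a Molar library.

```typescript
// Illustrative checks for Guard's three markers (names are hypothetical).
// Header names are lowercase, as in Node's IncomingHttpHeaders.
type HeaderBag = Record<string, string | string[] | undefined>;

const SYNTHETIC_HEADER = "x-synthetic-source";
const SYNTHETIC_SOURCE = "molar-guard";

// Server side: true when any X-Synthetic-Source value is "molar-guard".
function isSyntheticRequest(headers: HeaderBag): boolean {
  const raw = headers[SYNTHETIC_HEADER];
  const values: string[] = raw === undefined ? [] : Array.isArray(raw) ? raw : [raw];
  return values.some((v) => v.trim() === SYNTHETIC_SOURCE);
}

// Client side: true when Playwright's addInitScript set the global flag.
// `scope` defaults to globalThis, so the same code runs in a browser or in Node.
function isSyntheticBrowser(scope: object = globalThis): boolean {
  return (scope as { __MOLAR_SYNTHETIC__?: unknown }).__MOLAR_SYNTHETIC__ === true;
}

// Database side: the is_synthetic column on the loaded user record.
function isSyntheticUser(user: { is_synthetic: boolean }): boolean {
  return user.is_synthetic === true;
}
```

In Express, `req.headers` already has this shape, so the header check can run in any middleware. The Stripe swap below keys off `user.is_synthetic` rather than the header, since any HTTP client can set a request header.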

### Money-flow blocking

Guard ships middleware libraries (`@molar/synthetic-stripe-middleware`, etc.) for Express and Next.js:

| Provider | Protection |
|----------|------------|
| Stripe | Swap the SDK client to an `sk_test_*` key when `user.is_synthetic`; live charges are impossible |
| Twilio | Route to test credentials (`+15005550006`) or Twilio Clone |
| Email | Route to the Email Clone SMTP sink; no real delivery |
| Clerk | Pre-provisioned users with `metadata.is_synthetic = true` |
| S3 | Uploads to `s3://bucket/_molar_synthetic/` with a 7-day lifecycle |

Django and Rails Stripe middleware snippets ship in `@molar/synthetic-middleware`; the Express and Next.js helpers are first-class and the most complete today.
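The Stripe rule amounts to choosing the API key per user. Here is a hedged sketch of that decision, independent of the actual `@molar/synthetic-stripe-middleware` implementation, whose API this page does not document. `stripeKeyFor` and the key-pair shape are hypothetical.

```typescript
// Hypothetical helper illustrating the Stripe row of the table; this is not
// the @molar/synthetic-stripe-middleware API.
interface StripeKeys {
  live: string; // sk_live_...
  test: string; // sk_test_...
}

interface AppUser {
  id: string;
  is_synthetic: boolean;
}

function stripeKeyFor(user: AppUser, keys: StripeKeys): string {
  if (!user.is_synthetic) return keys.live;
  // Fail closed: a synthetic user must never receive anything but a test key.
  if (!keys.test.startsWith("sk_test_")) {
    throw new Error("synthetic users require an sk_test_ key");
  }
  return keys.test;
}
```

The returned key can be passed to the official `stripe` package (`new Stripe(key)`). Failing closed when the configured test key is not an `sk_test_` key keeps a misconfiguration from turning synthetic checkouts into live charges.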

### Nightly cleanup

Guard provides cleanup templates per datastore (Postgres, MySQL, MongoDB, DynamoDB), generated by Cartographer from your schema. For example, for an `orders` table on Postgres:

```sql
DELETE FROM orders
  WHERE user_id IN (SELECT id FROM users WHERE is_synthetic = TRUE)
    AND created_at < NOW() - INTERVAL '7 days';
```

Retention is customer-configurable; the template above uses 7 days.
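If you run the Postgres template from your own job, a sketch of building it with a configurable retention window follows. It assumes each cleaned table has `user_id` and `created_at` columns like `orders`. `syntheticCleanupQuery` is a hypothetical helper, and the `{ text, values }` return shape is the query-config object that node-postgres accepts.

```typescript
// Hypothetical builder for the Postgres cleanup template with configurable retention.
interface QueryConfig {
  text: string;
  values: unknown[];
}

function syntheticCleanupQuery(table: string, retentionDays: number): QueryConfig {
  // Identifiers cannot be bind parameters, so allow only plain lowercase names.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  if (!Number.isInteger(retentionDays) || retentionDays < 1) {
    throw new Error("retentionDays must be a positive integer");
  }
  return {
    text:
      `DELETE FROM ${table}\n` +
      `  WHERE user_id IN (SELECT id FROM users WHERE is_synthetic = TRUE)\n` +
      `    AND created_at < NOW() - make_interval(days => $1)`,
    values: [retentionDays],
  };
}
```

With node-postgres this would run as `await client.query(syntheticCleanupQuery("orders", 7))`. The retention value travels as a bind parameter into Postgres's `make_interval`, so it never has to be spliced into the SQL text.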


Before enabling production schedules, complete the synthetic safety checklist in dashboard Settings. Guard blocks new schedules until a synthetic preflight audit event is recorded for the scenario.

### Per-platform exclusion

Filter synthetic users in whatever analytics stack you use. Common patterns:

| Signal | Typical filter |
|--------|----------------|
| User trait | `is_synthetic: true` on identify / user properties |
| Event property | `$ignore: true`, or skip `track` when synthetic |
| Internal users | Mark synthetic emails or user IDs as internal/test |
| Billing | Test-mode customers only for synthetic users |
| APM / errors | Tag or exclude sessions with `synthetic: true` |

The onboarding wizard links to copy-paste snippets for your stack. Middleware install is verified before schedules go live.
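The "skip track when synthetic" pattern can live in one thin wrapper around whatever analytics SDK you use. In this sketch, `AnalyticsClient` is a hypothetical stand-in interface, not a real vendor API. Map its two methods onto your SDK's identify and track calls.

```typescript
// `AnalyticsClient` is a hypothetical stand-in for your SDK, not a vendor API.
interface AnalyticsClient {
  identify(userId: string, traits: Record<string, unknown>): void;
  track(userId: string, event: string, props: Record<string, unknown>): void;
}

interface TrackedUser {
  id: string;
  is_synthetic: boolean;
}

class FilteredAnalytics {
  private readonly client: AnalyticsClient;

  constructor(client: AnalyticsClient) {
    this.client = client;
  }

  // Always identify, so the is_synthetic trait is available to platform-side filters.
  identify(user: TrackedUser): void {
    this.client.identify(user.id, { is_synthetic: user.is_synthetic });
  }

  // Drop synthetic events before they leave the app.
  track(user: TrackedUser, event: string, props: Record<string, unknown> = {}): void {
    if (user.is_synthetic) return;
    this.client.track(user.id, event, props);
  }
}
```

Identify still runs for synthetic users so that the `is_synthetic` trait reaches the platform, which gives platform-side filters a second line of defense.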


## Scheduled checks

Production monitors are stored as `scheduled_checks` rows, one per scenario (or per scenario + policy combination).

### Default cadence

Checks run every 5 minutes per scenario by default. The interval is configurable per check, from 1 minute to 1 hour.

### Create a schedule

CLI:

```bash
pnpm molar-guard schedule create stripe-subscription-upgrade \
  --cron "*/5 * * * *"
```

Scenario frontmatter:

```yaml
---
id: stripe-subscription-upgrade
schedule:
  cron: "*/5 * * * *"
  regions: [us-east-1, eu-west-1, ap-south-1]
  shadow_prod: true
---
```

Dashboard: `app.molar.it/dashboard/guard` → Monitoring → Schedules (create/edit cron, regions, alert policy, pause).
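Here is a hedged sketch of a pre-save check against the 1-minute to 1-hour cadence range described above. It only understands the simple cron forms used on this page (`* * * * *`, `*/N * * * *`, `M * * * *`) and rejects everything else instead of guessing. The type and function names are hypothetical, and Guard's own validation may differ.

```typescript
// Hypothetical shape of the `schedule` frontmatter block.
interface ScheduleFrontmatter {
  cron: string;
  regions: string[];
  shadow_prod?: boolean;
}

// Interval in minutes for the supported forms, or null for anything else.
function intervalMinutes(cron: string): number | null {
  const fields = cron.trim().split(/\s+/);
  if (fields.length !== 5 || fields.slice(1).some((f) => f !== "*")) return null;
  const minute = fields[0];
  if (minute === "*") return 1; // every minute
  const step = /^\*\/(\d+)$/.exec(minute);
  if (step) return Number(step[1]); // "*/N": every N minutes
  if (/^\d+$/.test(minute)) return 60; // "M * * * *": once an hour
  return null;
}

// Returns a list of problems; an empty list means the schedule passes.
function validateSchedule(s: ScheduleFrontmatter): string[] {
  const errors: string[] = [];
  const every = intervalMinutes(s.cron);
  if (every === null) errors.push(`unsupported cron: ${s.cron}`);
  else if (every < 1 || every > 60) errors.push(`interval ${every}m is outside 1-60m`);
  if (s.regions.length === 0) errors.push("at least one region is required");
  return errors;
}
```

Steps that do not divide 60 (for example `*/7`) pass this check but fire unevenly across the hour boundary, because cron restarts the minute count at :00.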

### Scheduler architecture

- BullMQ repeatable jobs, the same pattern as the Molar API
- Each tick fans out by `regions[]` on the check
- Fresh Playwright `browserContext` per run, so no state bleeds between runs
- Per-customer concurrency cap (default: 8 concurrent prod runs) prevents self-DoS
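Guard's scheduler uses BullMQ repeatable jobs. The sketch below deliberately does not use BullMQ. It is a plain-TypeScript illustration, under assumed names, of the two ideas in the list: one run per region per tick, and a per-customer cap on concurrent runs (8 by default). A "run" here is any async function; a real run would launch its own Playwright browser context.

```typescript
type RunFn = (scenario: string, region: string) => Promise<void>;

// Minimal counting semaphore: at most `limit` holders at once.
class Semaphore {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  async acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++;
      return;
    }
    // Wait until release() hands this caller a slot directly.
    await new Promise<void>((resolve) => this.waiters.push(() => resolve()));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // slot passes to the waiter; `active` is unchanged
    else this.active--;
  }
}

// One semaphore per customer, shared by every tick of every check.
function makeTick(limit = 8) {
  const sem = new Semaphore(limit);
  return async function tick(scenario: string, regions: string[], run: RunFn): Promise<void> {
    // Fan out: one run per region, each waiting for a free slot.
    await Promise.all(
      regions.map(async (region) => {
        await sem.acquire();
        try {
          await run(scenario, region);
        } finally {
          sem.release();
        }
      }),
    );
  };
}
```

Handing the slot straight to the next waiter in `release()`, rather than decrementing and letting waiters race for it, keeps the count from ever exceeding the cap.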

### Run isolation

Every scheduled run gets:

- Fresh browser context
- Fresh synthetic-user session (per shard)
- Fresh logical clock
- Optional fresh Clone bundle (shadow-prod path)

## Multi-region monitoring

Guard workers run in:

| Region | Availability |
|--------|--------------|
| `us-east-1` | All tiers |
| `eu-west-1` | All tiers |
| `ap-south-1` | All tiers |
| `us-west-2`, `ap-southeast-1`, `sa-east-1` | Business tier |

Set `regions` on each scheduled check. The scheduler fans out one run per region per tick.

### Regional comparison in dashboard

The Monitoring grid shows scenarios × regions. Each cell displays the current status and a sparkline. Patterns to watch for:

- Single region red: likely a regional outage or CDN edge issue
- 3+ regions fail within 60 s: collapsed into one `global_outage` incident

## Incidents

When an alert threshold is breached, Guard opens a `guard_incident`, deduplicated per scenario (and per region unless global collapse applies).

### Incident lifecycle

```text
threshold breached → incident opened → alerts sent
        ├── ack (human acknowledges)
        ├── suppress (duration + reason)
        └── auto-resolve (2 consecutive successes)
```

| Status | Meaning |
|--------|---------|
| `open` | Active failure |
| `acknowledged` | Owner assigned, investigating |
| `suppressed` | Snoozed (maintenance, known issue) |
| `resolved` | Scenario recovered |
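The lifecycle above can be sketched as a pure transition function. The names are hypothetical; only the rules come from this page: ack, suppress with a reason and duration, and auto-resolve after 2 consecutive successes. Expressing the duration in minutes is an assumption, and suppression expiry is not modelled.

```typescript
type IncidentStatus = "open" | "acknowledged" | "suppressed" | "resolved";

type IncidentEvent =
  | { kind: "ack" }
  | { kind: "suppress"; reason: string; minutes: number } // expiry not modelled
  | { kind: "run"; passed: boolean };

interface Incident {
  status: IncidentStatus;
  consecutiveSuccesses: number;
  suppressReason?: string;
}

const RESOLVE_AFTER = 2; // consecutive successes that auto-resolve an incident

function step(inc: Incident, ev: IncidentEvent): Incident {
  // A resolved incident is final; a later failure would open a new one.
  if (inc.status === "resolved") return inc;
  switch (ev.kind) {
    case "ack":
      return inc.status === "open" ? { ...inc, status: "acknowledged" } : inc;
    case "suppress":
      return { ...inc, status: "suppressed", suppressReason: ev.reason };
    case "run": {
      // Any failure resets the success streak.
      const streak = ev.passed ? inc.consecutiveSuccesses + 1 : 0;
      const status = streak >= RESOLVE_AFTER ? "resolved" : inc.status;
      return { ...inc, status, consecutiveSuccesses: streak };
    }
  }
}
```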

### Smart dedup

- One open incident per scenario × region until resolved
- The same scenario failing in 4 regions at separate times (outside the 60 s global window) = 4 alerts (real regional issues)
- Global collapse: 3+ regions fail within 60 s on the same scenario → single `global_outage` incident
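Below is a sketch of these dedup rules as an in-memory deduper, with assumed key formats and names. It does not fold regional incidents that were already open into the new `global_outage` incident; it only stops new ones from opening while the global incident is active.

```typescript
interface Failure {
  scenario: string;
  region: string;
  at: number; // epoch milliseconds
}

type Decision =
  | { action: "open"; key: string; type: "regional" | "global_outage" }
  | { action: "dedupe"; key: string };

const GLOBAL_WINDOW_MS = 60_000;
const GLOBAL_MIN_REGIONS = 3;

class IncidentDeduper {
  private readonly open = new Set<string>(); // keys of open incidents
  private readonly recent = new Map<string, Failure[]>(); // failures per scenario

  onFailure(f: Failure): Decision {
    const globalKey = `${f.scenario}::global`;
    const regionKey = `${f.scenario}::${f.region}`;

    // Keep only this scenario's failures inside the 60 s window.
    const inWindow = (this.recent.get(f.scenario) ?? []).filter((x) => f.at - x.at < GLOBAL_WINDOW_MS);
    inWindow.push(f);
    this.recent.set(f.scenario, inWindow);

    if (this.open.has(globalKey)) return { action: "dedupe", key: globalKey };

    // 3+ distinct regions within the window collapse to one global incident.
    if (new Set(inWindow.map((x) => x.region)).size >= GLOBAL_MIN_REGIONS) {
      this.open.add(globalKey);
      return { action: "open", key: globalKey, type: "global_outage" };
    }

    // Otherwise: at most one open incident per scenario × region.
    if (this.open.has(regionKey)) return { action: "dedupe", key: regionKey };
    this.open.add(regionKey);
    return { action: "open", key: regionKey, type: "regional" };
  }

  resolve(key: string): void {
    this.open.delete(key);
  }
}
```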

### Dashboard

`app.molar.it/dashboard/guard` → Incidents

- Filter: open / acked / suppressed / resolved
- Types: `consecutive_failures`, `failure_rate`, `latency_p99`, `shadow_diff`, `global_outage`
- Actions: ack, suppress, link to root run artifacts, trigger Mender

## Alerting

Configure alert policies per scheduled check (`alert_policy` JSONB):

| Rule | Default | Description |
|------|---------|-------------|
| Consecutive failures | 2 | N failures in a row |
| Failure rate | 50% over 30 min | M% failure rate in window |
| Shadow-prod diff | On when `shadow_prod: true` | Third-party model drift |
| Self-healing | Informational | Locator heal occurred |
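Here is a sketch of evaluating the two failure rules against a check's recent runs, using the defaults from the table. The `AlertPolicy` shape is an assumption for illustration, not the documented `alert_policy` JSONB schema (see Configuration). Treating a rate of exactly 50% as a breach is also an assumption.

```typescript
interface AlertPolicy {
  consecutiveFailures: number; // default 2
  failureRatePct: number; // default 50
  windowMinutes: number; // default 30
}

interface RunResult {
  at: number; // epoch ms
  passed: boolean;
}

const DEFAULT_POLICY: AlertPolicy = { consecutiveFailures: 2, failureRatePct: 50, windowMinutes: 30 };

type AlertType = "consecutive_failures" | "failure_rate";

// `runs` is ordered oldest to newest; `now` is epoch ms.
function breachedRules(runs: RunResult[], now: number, policy: AlertPolicy = DEFAULT_POLICY): AlertType[] {
  const fired: AlertType[] = [];

  // Count the trailing run of failures.
  let streak = 0;
  for (let i = runs.length - 1; i >= 0 && !runs[i].passed; i--) streak++;
  if (streak >= policy.consecutiveFailures) fired.push("consecutive_failures");

  // Failure rate over the trailing window (reaching the threshold counts).
  const since = now - policy.windowMinutes * 60_000;
  const inWindow = runs.filter((r) => r.at >= since);
  if (inWindow.length > 0) {
    const failed = inWindow.filter((r) => !r.passed).length;
    if ((failed / inWindow.length) * 100 >= policy.failureRatePct) fired.push("failure_rate");
  }
  return fired;
}
```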

### Integrations

| Channel | Status |
|---------|--------|
| Generic webhook | Shipped |
| Slack | Shipped |
| PagerDuty | Shipped |
| Microsoft Teams | Shipped |
| Opsgenie | Shipped |
| Email (via notification webhook) | Shipped |
| MCP (`molar://incidents`) | Shipped (standalone Guard MCP) |

Auto-resolve: when a scenario recovers (2 consecutive successes), the incident closes and a "good news" notification is sent.


## Production dashboard

### Monitoring grid

Route: `/monitoring`

- Matrix: rows = scenarios, columns = regions
- Cell = current status + pass/fail sparkline
- Drill-down: last N runs, error message, shadow-diff flag
- Actions: pause region, snooze scenario, run check now, open incident

### Check run detail

Click any scheduled run for:

- Step timeline with assertion messages
- Screenshot, video (5 s pre-failure), HAR, console logs
- Clone state diff on failure
- Mender triage panel with **Apply fix** (suggestive mode)
- **Open trace**: jumps to the Cartographer trace when a `trace_id` is present

### Status page

Guard exposes public health JSON at `GET /v1/status`. Hosted status pages are served at `guard.molar.it/status`.
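Here is a sketch of polling the health endpoint from your own tooling. Only the path comes from this page. The base URL is a placeholder you supply, and because the JSON body is not documented here, the check relies on the HTTP status alone. The fetch function is injected, so the helper works with Node 18+'s built-in `fetch` or a test double.

```typescript
// Minimal fetch shape, so the global fetch or a test double can be passed in.
type FetchLike = (url: string, init?: { method?: string }) => Promise<{ ok: boolean; status: number }>;

// Resolves true when GET {baseUrl}/v1/status answers with an ok (2xx) status.
async function guardStatusHealthy(baseUrl: string, fetchImpl: FetchLike): Promise<boolean> {
  const url = `${baseUrl.replace(/\/+$/, "")}/v1/status`;
  try {
    const res = await fetchImpl(url, { method: "GET" });
    return res.ok;
  } catch {
    return false; // network errors count as unhealthy
  }
}
```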


## Enable production monitoring (checklist)

1. **Complete onboarding**: GitHub connected, scenarios imported
2. **Install synthetic middleware**: Express/Next.js (required for money flows)
3. **Run analytics preflight**: confirm a synthetic order is excluded from exports
4. **Create scheduled checks**: pick scenarios, cron, regions
5. **Configure alerts**: Slack or PagerDuty webhook
6. **Optional**: enable `shadow_prod: true` on checks touching third-party APIs; see [Shadow-prod diff](/docs/guard/shadow-prod-diff)
7. **Verify**: `GET /v1/status` is healthy and the first green runs appear in the Monitoring grid

## Example: full scenario with production schedule

```markdown
---
id: stripe-subscription-upgrade
description: User on Free plan upgrades to Pro
tags: [billing, critical]
schedule:
  cron: "*/5 * * * *"
  regions: [us-east-1, eu-west-1, ap-south-1]
  shadow_prod: true
mender:
  mode: suggestive
cache: never
---

# Stripe subscription upgrade

## Steps

1. Navigate to `/settings/billing`
2. Click "Upgrade to Pro"
3. Assert badge shows "Pro"
4. Assert webhook `customer.subscription.created` received
```

---

## API reference (schedules and incidents)

| Method | Path | Purpose |
|--------|------|---------|
| `POST` | `/v1/scheduled_checks` | Create schedule |
| `PUT` | `/v1/scheduled_checks/:id` | Update cron, regions, alert policy |
| `POST` | `/v1/scheduled_checks/:id/pause` | Pause for N hours |
| `POST` | `/v1/runs/manual` | Trigger run now |
| `POST` | `/v1/incidents/:id/ack` | Acknowledge |
| `POST` | `/v1/incidents/:id/suppress` | Suppress with reason |

See [Webhooks & API](/docs/guard/webhooks-api) for auth and payloads.
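For scripting against these endpoints, the table maps directly onto typed request descriptors. This sketch uses hypothetical operation names. Auth headers and request bodies are deliberately omitted because they are specified in Webhooks & API, not here.

```typescript
// Hypothetical operation names for the endpoints in the table above.
type Operation =
  | { op: "createSchedule" }
  | { op: "updateSchedule"; id: string }
  | { op: "pauseSchedule"; id: string }
  | { op: "runNow" }
  | { op: "ackIncident"; id: string }
  | { op: "suppressIncident"; id: string };

interface RequestLine {
  method: "POST" | "PUT";
  path: string;
}

// Maps each operation to its method and path; ids are URL-encoded.
function requestLine(o: Operation): RequestLine {
  switch (o.op) {
    case "createSchedule":
      return { method: "POST", path: "/v1/scheduled_checks" };
    case "updateSchedule":
      return { method: "PUT", path: `/v1/scheduled_checks/${encodeURIComponent(o.id)}` };
    case "pauseSchedule":
      return { method: "POST", path: `/v1/scheduled_checks/${encodeURIComponent(o.id)}/pause` };
    case "runNow":
      return { method: "POST", path: "/v1/runs/manual" };
    case "ackIncident":
      return { method: "POST", path: `/v1/incidents/${encodeURIComponent(o.id)}/ack` };
    case "suppressIncident":
      return { method: "POST", path: `/v1/incidents/${encodeURIComponent(o.id)}/suppress` };
  }
}
```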

---

## Usage limits

Production region count, concurrent prod runs, and minimum schedule interval are enforced per organization. See **Settings → Billing** on [app.molar.it](https://app.molar.it/billing) for your org's limits — tier tables are not published in this documentation.

---

## Next

- [Shadow-prod diff](/docs/guard/shadow-prod-diff) — parallel prod + Clone comparison
- [Mender auto-fix](/docs/guard/mender) — triage and fix PRs from prod failures
- [PR gating](/docs/guard/pr-gating) — pre-merge checks with the same scenarios
- [Configuration](/docs/guard/configuration) — `baseUrl.schedule`, frontmatter, alert JSON
- [Security](/docs/guard/security) — synthetic safety deep dive
- [Troubleshooting](/docs/guard/troubleshooting) — false alerts, middleware gaps