Troubleshooting & FAQ
Frequently asked questions
What does Trace add beyond a Playwright trace zip?
Trace captures continuous forensic ribbons — video, DOM, network, console, and agent thoughts on one playhead — with cloud aggregation, signed permalinks, Debugger AI, Layer 2 replay with Clone state restore, and MCP for coding agents. Every run still bundles trace.playwright.zip for compatibility with npx playwright show-trace.
What is Layer 2 replay?
Layer 2 lets you step backward through a failure, apply a test or scenario fix, and validate from that step — restoring Clone state and browser storage when needed — without re-running the entire suite. A child trace is captured and diffed side-by-side with the source.
How does Trace connect to Mender?
When Guard confirms a regression, the failing trace opens in Trace for forensic replay. Debugger cites events on the five-ribbon timeline; when you confirm the fix (often after Layer 2), Promote to fix PR hands the diff to Mender, which drafts a pull request with regression tests. A human always approves the merge.
Can Trace help with CI failure triage?
Yes. Trace ingests Playwright failures from CI (native capture or zip ingest), groups evidence on a navigable timeline, and exposes trace data to agents via MCP. Failure signature clusters are available via API filter (?signature=).
Does Trace record real user sessions?
No. Trace captures synthetic test runs (Guard, Cartographer agent, CI). For production-shaped monitoring, use Guard scheduled checks — still synthetic, not RUM.
What is a failure signature cluster?
A stable hash (failure_signature) derived from the failing assertion, dominant network error, and normalized stack. Traces with the same signature within a 24-hour window appear in the cluster strip, for example webhook-404 × 7.
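To picture how a hash like this can stay stable across runs, here is a minimal sketch. The normalization rules, the field separator, the use of SHA-256, and the function names are all assumptions for illustration; this is not Trace's actual algorithm.

```python
import hashlib
import re


def normalize_stack(stack: str) -> str:
    """Strip details that vary between runs (assumed rules, illustration only)."""
    lines = []
    for line in stack.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop :line:col markers (this also strips any other ":digits" run).
        line = re.sub(r":\d+(:\d+)?", "", line)
        # Mask hex addresses such as 0x7ffe12ab.
        line = re.sub(r"0x[0-9a-fA-F]+", "0x?", line)
        lines.append(line)
    return "\n".join(lines)


def compute_failure_signature(assertion: str, network_error: str, stack: str) -> str:
    """Hash the three inputs named above into one hex digest (hypothetical helper)."""
    parts = [assertion.strip(), network_error.strip(), normalize_stack(stack)]
    # The unit separator keeps field boundaries unambiguous.
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Under these assumptions, two failures that differ only in stack line numbers hash to the same value, while a different dominant network error yields a different signature.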
Are share links secure?
Share links use unguessable 9-character shortId tokens (~51 bits). Default visibility is org-only; public-read is opt-in with expiry. Treat forwarded links like capability secrets — masking defaults reduce PII exposure.
What does pinned mean?
Pinned traces skip storage tier-down (hot → warm → cold). Pin a trace manually with the F key on app.molar.it/trace. Unpinned traces follow the org retention policy.
Troubleshooting
Server won't start: TRACE_ENCRYPTION_KEY
Symptom:
TRACE_ENCRYPTION_KEY is required and must be stable
Fix:
export TRACE_ENCRYPTION_KEY=$(openssl rand -hex 32)
npm run dev
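The key must be 64 hex characters (32 bytes). Generate it once and store it: the key must stay stable across restarts, so do not regenerate it on every start. Docker/Helm: mount the secret into the pod environment.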
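A quick pre-flight check can catch a malformed key before the server does. The sketch below only checks the documented format (64 hex characters); the helper name is hypothetical, and whether the server accepts uppercase hex is an assumption (openssl rand -hex emits lowercase).

```python
import re

# Exactly 64 hex characters, which encodes 32 bytes.
HEX_KEY = re.compile(r"[0-9a-fA-F]{64}")


def is_valid_trace_key(value):
    """Return True if value is a string of exactly 64 hex characters."""
    return isinstance(value, str) and HEX_KEY.fullmatch(value) is not None
```

For example, call is_valid_trace_key(os.environ.get("TRACE_ENCRYPTION_KEY")) in a deploy script (with import os) and abort if it returns False.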
Trace missing after Guard run
Note:
If a Guard check did not produce a trace row, upload via the ingest API or Cartographer capture.
| Check | Action |
|---|---|
| Run still running? | Wait for run.end + postprocess (~30s) |
| Worker health | Help page → Hatchet workers online |
| Wrong org? | Verify org switcher matches Guard project org |
| Internal token | Self-hosted only: internal_service_token matches ingest webhook |
Trace stuck in queued or running
- queued: Ingest was accepted but capture never started. Check the Guard runner logs.
- running (for hours): Zombie run. Cancel the source job; an admin abort endpoint is planned for the future.
- Postprocess backlog: Check the Redis queue depth on the Help page.
Layer 2 replay returns 409
Cause: Source trace is cold or warm without full blobs.
Fix:
POST /v1/traces/{id}/restore
Wait for eta_seconds, refresh detail until tier=hot, retry replay.
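The restore-then-retry flow can be sketched as a small polling helper. The HTTP calls are injected as callables so that no client library is assumed. The response shapes (an eta_seconds field on restore, a tier field on trace detail), the 5-second poll interval, and the function name are assumptions for illustration.

```python
import time


def restore_and_wait(trace_id, post_restore, get_trace, sleep=time.sleep, max_polls=20):
    """Request a restore, wait for the ETA, then poll until the trace is hot.

    post_restore(trace_id) -> dict with "eta_seconds" (assumed response shape)
    get_trace(trace_id)    -> dict with "tier"        (assumed response shape)
    Returns True once tier == "hot", False if max_polls is exhausted.
    """
    eta = post_restore(trace_id).get("eta_seconds", 0)
    sleep(max(eta, 0))
    for _ in range(max_polls):
        if get_trace(trace_id).get("tier") == "hot":
            return True  # safe to retry the Layer 2 replay now
        sleep(5)  # polling interval is an arbitrary choice
    return False
```

Pass in functions that wrap POST /v1/traces/{id}/restore and GET /v1/traces/{id} with whatever HTTP client and auth you already use.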
DOM player empty or "0 mutations"
| Cause | Fix |
|---|---|
| Postprocess incomplete | Wait; check rrweb-events.ndjson.zst in Artifacts |
| Playwright-only ingest | Re-run with native @molar/trace-capture |
| Privacy block selector | Review scenario privacy — page may be fully blocked |
| Cold tier | Restore to hot |
Debugger returns budget exceeded
Org monthly Debugger cap reached (Settings → Debugger → Budget).
- Enable cheap mode (Haiku-class model)
- Raise cap (Admin)
- BYO Anthropic key on Team+ plans
- Wait for monthly reset
The 429 response from the chat endpoint includes budget details.
Blob fetch 403
The caller's org does not own a trace that references that SHA-256. Blobs are deduplicated globally, so authorization is always via trace ownership, never via the hash alone.
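The ownership rule can be illustrated with a short check. The record shape (org and blob_shas keys) and the function name are hypothetical; the point is only that knowing a hash grants nothing unless the caller's org owns a trace that references it.

```python
def can_fetch_blob(caller_org, sha256, traces):
    """Allow a blob fetch only if the caller's org owns a trace referencing the hash.

    traces: iterable of dicts with "org" and "blob_shas" keys (hypothetical shape).
    """
    return any(
        t["org"] == caller_org and sha256 in t["blob_shas"]
        for t in traces
    )
```

In this model, two orgs whose traces share a deduplicated blob can both fetch it, while a third org that merely knows the hash gets the 403.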
API 400 on list filters
Invalid status, tier, or kind values return structured validation errors. Use documented enums only — see API reference.
Example: status=flaky → 400 (use failed).
Share link 404 or expired
- Link past share_expires_at
- Trace deleted or archived
- Visibility flipped to org-only (public URL revoked)
- Typo in shortId
Create a new share from trace detail.
rrweb iframe blank (CSP)
Parent page Content-Security-Policy may block iframe scripts. Trace embed /r/{shortId}/embed ships compatible CSP headers — host CSP may override on custom domains.
Clones panel shows "No state"
- Scenario did not register Clones for this run
- Capture ran before Clone admin endpoints were ready
- Step boundary before first Clone interaction
Verify Clones integration in scenario .molar.md and Integrations.
AGENT ribbon empty
Run was not Cartographer-sourced — plain Playwright tests have no agent.thought events. Expected behavior.
Presigned URL expired
signed_urls TTL default 3600s. Refetch GET /v1/traces/{id} to renew before downloading large artifacts.
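A client can avoid expired URLs by tracking when it fetched the trace and refetching shortly before the TTL runs out. In this sketch the 300-second safety margin, the refetch callable, and the function name are assumptions; only the 3600s default and the signed_urls field come from the text above.

```python
import time


def fresh_signed_urls(trace, fetched_at, refetch, now=time.time, ttl=3600, margin=300):
    """Return signed_urls, refetching the trace when they are close to expiry.

    trace:      dict with a "signed_urls" field, as returned by GET /v1/traces/{id}
    fetched_at: epoch seconds when that response was received
    refetch:    hypothetical callable that performs GET /v1/traces/{id} again
    """
    age = now() - fetched_at
    if age >= ttl - margin:
        trace = refetch()  # renew before starting a long download
    return trace["signed_urls"]
```

Call this right before each large artifact download rather than once per session, so a slow queue of downloads does not outlive the URLs.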
Standalone vs production behavior differs
Standalone Trace uses a local JSON store and simplified workers. Molar Cloud adds SSO, org isolation, Hatchet postprocess, and multi-region storage. See Quick start.
Performance
| Symptom | Tip |
|---|---|
| Slow detail load | Large trace — check total_size_bytes; warm tier may omit video |
| Scrubber lag | >50k events — filter Events tab by kind |
| SSE disconnects | Network proxy timeout — client reconnects with banner |
Performance budgets are enforced in CI for the standalone Trace package.
Getting help
| Channel | Use for |
|---|---|
| Help page (/help) | Worker status, integration health |
| support@molar.it | Account, billing, security reports |
| Support | What to include in a ticket |
Include: trace short_id, org slug, approximate created_at, and whether the trace came from Guard or manual ingest.
Error code index
Platform-wide HTTP and quota errors: Error codes.
Trace-specific:
| Code | Trace context |
|---|---|
| trace_not_hot | Layer 2 on cold/warm trace |
| debugger_budget_exceeded | Monthly LLM cap |
| blob_not_in_manifest | SHA not referenced by trace |
| share_expired | Public link past TTL |
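Client tooling can map these codes to the remedies described on this page. The sketch below assumes the error body carries the code in a field named code. That field name and the helper name are hypothetical, so check the API reference for the real response shape.

```python
# Remedies paraphrase the troubleshooting sections on this page.
REMEDIES = {
    "trace_not_hot": "Restore the trace to hot (POST /v1/traces/{id}/restore), then retry replay.",
    "debugger_budget_exceeded": "Enable cheap mode, raise the cap, bring your own key, or wait for the monthly reset.",
    "blob_not_in_manifest": "Confirm that the trace references the requested SHA-256.",
    "share_expired": "Create a new share from trace detail.",
}


def suggest_fix(error_body):
    """Look up a remedy for a Trace-specific error code (assumed field: "code")."""
    code = error_body.get("code")
    return REMEDIES.get(code, "See the platform-wide Error codes page.")
```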
Related docs
- Security & encryption: TRACE_ENCRYPTION_KEY, share risk
- Workers & ingestion: pipeline delays
- Debugger & replay: Layer 2 requirements
- API reference: status codes and params