Workers & ingestion
Note:
Self-hosted operators only. Hosted customers at app.molar.it do not need to run capture workers or Hatchet postprocess jobs. This page documents the internal pipeline for on-prem Trace deployments.
Trace artifacts flow from the Playwright process → object store → postprocess workers → dashboard. Understanding this pipeline helps you debug missing traces, slow finalization, and CI upload failures.
Pipeline overview
┌─────────────────┐     ┌───────────────────┐     ┌─────────────────┐
│ Playwright run  │────▶│ Live ship (zstd-3)│────▶│ S3 / R2 prefix  │
│ + trace-capture │     │ append NDJSON     │     │ traces/runs/…   │
└─────────────────┘     └───────────────────┘     └────────┬────────┘
                                                           │
                        ┌───────────────────┐              │
                        │ trace-postprocess │◀─────────────┘
                        │ (Hatchet)         │
                        └─────────┬─────────┘
                                  │
           ┌──────────────────────┼──────────────────────┐
           ▼                      ▼                      ▼
     summary.json           rrweb extract         zstd-19 finalize
     search_vector          manifest.json         tier tag = hot
           │                      │                      │
           └──────────────────────┴──────────────────────┘
                                  │
                        ┌─────────▼─────────┐
                        │ Dashboard / API   │
                        │ GET /v1/traces    │
                        └───────────────────┘
Capture (@molar/trace-capture)
Runs inside the Node/Playwright process — cannot live in Python or Go.
| Module | Responsibility |
|---|---|
| writer | Append NDJSON lines with ts, seq, kind, step_id |
| rrweb-inject | addInitScript recorder |
| playwright-hooks | tracing.start, network, console, video |
| clone-capture | Per-clone read endpoints + world snapshot at steps |
| live-ship | Stream compressed chunks during run |
| finalize | Seal run, enqueue postprocess |
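When a trace looks incomplete, one quick offline check is to read a decompressed trace.ndjson and look for holes in seq. The sketch below is a debugging aid, not part of @molar/trace-capture. It assumes that seq is an integer that increases by exactly 1 per record and that every record carries all four writer fields; neither is confirmed by this page, and the function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# The four fields the writer module is documented to emit on every line.
REQUIRED_FIELDS = ("ts", "seq", "kind", "step_id")


def parse_records(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per non-empty NDJSON line, checking the writer fields."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        yield record


def find_seq_gaps(records: Iterable[Dict]) -> List[Tuple[int, int]]:
    """Return (previous_seq, next_seq) pairs where seq does not step by 1.

    Assumption: seq is contiguous; adjust if the real writer behaves differently.
    """
    gaps = []
    previous = None
    for record in records:
        seq = record["seq"]
        if previous is not None and seq != previous + 1:
            gaps.append((previous, seq))
        previous = seq
    return gaps


# Illustrative records only; the ts and step_id values are made up.
sample = [
    '{"ts": 1, "seq": 0, "kind": "console.log", "step_id": "s1"}',
    '{"ts": 2, "seq": 1, "kind": "dom.mutation", "step_id": "s1"}',
    '{"ts": 5, "seq": 3, "kind": "clock.advance", "step_id": "s2"}',
]
print(find_seq_gaps(parse_records(sample)))  # [(1, 3)]
```

A gap such as (1, 3) would suggest that a live-ship chunk never reached the object store, which is worth checking before suspecting the postprocess worker.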
Instrumentation sources
| Source | API | On-disk |
|---|---|---|
| Playwright trace | context.tracing.start({snapshots:true}) | trace.playwright.zip |
| DOM | rrweb v2 | dom.mutation, dom.snapshot |
| Network | page.on('request'/'response') | network.* + blob refs |
| Console | page.on('console'/'pageerror') | console.log, console.error |
| Video | recordVideo | video.webm |
| Screenshots | Step boundary + 1s heartbeat | content-addressed PNG/WebP |
| Clone state | Per-clone GET /_clone/{kind}/…/{run} | clone.state |
| Agent | Cartographer LLM trace | agent.thought, agent.action |
| Clock | GET /_clone/clock | clock.advance |
| Chaos | Clone middleware | error_injection, latency_injection |
Compression strategy
| Phase | Algorithm | Why |
|---|---|---|
| Live stream | zstd level 3 | Speed during capture |
| Postprocess | zstd level 19 | ~50% smaller than gzip for archival |
Already-compressed binaries (PNG, WebM, AV1) skip recompression.
Content-addressed blobs
Large payloads are stored at blobs/sha256/{first2}/{rest}; the NDJSON record carries only the sha256:… reference. An identical login-page DOM captured across 10,000 runs is stored once, which makes deduplication the primary cost lever.
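The addressing scheme can be sketched in a few lines. This is an illustration of the layout described above, assuming the digest is lowercase hex; blob_ref and blob_key are hypothetical helper names, not functions from the Trace codebase.

```python
import hashlib


def blob_ref(payload: bytes) -> str:
    """Reference string of the form written into NDJSON: sha256:<hex digest>."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def blob_key(ref: str) -> str:
    """Map a sha256:<hex> reference to its object key under blobs/sha256/."""
    algo, _, digest = ref.partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError(f"not a sha256 reference: {ref!r}")
    # First two hex characters form the directory, the remaining 62 the name.
    return f"blobs/sha256/{digest[:2]}/{digest[2:]}"


# Two runs that capture byte-identical DOM resolve to the same key,
# so the second upload can be skipped.
dom_a = b"<html><body>login</body></html>"
dom_b = b"<html><body>login</body></html>"
print(blob_key(blob_ref(dom_a)) == blob_key(blob_ref(dom_b)))  # True
```

Because the key depends only on content, deduplication needs no coordination between runs: an uploader can test for the key's existence and skip the write.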
Object store layout
s3://{bucket}/
  traces/runs/{org_id}/{yyyy}/{mm}/{dd}/{trace_id}/
    trace.ndjson.zst
    trace.playwright.zip
    video.webm
    summary.json
    manifest.json
    rrweb-events.ndjson.zst    # built by postprocess
  blobs/sha256/{aa}/{bb…}      # global dedup; auth at API layer
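When checking a bucket by hand for a missing trace, it helps to compute the expected prefix rather than browse for it. A minimal sketch follows; it assumes the date components come from the run's start date, which this page does not state, and trace_prefix is a hypothetical helper.

```python
from datetime import date


def trace_prefix(org_id: str, day: date, trace_id: str) -> str:
    """Build the per-run prefix: traces/runs/{org_id}/{yyyy}/{mm}/{dd}/{trace_id}/."""
    # Assumption: `day` is the run's start date; the source of the date is not documented.
    return f"traces/runs/{org_id}/{day:%Y}/{day:%m}/{day:%d}/{trace_id}/"


prefix = trace_prefix("org_acme", date(2024, 3, 5), "550e8400-e29b-41d4-a716-446655440000")
print(prefix + "summary.json")
# traces/runs/org_acme/2024/03/05/550e8400-e29b-41d4-a716-446655440000/summary.json
```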
Compatible: AWS S3, Cloudflare R2, SeaweedFS. Not supported: MinIO (archived upstream).
Signed URLs are minted only after org-scoped authorization; buckets never use public ACLs.
Postprocess worker (trace-postprocess)
Hatchet workflow triggered on run.end:
- Recompress NDJSON at zstd-19
- Build summary.json (list view + search index)
- Build manifest.json (blob refs for GC)
- Extract rrweb-events.ndjson.zst for DOM player
- Update Postgres traces row: size, search_vector, status, failure_signature
- Auto-pin if status=failed per org policy
- Publish completion event on Redis channel
Typical latency: 5–30s for a 30s PR run.
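The manifest step can be pictured as a scan over the decompressed NDJSON that collects every sha256: reference. The sketch below is an assumption-heavy illustration: the real manifest.json schema is not documented here, so the {"blobs": [...]} shape and both function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List, Set


def collect_blob_refs(value: Any, found: Set[str]) -> None:
    """Walk a decoded JSON value and record every string that starts with sha256:."""
    if isinstance(value, str):
        if value.startswith("sha256:"):
            found.add(value)
    elif isinstance(value, dict):
        for child in value.values():
            collect_blob_refs(child, found)
    elif isinstance(value, list):
        for child in value:
            collect_blob_refs(child, found)


def build_manifest(ndjson_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return a hypothetical manifest: the sorted, de-duplicated blob references."""
    found: Set[str] = set()
    for line in ndjson_lines:
        if line.strip():
            collect_blob_refs(json.loads(line), found)
    return {"blobs": sorted(found)}


# Illustrative records; the "body" and "snapshots" field names are invented.
lines = [
    '{"seq": 0, "kind": "dom.snapshot", "body": "sha256:aa11"}',
    '{"seq": 1, "kind": "clone.state", "snapshots": ["sha256:bb22", "sha256:aa11"]}',
]
print(build_manifest(lines))  # {'blobs': ['sha256:aa11', 'sha256:bb22']}
```

Recording each reference once per trace is what lets a reference-counting GC decide later whether a blob is still in use.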
Tier-down worker (trace-tier-down)
Lifecycle transitions:
| From | To | Trigger |
|---|---|---|
| hot | warm | Age > hot_retention_days (default: 7 on Free, 30 on Team) |
| warm | cold | Age > warm_retention_days |
| cold | — | Delete blobs except summary + final screenshot |
Pinned traces skip all transitions. Failed traces are pinned by default until an admin archives them.
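The hot and warm rows of the table, plus the pinning rule, reduce to a small decision function. This is a sketch of the documented rules only, not the trace-tier-down implementation; next_tier is a hypothetical name, the 90-day warm retention in the usage lines is an invented value, and the cold-tier cleanup is left out because its trigger is not specified above.

```python
from typing import Optional


def next_tier(
    tier: str,
    age_days: int,
    pinned: bool,
    hot_retention_days: int,
    warm_retention_days: int,
) -> Optional[str]:
    """Return the tier a trace should move to, or None if it stays put."""
    if pinned:
        # Pinned traces skip all transitions.
        return None
    if tier == "hot" and age_days > hot_retention_days:
        return "warm"
    if tier == "warm" and age_days > warm_retention_days:
        return "cold"
    return None


# Free-plan default of 7 hot days; the warm retention of 90 is a made-up example value.
print(next_tier("hot", 8, False, 7, 90))   # warm
print(next_tier("hot", 8, True, 7, 90))    # None
print(next_tier("warm", 91, False, 7, 90)) # cold
```

The comparison is strict, matching "Age >" in the table, so a trace exactly at its retention limit stays in its current tier for one more evaluation.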
Restore: POST /v1/traces/{id}/restore — cold → hot async (~30–90s Glacier Instant Retrieval).
Layer 2 worker (trace-layer2)
See Debugger & replay. The Python Hatchet workflow shells out to @molar/trace-replay-runner (Node) for Playwright execution.
GC worker (trace-gc)
Reference-counts blobs in trace_blobs; deletes unreferenced SHA-256 objects after grace period. Admin Help page shows GC metrics.
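The selection rule (zero references and past the grace period) can be sketched as below. It models trace_blobs as plain dictionaries for illustration; the actual table schema, the way the unreferenced timestamp is tracked, and the length of the grace period are all assumptions, and collectable is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def collectable(
    ref_counts: Dict[str, int],
    unreferenced_since: Dict[str, datetime],
    now: datetime,
    grace: timedelta,
) -> List[str]:
    """Return digests with no references whose grace period has elapsed."""
    result = []
    for digest, count in ref_counts.items():
        if count > 0:
            continue  # still referenced by at least one trace
        since = unreferenced_since.get(digest)
        if since is not None and now - since >= grace:
            result.append(digest)
    return sorted(result)


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
counts = {"aa": 0, "bb": 2, "cc": 0}
since = {"aa": now - timedelta(days=8), "cc": now - timedelta(days=1)}
# The 7-day grace period is an example value, not a documented default.
print(collectable(counts, since, now, timedelta(days=7)))  # ['aa']
```

The grace period protects blobs that a run in progress is about to reference again, which matters because blobs are shared globally across traces.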
Ingest paths
Automatic (Cartographer runner)
No user action is needed when @molar/trace-capture is attached: capture starts with the test process, live-ships during the run, and triggers postprocess on completion.
Internal service auth: internal_service_token header on ingest webhook (self-hosted operators only).
Manual dashboard upload
/ingest page — multipart bundle:
POST /v1/traces/ingest
Content-Type: multipart/form-data
bundle: <trace-bundle.tar.zst>
scenario_slug: checkout-stripe
source: manual
Response:
{ "trace_id": "550e8400-e29b-41d4-a716-446655440000", "short_id": "xY9zQ2mNp" }
CI API key
curl -X POST https://api.molar.it/v1/trace/traces/ingest \
-H "Authorization: Bearer molar_sk_…" \
-H "x-molar-org: org_acme" \
-F "bundle=@artifacts/trace-bundle.tar.zst" \
-F "scenario_slug=checkout-stripe" \
-F "commit_sha=abc123" \
-F "pr_number=4521"
Required scope: traces:write on API key.
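For CI environments without curl, the same multipart body can be built with the Python standard library. The sketch below only constructs the body and Content-Type value; sending it (for example with urllib.request, adding the Authorization and x-molar-org headers from the curl example) is left out. encode_multipart is a hypothetical helper, and the application/octet-stream part type is an assumption about what the ingest endpoint accepts.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(
    fields: Dict[str, str],
    file_field: str,
    filename: str,
    file_bytes: bytes,
) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns (body, content_type); content_type includes the boundary.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The file part; the octet-stream type is an assumption.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    {"scenario_slug": "checkout-stripe", "commit_sha": "abc123", "pr_number": "4521"},
    "bundle",
    "trace-bundle.tar.zst",
    b"\x28\xb5\x2f\xfd",  # placeholder bytes, not a real bundle
)
print(content_type.split(";")[0])  # multipart/form-data
```

Checking len(body) against the plan's upload cap before sending is a cheap way to avoid a 413 from the ingest endpoint.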
Playwright zip only (limited)
Upload trace.playwright.zip via ingest — postprocess synthesizes minimal NDJSON for list/detail. Full five-ribbon experience requires native capture (network, Clones, agent).
Live streaming
During capture:
GET /v1/traces/{id}/stream
Accept: text/event-stream
Events mirror NDJSON kinds. Guard internal ingest uses ?live=1 for same-run dashboard subscription.
Redis channel: cartographer:run:{run_id}:events (shared event bus pattern).
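A client of the stream endpoint needs to split the text/event-stream response into events. The sketch below parses already-received lines following the standard Server-Sent Events framing (event and data fields, blank line as terminator, lines starting with a colon as comments). It assumes the event field carries the NDJSON kind, which is one reading of "events mirror NDJSON kinds"; parse_sse and run_channel are hypothetical helper names.

```python
from typing import Dict, Iterable, Iterator, List


def run_channel(run_id: str) -> str:
    """Redis channel name for a run, following the pattern documented above."""
    return f"cartographer:run:{run_id}:events"


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {"event": ..., "data": ...} for each complete SSE event."""
    event = "message"  # default event type in the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


stream = [
    "event: console.error",
    'data: {"seq": 4}',
    "",
    ": keep-alive",
    "data: x",
    "",
]
for item in parse_sse(stream):
    print(item["event"], item["data"])
```

Parsing line by line keeps memory flat for long runs, since only the current event's data lines are buffered.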
Standalone vs production stack
| Component | Standalone bundle | Molar Cloud |
|---|---|---|
| API | Node HTTP server | FastAPI at api.molar.it |
| Workers | Inline / simplified | Managed Hatchet workers |
| DB | Local JSON store | Postgres with org isolation |
| Object store | Local filesystem | S3-compatible cloud storage |
| Viewer | Bundled static UI | Next.js at app.molar.it/trace |
Standalone is for build/verify and air-gapped pilots. Molar Cloud adds SSO, RLS, and multi-tenant isolation.
Helm deployment (on-prem)
Helm charts for on-prem Trace are available from Molar support. Contact support@molar.it for:
- API deployment + ingress
- Worker deployment (postprocess, layer2, tier-down, gc)
- TRACE_ENCRYPTION_KEY from Kubernetes secret
- NetworkPolicy restricting worker egress
- SeaweedFS as default object store
See values.yaml for retention overrides and region pins.
Health checks
The Trace Help page (/help) shows:
| Service | Indicator |
|---|---|
| API | Latency, version |
| S3 | Bucket reachability, used GB |
| Redis | Queue depth |
| Hatchet | Workers online |
| Anthropic | Debugger model reachable |
| GitHub App | Install scope |
GET /health and GET /metrics for Prometheus scraping in self-hosted installs.
Troubleshooting ingest
| Symptom | Likely cause |
|---|---|
| Trace stuck queued | Postprocess worker down; check Hatchet |
| Missing AGENT ribbon | Run was not Cartographer-sourced |
| Missing Clones panel | Clones not registered for scenario |
| ingest 413 | Bundle exceeds plan upload cap |
| ingest 401 | API key missing traces:write |
More: Troubleshooting & FAQ.