Workers & ingestion

How traces are captured, post-processed, tiered, and ingested — capture library, Hatchet workers, CI upload, and live streaming.

Note:

Self-hosted operators only. Hosted customers at app.molar.it do not need to run capture workers or Hatchet postprocess jobs. This page documents the internal pipeline for on-prem Trace deployments.

Trace artifacts flow from the Playwright process → object store → postprocess workers → dashboard. Understanding this pipeline helps you debug missing traces, slow finalization, and CI upload failures.

Pipeline overview

┌─────────────────┐     ┌────────────────────┐     ┌─────────────────┐
│ Playwright run  │────▶│ Live ship (zstd-3) │────▶│ S3 / R2 prefix  │
│ + trace-capture │     │ append NDJSON      │     │ traces/runs/…   │
└─────────────────┘     └────────────────────┘     └────────┬────────┘
                        ┌────────────────────┐              │
                        │ trace-postprocess  │◀─────────────┘
                        │ (Hatchet)          │
                        └────────┬───────────┘
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
   summary.json           rrweb extract           zstd-19 finalize
   search_vector           manifest.json           tier tag = hot
          │                      │                      │
          └──────────────────────┴──────────────────────┘
                        ┌────────▼─────────┐
                        │ Dashboard / API  │
                        │ GET /v1/traces   │
                        └──────────────────┘

Capture (@molar/trace-capture)

The capture library runs inside the Node/Playwright process; it cannot live in Python or Go.

Module            Responsibility
writer            Append NDJSON lines with ts, seq, kind, step_id
rrweb-inject      addInitScript recorder
playwright-hooks  tracing.start, network, console, video
clone-capture     Per-clone read endpoints + world snapshot at steps
live-ship         Stream compressed chunks during run
finalize          Seal run, enqueue postprocess
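
The writer itself runs in Node, but the line format it produces can be illustrated from the consumer side. The following is a minimal sketch of how a downstream tool might parse NDJSON lines and check ordering. The field names ts, seq, kind, and step_id come from the table above. Their types, the TraceRecord class, and both helper functions are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TraceRecord:
    ts: float     # assumed numeric timestamp; the real unit is not documented here
    seq: int      # assumed to increase by one per appended line
    kind: str     # for example "console.log" or "dom.mutation"
    step_id: str  # assumed string identifier of the owning step


def parse_ndjson(lines: Iterable[str]) -> List[TraceRecord]:
    """Parse NDJSON lines into records ordered by seq, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        records.append(TraceRecord(obj["ts"], obj["seq"], obj["kind"], obj["step_id"]))
    return sorted(records, key=lambda r: r.seq)


def missing_seqs(records: List[TraceRecord]) -> List[int]:
    """Return seq numbers absent between the lowest and highest seq seen."""
    seen = {r.seq for r in records}
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]
```

A gap reported by missing_seqs would suggest a chunk lost during live shipping, which is one thing to check when a trace looks incomplete.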

Instrumentation sources

Source            API                                      On-disk
Playwright trace  context.tracing.start({snapshots:true})  trace.playwright.zip
DOM               rrweb v2                                 dom.mutation, dom.snapshot
Network           page.on('request'/'response')            network.* + blob refs
Console           page.on('console'/'pageerror')           console.log, console.error
Video             recordVideo                              video.webm
Screenshots       Step boundary + 1s heartbeat             content-addressed PNG/WebP
Clone state       Per-clone GET /_clone/{kind}/…/{run}     clone.state
Agent             Cartographer LLM trace                   agent.thought, agent.action
Clock             GET /_clone/clock                        clock.advance
Chaos             Clone middleware                         error_injection, latency_injection

Compression strategy

Phase        Algorithm      Why
Live stream  zstd level 3   Speed during capture
Postprocess  zstd level 19  ~50% smaller than gzip for archival

Already-compressed binaries (PNG, WebM, AV1) skip recompression.
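
The policy above can be sketched as a small lookup. The levels and the list of already-compressed formats come from this section. The function name, the phase labels "live" and "postprocess", and the idea of keying on a format label are assumptions for illustration, not the worker's actual interface.

```python
from typing import Optional

# Levels taken from the table above; the phase labels are illustrative.
LEVEL_BY_PHASE = {"live": 3, "postprocess": 19}

# Formats that are already compressed and should be stored as-is.
ALREADY_COMPRESSED = {"png", "webm", "av1"}


def zstd_level(phase: str, fmt: str) -> Optional[int]:
    """Return the zstd level to apply, or None when recompression is skipped."""
    if fmt.lower() in ALREADY_COMPRESSED:
        return None
    return LEVEL_BY_PHASE[phase]
```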

Content-addressed blobs

Large payloads are stored at blobs/sha256/{first2}/{rest}, and the NDJSON carries only a sha256:… reference. An identical login-page DOM captured across 10,000 runs is stored once, so deduplication is the primary cost lever.
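
A minimal sketch of the addressing scheme follows. The key layout is taken from the text above. Hashing the raw payload bytes and the helper names blob_ref and blob_key are assumptions.

```python
import hashlib


def blob_ref(payload: bytes) -> str:
    """Return the reference string that would appear in the NDJSON."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def blob_key(ref: str) -> str:
    """Map a sha256 reference to its object-store key."""
    algo, _, digest = ref.partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError(f"not a sha256 blob reference: {ref!r}")
    # The first two hex characters form the fan-out directory, the rest the object name.
    return f"blobs/sha256/{digest[:2]}/{digest[2:]}"
```

Because the key depends only on the content, two runs that capture the same bytes resolve to the same object, which is where the deduplication comes from.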

Object store layout

s3://{bucket}/
  traces/runs/{org_id}/{yyyy}/{mm}/{dd}/{trace_id}/
    trace.ndjson.zst
    trace.playwright.zip
    video.webm
    summary.json
    manifest.json
    rrweb-events.ndjson.zst      # built by postprocess
  blobs/sha256/{aa}/{bb…}        # global dedup — auth at API layer
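
The date-partitioned run prefix can be built as in the sketch below. The path template is the one shown in the layout. The function names are hypothetical, and which date is used (run start versus finalization, and in which timezone) is not stated here, so the caller supplies it.

```python
from datetime import date


def run_prefix(org_id: str, day: date, trace_id: str) -> str:
    """Build the per-run prefix: traces/runs/{org_id}/{yyyy}/{mm}/{dd}/{trace_id}/."""
    return f"traces/runs/{org_id}/{day:%Y}/{day:%m}/{day:%d}/{trace_id}/"


def artifact_key(prefix: str, name: str) -> str:
    """Join a run prefix with an artifact file name such as summary.json."""
    return prefix + name
```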

Compatible: AWS S3, Cloudflare R2, SeaweedFS. Not supported: MinIO (archived upstream).

Signed URLs are minted only after org-scoped authorization; buckets never use public ACLs.

Postprocess worker (trace-postprocess)

Hatchet workflow triggered on run.end:

  1. Recompress NDJSON at zstd-19
  2. Build summary.json (list view + search index)
  3. Build manifest.json (blob refs for GC)
  4. Extract rrweb-events.ndjson.zst for DOM player
  5. Update Postgres traces row: size, search_vector, status, failure_signature
  6. Auto-pin if status=failed per org policy
  7. Publish completion event on Redis channel
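
Step 3 can be illustrated with a sketch that scans NDJSON records for blob references. It assumes references appear as string values of the form sha256: followed by 64 hex characters, anywhere in a record. The manifest shape (trace_id plus a blobs list) is hypothetical, not the real manifest.json schema.

```python
import json
import re
from typing import Iterable, List

SHA_REF = re.compile(r"sha256:[0-9a-f]{64}")


def collect_blob_refs(ndjson_lines: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated blob refs found in any string value."""
    refs = set()

    def walk(value):
        # Recurse through nested dicts and lists, matching refs inside strings.
        if isinstance(value, str):
            refs.update(SHA_REF.findall(value))
        elif isinstance(value, dict):
            for v in value.values():
                walk(v)
        elif isinstance(value, list):
            for v in value:
                walk(v)

    for line in ndjson_lines:
        if line.strip():
            walk(json.loads(line))
    return sorted(refs)


def build_manifest(trace_id: str, ndjson_lines: Iterable[str]) -> dict:
    """Assemble a minimal manifest listing every blob the trace references."""
    return {"trace_id": trace_id, "blobs": collect_blob_refs(ndjson_lines)}
```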

Typical latency: 5–30s for a 30s PR run.

Tier-down worker (trace-tier-down)

Lifecycle transitions:

From  To    Trigger
hot   warm  Age > hot_retention_days (default 7 on Free, 30 on Team)
warm  cold  Age > warm_retention_days
cold  -     Delete blobs except summary + final screenshot

Pinned traces skip all transitions. Failed traces are pinned by default until an admin archives them.
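
The hot and warm transitions can be sketched as a pure function. The tier names, the retention settings, and the pinned rule come from this section. The function name is hypothetical, and measuring age in whole days since the trace was created is an assumption. The cold-tier deletion is left out because its trigger is not specified here.

```python
from typing import Optional


def next_tier(tier: str, age_days: int, pinned: bool,
              hot_retention_days: int, warm_retention_days: int) -> Optional[str]:
    """Return the tier a trace should move to, or None if it stays put."""
    if pinned:
        return None  # pinned traces skip all transitions
    if tier == "hot" and age_days > hot_retention_days:
        return "warm"
    if tier == "warm" and age_days > warm_retention_days:
        return "cold"
    return None
```

For example, with the Free default of 7 hot days, an unpinned trace moves to warm on day 8, not day 7, because the comparison is strict.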

Restore: POST /v1/traces/{id}/restore moves a trace from cold back to hot asynchronously (~30–90s with Glacier Instant Retrieval).

Layer 2 worker (trace-layer2)

See Debugger & replay. A Python Hatchet workflow shells out to @molar/trace-replay-runner (Node) for Playwright execution.

GC worker (trace-gc)

Reference-counts blobs in trace_blobs and deletes unreferenced SHA-256 objects after a grace period. The Admin Help page shows GC metrics.
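
The selection rule can be sketched as follows. Reference counting and the grace period are described above. The in-memory dictionaries stand in for the trace_blobs table, and tracking the time at which a blob became unreferenced is an assumption about how the grace period is measured.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def blobs_to_delete(ref_counts: Dict[str, int],
                    unreferenced_since: Dict[str, datetime],
                    now: datetime,
                    grace: timedelta) -> List[str]:
    """Return digests with zero references whose grace period has elapsed."""
    doomed = []
    for digest, count in ref_counts.items():
        if count > 0:
            continue  # still referenced by at least one trace
        since = unreferenced_since.get(digest)
        if since is not None and now - since >= grace:
            doomed.append(digest)
    return sorted(doomed)
```

The grace period gives an in-flight upload time to register its reference to a blob that briefly looks orphaned.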

Ingest paths

Automatic (Cartographer runner)

No user action is needed when @molar/trace-capture is attached: capture starts with the test process, live-ships during the run, and triggers postprocess on completion.

Internal service auth: internal_service_token header on ingest webhook (self-hosted operators only).

Manual dashboard upload

/ingest page — multipart bundle:

POST /v1/traces/ingest
Content-Type: multipart/form-data

bundle: <trace-bundle.tar.zst>
scenario_slug: checkout-stripe
source: manual

Response:

{ "trace_id": "550e8400-e29b-41d4-a716-446655440000", "short_id": "xY9zQ2mNp" }

CI API key

curl -X POST https://api.molar.it/v1/trace/traces/ingest \
  -H "Authorization: Bearer molar_sk_…" \
  -H "x-molar-org: org_acme" \
  -F "bundle=@artifacts/trace-bundle.tar.zst" \
  -F "scenario_slug=checkout-stripe" \
  -F "commit_sha=abc123" \
  -F "pr_number=4521"

Required scope: traces:write on API key.

Playwright zip only (limited)

Upload trace.playwright.zip via ingest — postprocess synthesizes minimal NDJSON for list/detail. Full five-ribbon experience requires native capture (network, Clones, agent).

Live streaming

During capture:

GET /v1/traces/{id}/stream
Accept: text/event-stream

Events mirror NDJSON kinds. Guard internal ingest uses ?live=1 for same-run dashboard subscription.

Redis channel: cartographer:run:{run_id}:events (shared event bus pattern).
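
A client consuming the stream needs to split the text/event-stream body into frames. The sketch below follows the standard Server-Sent Events framing (event: and data: fields, with a blank line ending each frame). It assumes each data payload is a JSON object and that the event name carries the NDJSON kind; neither is confirmed by this page. The helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def run_channel(run_id: str) -> str:
    """Build the Redis channel name for a run's event bus."""
    return f"cartographer:run:{run_id}:events"


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) for each complete frame in an SSE body."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                    # blank line terminates a frame
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):        # comment, often used as keep-alive
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
```

A frame without an event: field falls back to the SSE default name "message", and a trailing frame that is not closed by a blank line is dropped.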

Standalone vs production stack

Component     Standalone bundle    Molar Cloud
API           Node HTTP server     FastAPI at api.molar.it
Workers       Inline / simplified  Managed Hatchet workers
DB            Local JSON store     Postgres with org isolation
Object store  Local filesystem     S3-compatible cloud storage
Viewer        Bundled static UI    Next.js at app.molar.it/trace

Standalone is for build/verify and air-gapped pilots. Molar Cloud adds SSO, RLS, and multi-tenant isolation.

Helm deployment (on-prem)

Helm charts for on-prem Trace are available from Molar support. Contact support@molar.it for:

  • API deployment + ingress
  • Worker deployment (postprocess, layer2, tier-down, gc)
  • TRACE_ENCRYPTION_KEY from Kubernetes secret
  • NetworkPolicy restricting worker egress
  • SeaweedFS as default object store

See values.yaml for retention overrides and region pins.

Health checks

The Trace Help page (/help) shows:

Service     Indicator
API         Latency, version
S3          Bucket reachability, used GB
Redis       Queue depth
Hatchet     Workers online
Anthropic   Debugger model reachable
GitHub App  Install scope

Self-hosted installs also expose GET /health and GET /metrics for Prometheus scraping.

Troubleshooting ingest

Symptom               Likely cause
Trace stuck queued    Postprocess worker down; check Hatchet
Missing AGENT ribbon  Run was not Cartographer-sourced
Missing Clones panel  Clones not registered for scenario
ingest 413            Bundle exceeds plan upload cap
ingest 401            API key missing traces:write

More: Troubleshooting & FAQ.