Core Concepts

Cartographer turns a seed URL and natural-language goals into verified browser trajectories and production-grade Playwright tests. This page explains the moving parts — how discovery, execution, verification, and export fit together.

Dashboard: app.molar.it/dashboard/cartographer · Product: cartographer.molar.it

End-to-end flow

  SEED URL
      │
      ▼
  ┌─────────────┐     ┌──────────────┐     ┌─────────────┐
  │   CRAWL     │────►│  ROUTE MAP   │────►│ DISCOVER    │
  │  (discover) │     │  (pages +    │     │ FLOWS       │
  └─────────────┘     │   edges)     │     └──────┬──────┘
                      └──────────────┘            │
                                                  ▼
  ┌─────────────┐     ┌──────────────┐     ┌─────────────┐
  │  PLAYWRIGHT │◄────│   EXPORT     │◄────│ AGENT RUN   │
  │  PROJECT    │     │  PIPELINE    │     │(goal-driven)│
  └─────────────┘     └──────────────┘     └──────┬──────┘
                                                  │
                      ┌──────────────┐            │
                      │   TRACE +    │◄───────────┘
                      │  FINDINGS    │
                      └──────────────┘

Stage	Input	Output
Crawl	Seed URL, depth, scope	Route map
Agent run	Goal, base URL, credentials	Trajectory (ordered steps)
Verification	Per-step assertions	Pass/fail, plus self-heal or human-in-the-loop (HITL) escalation
Export	Completed trajectory	Linted `.spec.ts` + page objects
Trace	Captured or ingested run artifacts	Replay, network, Debugger chat

Projects and seed URLs

A project is the unit of organization for one application under test.

Concept	Description
Seed URL	Starting point for crawls and default `base_url` for runs
Multiple seed URLs	Supported — project hub lets you pick the active base URL
Environment tag	`staging`, `production`, or custom — labels runs and exports
Scope allowlist	URL prefix guard; crawls and runs refuse out-of-scope navigation
Default grounding tier	Project-wide default for new runs (0–2)
Max grounding tier	Hard ceiling — use `0` for strict privacy (no cloud vision)

Create projects in the dashboard or with `cartographer_create_project` via MCP.
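The following minimal TypeScript sketch shows how these settings constrain one another. The `ProjectSettings` shape and the `isInScope` and `validateProject` helpers are hypothetical and are not Cartographer's API. The sketch makes two assumptions: the scope allowlist is a list of plain URL prefixes, and the default tier must not exceed the max tier.

```typescript
// Hypothetical project settings; field names paraphrase the table above
// and are not a documented Cartographer schema.
type GroundingTier = 0 | 1 | 2;

interface ProjectSettings {
  seedUrls: string[];
  activeBaseUrl: string; // the base URL picked in the project hub
  environment: string; // "staging", "production", or a custom tag
  scopeAllowlist: string[]; // URL prefixes
  defaultGroundingTier: GroundingTier;
  maxGroundingTier: GroundingTier;
}

// Prefix guard: a URL is in scope if it starts with any allowlisted prefix.
function isInScope(url: string, allowlist: string[]): boolean {
  return allowlist.some((prefix) => url.startsWith(prefix));
}

// Collects every configuration problem instead of stopping at the first one.
function validateProject(p: ProjectSettings): string[] {
  const errors: string[] = [];
  if (p.seedUrls.length === 0) {
    errors.push("at least one seed URL is required");
  }
  if (!p.seedUrls.includes(p.activeBaseUrl)) {
    errors.push("active base URL must be one of the seed URLs");
  }
  if (p.defaultGroundingTier > p.maxGroundingTier) {
    errors.push("default grounding tier exceeds the max grounding tier");
  }
  for (const seed of p.seedUrls) {
    if (!isInScope(seed, p.scopeAllowlist)) {
      errors.push(`seed URL is out of scope: ${seed}`);
    }
  }
  return errors;
}
```

For example, a project with `defaultGroundingTier: 1` and `maxGroundingTier: 0` would be reported as misconfigured, because the cap is a hard ceiling.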

Crawl and route map

A crawl discovers what exists on your site without pursuing a specific user goal.

Static discovery

The crawl worker performs breadth-first exploration from the seed URL:

Follows same-origin links within scope
Records HTTP status, DOM hash, URL patterns, depth
Captures screenshots and HAR per page when configured
Respects robots.txt unless overridden in project settings
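The breadth-first strategy can be sketched over an in-memory link graph. This is an illustration, not the crawl worker's code. `getLinks` is a hypothetical stand-in for loading a page and extracting its links, `maxDepth` stands in for the crawl depth setting, and scope uses a plain prefix check. Status codes, DOM hashes, screenshots, HAR capture, and robots.txt handling are left out.

```typescript
interface CrawledPage {
  url: string;
  depth: number;
}

// Breadth-first crawl: visit pages in order of distance from the seed,
// never revisiting a URL and never leaving the allowlisted prefixes.
function crawlBfs(
  seed: string,
  getLinks: (url: string) => string[], // hypothetical page loader + link extractor
  scope: string[],
  maxDepth: number,
): CrawledPage[] {
  const inScope = (u: string) => scope.some((p) => u.startsWith(p));
  const visited = new Set<string>([seed]);
  const queue: CrawledPage[] = [{ url: seed, depth: 0 }];
  const pages: CrawledPage[] = [];

  while (queue.length > 0) {
    const page = queue.shift()!; // FIFO order is what makes this breadth-first
    pages.push(page);
    if (page.depth >= maxDepth) continue; // record the page but do not expand it
    for (const link of getLinks(page.url)) {
      if (!visited.has(link) && inScope(link)) {
        visited.add(link);
        queue.push({ url: link, depth: page.depth + 1 });
      }
    }
  }
  return pages;
}
```

Because the queue is FIFO, every page at depth n is recorded before any page at depth n + 1. A depth limit therefore cuts the crawl off cleanly.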

Interactive discovery

For single-page apps (SPAs), Cartographer optionally performs vision-assisted clicks to reveal hidden routes:

Setting	Default	Purpose
`interactive_crawl`	on	Enable click discovery
`interactive_grounding_tier`	1	Vision model for ambiguous controls
`interactive_max_clicks_per_page`	bounded	Maximum exploratory clicks per page

Interactive discovery is why a crawl can find routes that never appear in static HTML alone.
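One way to picture the per-page click budget is as a bounded loop. The loop tries candidate controls in order, stops when `interactive_max_clicks_per_page` is spent, and keeps only routes the static pass did not already know. `clickAndObserve` is a hypothetical stand-in for a vision-grounded click. It returns the URL the app navigated to, or `null` if nothing changed.

```typescript
// Returns routes revealed by clicking, bounded by a per-page click budget.
function interactiveDiscover(
  candidates: string[], // labels of controls to try, in priority order
  clickAndObserve: (control: string) => string | null, // hypothetical
  knownRoutes: Set<string>, // routes already found by static discovery
  maxClicksPerPage: number,
): string[] {
  const found: string[] = [];
  let clicks = 0;
  for (const control of candidates) {
    if (clicks >= maxClicksPerPage) break; // budget exhausted
    clicks++;
    const url = clickAndObserve(control);
    if (url !== null && !knownRoutes.has(url) && !found.includes(url)) {
      found.push(url);
    }
  }
  return found;
}
```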

Route map artifacts

Each completed crawl produces a route map:

Artifact	Use
Page table	URL, pattern, depth, status, DOM hash
Route graph	Nodes = URL patterns, edges = discovered links
Screenshots	Visual reference per page
HAR	Network waterfall for debugging
Markdown preview	Readable page summary
Accessibility tree	Structured DOM for grounding

From the route map you can start a run from any page, compare two crawls (added, removed, and changed patterns), or use Discover flows to batch-enqueue agent runs.
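Comparing two crawls reduces to a keyed diff over the page table. This minimal sketch summarizes each crawl as a map from URL pattern to DOM hash and treats "changed" as a differing hash. Both choices are assumptions, and Cartographer's actual comparison may use more signals.

```typescript
interface CrawlDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

// Keys are URL patterns; values are DOM hashes from each crawl's page table.
function diffCrawls(before: Map<string, string>, after: Map<string, string>): CrawlDiff {
  const diff: CrawlDiff = { added: [], removed: [], changed: [] };
  after.forEach((hash, pattern) => {
    const old = before.get(pattern);
    if (old === undefined) diff.added.push(pattern);
    else if (old !== hash) diff.changed.push(pattern); // same pattern, new DOM
  });
  before.forEach((_hash, pattern) => {
    if (!after.has(pattern)) diff.removed.push(pattern);
  });
  return diff;
}
```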

Note:

A crawl answers "what pages exist?" An agent run answers "can a user complete this goal?"

Agent runs

An agent run (labeled Agent runs in the dashboard — not "Runs", which collides with Guard checks) executes a goal in a real browser session.

Planner → actor → verifier loop

Cartographer uses a LangGraph state machine:

  ┌──────────┐    thought + plan    ┌──────────┐
  │ PLANNER  │─────────────────────►│  ACTOR   │
  │ (text)   │                      │ (browser)│
  └────▲─────┘                      └────┬─────┘
       │                                 │
       │         ┌──────────┐            │
       └─────────│ VERIFIER │◄───────────┘
                 │ (assert) │
                 └──────────┘

Role	Responsibility
Planner	Reads goal, route context, prior steps; proposes next action
Actor	Grounds the action to a selector; executes click, type, navigate, etc.
Verifier	Confirms the step succeeded; triggers retry or escalation on failure
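The loop's control flow can be approximated in plain TypeScript without LangGraph. This is not Cartographer's graph definition. `plan`, `act`, and `verify` are hypothetical callbacks for the three roles, and `maxSteps` plays the part of the `max_steps` hard stop listed in the run parameters. Retries and tier escalation are omitted.

```typescript
type Action = { kind: "click" | "type" | "navigate" | "done"; target?: string; text?: string };

interface StepRecord {
  action: Action;
  verified: boolean;
}

type LoopResult = { status: "passed" | "failed"; steps: StepRecord[] };

function runLoop(
  plan: (history: StepRecord[]) => Action, // planner: goal + prior steps -> next action
  act: (action: Action) => void, // actor: ground the action and execute it
  verify: (action: Action) => boolean, // verifier: did the step succeed?
  maxSteps: number,
): LoopResult {
  const steps: StepRecord[] = [];
  while (steps.length < maxSteps) {
    const action = plan(steps);
    if (action.kind === "done") return { status: "passed", steps };
    act(action);
    const verified = verify(action);
    steps.push({ action, verified });
    // Without self-heal, an unverified step ends the run immediately.
    if (!verified) return { status: "failed", steps };
  }
  return { status: "failed", steps }; // hit the step limit without finishing
}
```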

Run parameters that matter

Parameter	Effect
`goal`	Natural-language objective
`max_steps`	Hard stop to prevent runaway loops
`grounding_tier`	Starting tier for element location
`max_grounding_tier`	Escalation ceiling
`browser_adapter`	Where the browser runs (see below)
`credentials_alias`	Encrypted login for authenticated flows
`human_preset`	Timing behavior (`default` vs `careful`)
`headless`	Visible browser when `false`
`record_demo`	Capture a replayable demonstration
`anti_bot`	Stealth browser for bot-protected sites
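A run request ties these parameters together. This sketch assumes a JSON body whose keys match the parameter names in the table. The `buildRunRequest` helper and its validation rules are hypothetical and only show how the parameters relate. Its two defaults are documented ones: Tier 0 as the starting tier and `cloakbrowser` as the adapter. Credentials travel as an alias name, never as a password.

```typescript
type Tier = 0 | 1 | 2;

// Hypothetical request body; keys follow the parameter table above.
interface RunRequest {
  goal: string;
  base_url: string;
  max_steps: number;
  grounding_tier: Tier;
  max_grounding_tier: Tier;
  browser_adapter: "cloakbrowser" | "camoufox";
  credentials_alias?: string; // alias name only, never a raw secret
  human_preset?: "default" | "careful";
  headless?: boolean;
  record_demo?: boolean;
  anti_bot?: boolean;
}

type RunInput = Pick<RunRequest, "goal" | "base_url" | "max_steps" | "max_grounding_tier"> &
  Partial<RunRequest>;

function buildRunRequest(input: RunInput): RunRequest {
  const req: RunRequest = {
    grounding_tier: 0, // Tier 0 (accessibility tree) is the default starting tier
    browser_adapter: "cloakbrowser", // the default adapter
    ...input,
  };
  if (req.goal.trim() === "") throw new Error("goal must not be empty");
  if (!Number.isInteger(req.max_steps) || req.max_steps < 1) {
    throw new Error("max_steps must be a positive integer");
  }
  if (req.grounding_tier > req.max_grounding_tier) {
    throw new Error("grounding_tier exceeds max_grounding_tier");
  }
  return req;
}
```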

Browser adapters

Adapter	Where it runs	When to use
cloakbrowser	Server-side stealth Chromium pool	Default; bot-protected staging
camoufox	Server-side Firefox variant	Alternate fingerprint

Run statuses

Status	Meaning
`queued`	Waiting for agent worker
`running`	Actively executing steps
`pending_human`	Needs approval or manual step (CAPTCHA, MFA, payment)
`passed` / `succeeded`	Goal completed with verified steps
`failed` / `error`	Unrecoverable failure
`cancelled`	User or API cancelled

Live updates stream over WebSocket (`/v1/runs/{id}/stream`) as `step_started`, `step_completed`, `tier_escalation`, `finding`, `artifact_ready`, and `completed` events.
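A stream client mostly folds events into run state. This sketch assumes each message is a JSON object with a `type` field naming one of the events above, and that `completed` carries the final `status`. The `status` field is an assumption, not a documented schema. The sketch also encodes which statuses from the table above are terminal.

```typescript
type RunStatus =
  | "queued" | "running" | "pending_human"
  | "passed" | "succeeded" | "failed" | "error" | "cancelled";

const TERMINAL: RunStatus[] = ["passed", "succeeded", "failed", "error", "cancelled"];

interface RunView {
  status: RunStatus;
  stepsCompleted: number;
  escalations: number;
  findings: number;
  artifacts: number;
}

// Hypothetical message shape; only the `type` values come from the docs.
interface StreamMessage {
  type: "step_started" | "step_completed" | "tier_escalation" | "finding" | "artifact_ready" | "completed";
  status?: RunStatus;
}

// Pure reducer: returns a new view for each raw WebSocket message.
function applyMessage(view: RunView, raw: string): RunView {
  const msg = JSON.parse(raw) as StreamMessage;
  switch (msg.type) {
    case "step_started":
      return { ...view, status: "running" };
    case "step_completed":
      return { ...view, stepsCompleted: view.stepsCompleted + 1 };
    case "tier_escalation":
      return { ...view, escalations: view.escalations + 1 };
    case "finding":
      return { ...view, findings: view.findings + 1 };
    case "artifact_ready":
      return { ...view, artifacts: view.artifacts + 1 };
    case "completed":
      return { ...view, status: msg.status ?? view.status };
    default:
      return view; // ignore event types this sketch does not know
  }
}

const isTerminal = (s: RunStatus): boolean => TERMINAL.includes(s);
```

In an environment with a global `WebSocket`, you would call `applyMessage` from the socket's message handler. Close the socket once `isTerminal` returns true.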

Grounding ladder (Tiers 0–2)

Grounding is how Cartographer decides which element to click or type into. Cartographer prefers cheap, deterministic methods first and escalates only when necessary.

  Tier 0 ──fail──► Tier 1 ──fail──► Tier 2
  (a11y tree)      (cloud VLM)      (local UI-TARS)
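The ladder amounts to "try the cheapest tier, escalate on failure, stop at the ceiling." In this minimal sketch, `locate` is a hypothetical per-tier grounding function that returns a selector or `null`. Each escalation is recorded with a reason, much as a `tier_escalation` event carries one.

```typescript
type Tier = 0 | 1 | 2;

interface Escalation {
  from: Tier;
  to: Tier;
  reason: string;
}

interface GroundingResult {
  selector: string | null; // null: no tier up to the ceiling found the element
  tier: Tier; // last tier tried
  escalations: Escalation[];
}

function ground(
  target: string,
  locate: (tier: Tier, target: string) => string | null, // hypothetical
  startTier: Tier,
  maxTier: Tier,
): GroundingResult {
  const escalations: Escalation[] = [];
  let tier = Math.min(startTier, maxTier) as Tier; // never start above the ceiling
  for (;;) {
    const selector = locate(tier, target);
    if (selector !== null || tier >= maxTier) {
      return { selector, tier, escalations };
    }
    const next = (tier + 1) as Tier;
    escalations.push({ from: tier, to: next, reason: `Tier ${tier} could not locate "${target}"` });
    tier = next;
  }
}
```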

Tier 0 — Accessibility tree (default)

Property	Detail
Input	Accessibility tree + Set-of-Marks overlay
Model	Text LLM (e.g. Claude Sonnet)
Cost	Lowest
Privacy	No screenshots leave your deployment when the tier is capped at 0
Best for	Buttons, links, form fields with accessible names

Design goal: most steps on typical SaaS apps resolve at Tier 0. The actual distribution varies by site; check per-run tier analytics in the dashboard.

Tier 1 — Cloud vision (VLM)

Property	Detail
Input	Screenshot + bounding-box candidates
Model	Gemini Flash (configurable)
Trigger	Icon-only buttons, canvas, ambiguous widgets, crawl interactive clicks
Requires	`GOOGLE_API_KEY` or hosted vision routing

Escalating to Tier 1 emits a `tier_escalation` event with a human-readable reason.

Tier 2 — Local vision (optional)

Property	Detail
Input	Screenshot to on-prem MLX server
Model	UI-TARS (local)
Trigger	Explicit `grounding_tier=2` or `MAX_GROUNDING_TIER=2`
Requires	`GROUNDING_T2_SERVER_URL`; run `make grounding-serve` to start a server locally
Use case	Air-gapped deployments; policies that forbid sending screenshots to the cloud

Note:

Set `max_grounding_tier=0` on a project to enforce privacy mode: no screenshots are sent to cloud vision providers, and interactive crawl and Tier 1 escalation are disabled.

Per-run tier analytics

Run detail shows:

Pie/bar chart of Tier 0 / 1 / 2 step distribution
Table of escalation events with reasons
Estimated LLM cost per run
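The distribution and cost figures are simple aggregates over per-step records. This sketch assumes each step records the tier that resolved it and that you supply your own cost per step for each tier. The costs in any example are placeholders, not Cartographer or provider pricing.

```typescript
type Tier = 0 | 1 | 2;

interface StepTier {
  tier: Tier; // tier that resolved the step
}

interface TierSummary {
  counts: Record<Tier, number>;
  share: Record<Tier, number>; // fraction of steps per tier, 0..1
  estimatedCost: number;
}

function summarizeTiers(steps: StepTier[], costPerStep: Record<Tier, number>): TierSummary {
  const counts: Record<Tier, number> = { 0: 0, 1: 0, 2: 0 };
  for (const s of steps) counts[s.tier]++;
  const total = steps.length || 1; // avoid dividing by zero on empty runs
  const share: Record<Tier, number> = { 0: 0, 1: 0, 2: 0 };
  let estimatedCost = 0;
  const tiers: Tier[] = [0, 1, 2];
  for (const t of tiers) {
    share[t] = counts[t] / total;
    estimatedCost += counts[t] * costPerStep[t];
  }
  return { counts, share, estimatedCost };
}
```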

Configure defaults in Settings → AI & models and per-project caps in Project settings → Grounding & browser.

Verification

Every actor step passes through verification before the planner advances.

Mechanism	Description
Post-action assertion	URL change, element visibility, text content
Self-heal	Retry with alternate selectors or tier escalation
Failure taxonomy	Structured error codes for export and Trace
HITL	Pause for human when automation cannot proceed

When verification still fails after retries, the run moves to `failed` with a step-level error message, screenshot evidence, and a linked Trace for debugging.
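The self-heal path can be sketched as a function that tries alternate selectors before giving up. `execute` and `assertOk` are hypothetical hooks. The `requiresHuman` flag, marking steps such as CAPTCHA, MFA, or payment that go straight to a human, is an assumption of the sketch. Tier escalation is left to the grounding ladder.

```typescript
type StepOutcome = "verified" | "pending_human" | "failed";

interface VerifiableStep {
  selectors: string[]; // primary selector first, alternates after
  requiresHuman: boolean; // hypothetical flag: CAPTCHA, MFA, payment
}

function verifyWithSelfHeal(
  step: VerifiableStep,
  execute: (selector: string) => void, // hypothetical actor hook
  assertOk: () => boolean, // post-action assertion: URL, visibility, or text
): { outcome: StepOutcome; attempts: number } {
  if (step.requiresHuman) return { outcome: "pending_human", attempts: 0 };
  let attempts = 0;
  for (const selector of step.selectors) {
    attempts++;
    execute(selector);
    if (assertOk()) return { outcome: "verified", attempts };
  }
  return { outcome: "failed", attempts }; // every alternate selector failed
}
```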

Export pipeline

Export transforms a completed trajectory into a Playwright test project your team can run in CI.

Seven stages

  prune → rank selectors → cluster POMs → harvest assertions
       → generate (LLM) → lint → verify + heal

Stage	Purpose
Pruner	Remove noisy or redundant steps
Selector ranker	Prefer stable locators (`getByRole`, `getByTestId`)
POM clusterer	Group selectors into page objects
Assertion harvester	Collect verifiable oracles from the trajectory
Generator	Write `.spec.ts` and page-object files
Linter	Enforce Cartographer style rules (no `elementHandle`, no `waitForTimeout`)
Verifier + healer	Run generated tests in a sandbox; Gemini repairs failures (sandboxed end-to-end verification in production is still maturing)
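Two of the deterministic stages are easy to sketch. The ranking weights, rule names, and regex-based lint below are illustrative assumptions, not Cartographer's rules. The ranker prefers `getByRole` and `getByTestId` locators over CSS or XPath, and the linter flags the two banned APIs named in the table.

```typescript
// Lower score = more stable locator. Weights are illustrative only.
function selectorScore(locator: string): number {
  if (locator.startsWith("getByRole(")) return 0;
  if (locator.startsWith("getByTestId(")) return 1;
  if (locator.startsWith("getByLabel(") || locator.startsWith("getByText(")) return 2;
  if (locator.startsWith("locator('xpath=") || locator.startsWith('locator("xpath=')) return 4;
  return 3; // other CSS locators
}

function rankSelectors(candidates: string[]): string[] {
  return candidates.slice().sort((a, b) => selectorScore(a) - selectorScore(b));
}

interface LintError {
  line: number; // 1-based
  rule: string; // hypothetical rule id
}

const BANNED: Array<[string, RegExp]> = [
  ["no-element-handle", /\belementHandle\b/],
  ["no-wait-for-timeout", /\bwaitForTimeout\s*\(/],
];

// Line-based check of generated spec source against the banned APIs.
function lintSpec(source: string): LintError[] {
  const errors: LintError[] = [];
  source.split("\n").forEach((text, i) => {
    for (const [rule, pattern] of BANNED) {
      if (pattern.test(text)) errors.push({ line: i + 1, rule });
    }
  });
  return errors;
}
```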

Export outputs

Output	Description
`*.spec.ts`	Runnable Playwright tests
Page objects	Clustered selectors per route pattern
`playwright.config.ts`	Stock Chromium by default
Zip archive	Full project download

Export options

Option	Meaning
`style`	`poms` (page-object model) or `flat`
`target_repo`	GitHub repo for PR creation
`uses_cloakbrowser_in_ci`	Opt into stealth browser in CI — off by default

Edit & regenerate: provide a natural-language hint after a failed lint/verify; the healer incorporates your guidance.

Note:

Exported tests are designed for your CI — they do not require Cartographer at runtime unless you opt into CloakBrowser in CI for bot-protected targets.

Demos (demonstration learning)

Demos are user-recorded sessions that teach the planner how to handle tricky flows. Record mode is available via the unpacked browser extension.

Concept	Description
Record mode	Browser extension captures rrweb events, intents, and screenshots
Chunks	Segmented intents with URL patterns and replay strategy
Replay	`POST /v1/projects/{id}/demos/{id}/replay` creates a new agent run

Record once for MFA, payment, or multi-step wizards; Cartographer reuses the skill on similar goals without re-recording.
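Replaying a demo is a single POST to the replay endpoint in the table. The helper below only builds the request, so it can be tested without a network. The API host, the bearer-token header, and the empty JSON body are assumptions. `projectId` and `demoId` are this sketch's names for the two `{id}` placeholders.

```typescript
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildDemoReplayRequest(
  apiBase: string, // hypothetical API host
  projectId: string,
  demoId: string,
  token: string, // assumed bearer token; check your deployment's auth scheme
): HttpRequestSpec {
  const base = apiBase.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/v1/projects/${encodeURIComponent(projectId)}/demos/${encodeURIComponent(demoId)}/replay`,
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: "{}",
  };
}
```

With a global `fetch`, you would pass `spec.url` plus `method`, `headers`, and `body`. The response describes the new agent run; its shape is not covered here.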

See Browser extension for install and consent flows.

Credentials

Authenticated flows use credential aliases — never inline secrets in goals or MCP calls.

Property	Behavior
Storage	Encrypted server-side (Fernet + vault backend in production)
Dashboard	Username visible; password never shown after save
Runs	Select alias by name; API receives alias reference only
TOTP seed	Optional — for time-based OTP automation
Extension	Password fields masked; payment/password actions require approval

Create and manage aliases in Settings → Credentials. Pair with the extension for flows that need a human in the loop.

Traces

Cartographer ships a Trace subsystem in-repo: the `/traces` viewer, the `/v1/traces` API, and S3-backed NDJSON storage. Traces are created when a capture or ingest job runs. Today they are not automatically linked from every agent run unless your pipeline ingests or post-processes the run's artifacts.

Feature	Status
Trace list (`/traces`)	Shipped
Step timeline + Events log	Shipped
Video playback	Shipped when `video.webm` exists
Debugger chat	Shipped (requires Anthropic key / hosted routing)
Share links (`/r/[shortId]`, embed)	Shipped
Layer 2 replay API	Shipped (`POST /v1/traces/{id}/replay`)
Layer 2 diff UI	Available when a child replay exists
DOM replay (rrweb)	Available on trace detail at `app.molar.it/trace`

Open traces from `/traces` in the web app, or with Open trace on a dashboard run's detail page. List and fetch them via MCP with `cartographer_list_traces` and `cartographer_get_trace`.
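Traces are stored as NDJSON (one JSON object per line), so a step timeline can be rebuilt with a line-by-line parse. The record fields used here (`ts`, `kind`, `step`) are assumptions for illustration, not the documented trace schema.

```typescript
interface TraceRecord {
  ts: number; // timestamp in ms (assumed field)
  kind: string; // e.g. "step" or "event" (assumed field)
  step?: number; // step index, absent for run-level events (assumed field)
}

function parseNdjson(text: string): TraceRecord[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // tolerate blank lines and a trailing newline
    .map((line) => JSON.parse(line) as TraceRecord);
}

// Group records by step number, each group ordered by timestamp.
function timeline(records: TraceRecord[]): Map<number, TraceRecord[]> {
  const byStep = new Map<number, TraceRecord[]>();
  const sorted = records.slice().sort((a, b) => a.ts - b.ts);
  for (const r of sorted) {
    if (r.step === undefined) continue; // run-level events have no step
    const list = byStep.get(r.step) ?? [];
    list.push(r);
    byStep.set(r.step, list);
  }
  return byStep;
}
```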

For cross-product debugging, see Trace.

Passive UX findings

During crawls and runs, Cartographer can emit findings without blocking the main goal:

Source	Detects
axe	Accessibility violations
odiff	Visual regression vs baseline
vlm	Sampled UX issues from screenshots

Findings appear in the findings inbox, grouped by severity. Dismiss noise, create visual baselines, or promote findings to export assertions.

Fetch findings via MCP with `cartographer_get_findings`.
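Grouping the inbox by severity is a straightforward bucket-and-order step. As an assumption, this sketch borrows axe's impact levels (`critical`, `serious`, `moderate`, `minor`) as the severity scale. Cartographer's own severity labels and finding fields may differ. Dismissed findings are simply filtered out.

```typescript
type Severity = "critical" | "serious" | "moderate" | "minor";
type Source = "axe" | "odiff" | "vlm";

// Hypothetical finding shape.
interface Finding {
  id: string;
  source: Source;
  severity: Severity;
  dismissed: boolean;
}

const ORDER: Severity[] = ["critical", "serious", "moderate", "minor"];

// Returns [severity, findings] pairs, most severe first, skipping empty buckets.
function groupBySeverity(findings: Finding[]): Array<[Severity, Finding[]]> {
  const groups: Array<[Severity, Finding[]]> = [];
  for (const sev of ORDER) {
    const bucket = findings.filter((f) => !f.dismissed && f.severity === sev);
    if (bucket.length > 0) groups.push([sev, bucket]);
  }
  return groups;
}
```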

Discover flows

Discover flows analyzes the latest complete route map and proposes batch agent runs — useful after the first crawl of a large app.

In the project hub, choose Discover flows.
A confirmation step shows the N proposed flows.
A progress modal enqueues the runs.
Monitor them in the Agent runs list.

This bridges discovery (crawl) and execution (runs) without manual goal entry per route.

How concepts map to the dashboard

Concept	Dashboard location
Project / seed URL	`/dashboard/projects`, project overview
Route map	`/dashboard/projects/{id}/explore`
Agent run	`/dashboard/projects/{id}/runs`, `/runs` (legacy list)
Trace	`/traces` (separate TS viewer — not inside `/dashboard`)
Export	`/dashboard/projects/{id}/exports`
Demo / recording	`/dashboard/projects/{id}/recordings`
Credentials	`/dashboard/projects/{id}/credentials`
Findings	`/dashboard/projects/{id}/findings`

Full information architecture: Dashboard.

Related

Page	Topic
Quick start	Bootstrap and first project
Configuration	Model routing and env vars
MCP tools	IDE API surface
Troubleshooting	Stuck crawls, tier escalation, export lint
