
🥾 Trailblaze

Natural-language device control for your coding agent — across iOS, Android, and web. Every session is a replayable trail you can run as a test.

Trailblaze gives your coding agent a single, typed, replayable way to drive any device. It combines built-in primitives with your own typed tools, under a natural-language source of truth that travels across platforms. The artifact your agent leaves behind, a portable .trail.yaml, is both what the flow does (prose your team reads) and how it runs (recorded steps your CI replays deterministically, with no LLM at replay time).

Trailblaze is not its own coding agent. Your editor's agent (Claude Code, Cursor, Codex, Goose, Aider) does the planning; Trailblaze handles the device. If you'd rather run the included built-in agent end-to-end, that's available too. It's the same agent that powers --self-heal and the CI fallback path; it's just no longer the headline integration.

See a real run

Every trail produces a rich, replayable report. This one is generated by CI straight from the examples/ trails in this repo — no mockup. Click through for the full interactive report, or browse the Report Gallery for more.

Animated timeline of a recorded Trailblaze run against live Wikipedia

Quickstart

Install Trailblaze first (Getting Started walks through it), then:

```shell
trailblaze device list

# Pin this terminal to a device; subsequent calls inherit it.
# Trailblaze remembers the binding per terminal, so other terminals stay independent.
trailblaze device connect android

# Read the screen: returns a UI tree with refs (e.g. ab42) your agent can target
trailblaze snapshot

# Act on a referenced element. Every action takes a --step description (-s below) so self-heal can recover.
trailblaze tool tap ref=ab42 -s "Tap sign in"
```

Paste those into a Claude Code, Codex, Cursor, or Goose session and your agent is already authoring trails. Every device-acting command (snapshot, tool, blaze, ask, verify, session start/stop, run) also accepts -d <platform> as a per-call override, and the commands that support it (tool, blaze, session start, mcp) accept --target <app>. Both are useful in CI and scripts that prefer determinism over shell state. A longer walkthrough lives in Getting Started.
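The per-terminal binding and the -d override amount to a simple precedence rule: a per-call flag wins, otherwise the device bound to the current terminal is used. The sketch below models that rule in TypeScript as we read it from the description above; bindTerminal, resolveDevice, and CallOptions are hypothetical names, not Trailblaze internals.

```typescript
// Hypothetical model of device resolution; not Trailblaze's implementation.
type Platform = "android" | "ios" | "web";

interface CallOptions {
  deviceOverride?: Platform; // models the per-call -d <platform> flag
}

// Bindings keyed by terminal, so each terminal keeps its own device.
const terminalBindings: Record<string, Platform | undefined> = {};

// Models running `trailblaze device connect <platform>` in one terminal.
function bindTerminal(terminalId: string, platform: Platform): void {
  terminalBindings[terminalId] = platform;
}

// The per-call override wins; otherwise fall back to the terminal's binding.
function resolveDevice(terminalId: string, opts: CallOptions = {}): Platform {
  if (opts.deviceOverride !== undefined) return opts.deviceOverride;
  const bound = terminalBindings[terminalId];
  if (bound === undefined) {
    throw new Error(`terminal ${terminalId} has no bound device; connect one or pass -d`);
  }
  return bound;
}
```

Because the override is consulted per call and never written back, a script can target a different device for one command without disturbing the terminal's pinned device.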

How Trailblaze grows with you

Three rungs. You can stop at any of them.

  1. Drive a device. Point your coding agent at the trailblaze CLI. Natural-language device control across iOS, Android, and web — through built-in primitives (snapshot, tool, toolbox) plus any custom tools your team has shipped.
  2. Save and replay. Any session becomes a .trail.yaml via trailblaze session save. Replay ad-hoc with trailblaze run, commit it to your repo as a CI regression test, or open it in the Trace Viewer — same artifact, three uses, no LLM at replay.
  3. Compose your own agent surface. Give your agent first-class commands like login or addToCart, named waypoints for your screens, and trailmaps shared across teams. Curate exactly what your agent sees: surface your login, hide the low-level taps, pick four of twenty primitives if that’s what your tests need. Custom commands are typed and replayable; every call — yours, the built-ins, or third-party — is a first-class command.

Native fidelity on every platform

Trailblaze does not flatten platforms into a single lowest-common-denominator abstraction. Each driver speaks its host platform’s native vocabulary:

| Platform | Driver | Hierarchy |
|---|---|---|
| Android | UiAutomator / Compose / on-device instrumentation | Button, EditText, RecyclerView, Switch |
| iOS | Native Accessibility / XCUITest | UIButton, UITextField, UITableView |
| Web | Playwright | ARIA roles, full DOM, network, console |

The agent picks elements semantically — “the Sign in button” — from the native hierarchy. Trailblaze computes the platform-specific selector behind the scenes. The natural-language test stays the same; the execution uses each platform’s full power.

This only works because an agent is driving. Exposing twenty platform-specific selector strategies per element to a human is no one's idea of a good testing SDK. Exposing them to an LLM is the point.
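To make the semantic-to-native mapping concrete, here is a minimal TypeScript sketch that turns one semantic target (“the Sign in button”) into a different selector per platform, using class names and ARIA roles from the driver table above. The SemanticTarget type, the toSelector function, and the selector string formats are illustrative assumptions; Trailblaze's real selectors are computed from the captured native hierarchy and are richer than this.

```typescript
// Hypothetical sketch: one semantic target, three platform-specific selectors.
type Platform = "android" | "ios" | "web";

interface SemanticTarget {
  role: "button" | "textField";
  label: string; // the human-readable name, e.g. "Sign in"
}

// Native vocabulary per platform, taken from the driver table.
const nativeClass: Record<Platform, Record<SemanticTarget["role"], string>> = {
  android: { button: "Button", textField: "EditText" },
  ios: { button: "UIButton", textField: "UITextField" },
  web: { button: "button", textField: "textbox" }, // ARIA roles
};

// Illustrative selector strings; not Trailblaze's actual selector format.
function toSelector(platform: Platform, target: SemanticTarget): string {
  const kind = nativeClass[platform][target.role];
  switch (platform) {
    case "android":
      return `class=${kind} text="${target.label}"`;
    case "ios":
      return `type=${kind} label="${target.label}"`;
    case "web":
      return `role=${kind}[name="${target.label}"]`;
  }
}
```

The natural-language target stays identical across the three calls; only the computed selector changes, which is the property the paragraph above describes.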

Trace Viewer

Every run — driven by you, your coding agent, the CLI, or CI — produces a rich session you can inspect: per-step screenshots, recorded tool calls, view-hierarchy snapshots, the full LLM transcript (when an LLM was involved), and video replay when capture is on.

Same viewer surface, three ways:

  • Desktop app: trailblaze app opens the Sessions list across every device and run, with live updates while a session is running, one-click “show me the trail YAML” to copy back into your project, and inline trail editing.
  • Inline on every CI build — share a URL, open in a browser, no Trailblaze install required.
  • On disk under ~/.trailblaze/logs/<sessionId>/ if you ever need to grep raw artifacts.

When you want a different selector than the one Trailblaze auto-picked for a step, the viewer lets you choose from generated alternatives computed against the same captured hierarchy — human judgment, no re-recording. Same viewer for iOS, Android, and web.
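One way to think about generated alternatives: derive candidate selectors from the chosen element's own attributes, then keep only the candidates that resolve to exactly that element in the captured hierarchy. The sketch below assumes a simplified UiNode shape with id, text, and className fields; it is a hypothetical illustration of the idea, not how the Trace Viewer actually generates or ranks selectors.

```typescript
// Hypothetical captured view-hierarchy node; field names are illustrative.
interface UiNode {
  id?: string;
  text?: string;
  className: string;
  children: UiNode[];
}

// A selector is a conjunction of attribute constraints.
type Selector = { id?: string; text?: string; className?: string };

// Flatten the captured tree into a list of nodes.
function allNodes(root: UiNode): UiNode[] {
  const out: UiNode[] = [root];
  for (const child of root.children) out.push(...allNodes(child));
  return out;
}

// True when every constraint present in the selector holds for the node.
function matches(node: UiNode, sel: Selector): boolean {
  return (
    (sel.id === undefined || node.id === sel.id) &&
    (sel.text === undefined || node.text === sel.text) &&
    (sel.className === undefined || node.className === sel.className)
  );
}

// Build candidates from the target's attributes, then keep only those that
// match exactly one node in the captured tree, and that node is the target.
function alternativeSelectors(root: UiNode, target: UiNode): Selector[] {
  const candidates: Selector[] = [];
  if (target.id !== undefined) candidates.push({ id: target.id });
  if (target.text !== undefined) {
    candidates.push({ text: target.text });
    candidates.push({ text: target.text, className: target.className });
  }
  candidates.push({ className: target.className });
  const nodes = allNodes(root);
  return candidates.filter((sel) => {
    const hits = nodes.filter((n) => matches(n, sel));
    return hits.length === 1 && hits[0] === target;
  });
}
```

In this sketch, checking uniqueness against the captured tree means every surviving candidate is known to hit the intended element on that screen, so swapping one in needs no re-recording.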

Self-heal

Recorded trails replay deterministically by default — no LLM in the loop, no flake. When a recorded step genuinely doesn’t match the screen anymore, there are two repair paths:

  • Built-in self-heal handles small drift — text changes, an unexpected popup, a minor reorder. Opt in with --self-heal and Trailblaze’s built-in agent patches the failing step against the live screen and updates the recording on success.
  • Your coding agent handles the larger cases — anything that needs project context, log inspection, or judgment about intent. The trace session is the diagnosis surface; Claude Code, Cursor, or Codex read the trace, compare what the step intended (its natural-language step text) to what the app now does, and propose a fix.

The natural-language step text is what makes this work. It captures what the step was trying to do, so repair is a matter of updating the how against the current app — not re-deriving intent from a broken selector. Default is fail-loud; self-heal is opt-in so real flakes don’t get silently masked.
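The replay policy described above (deterministic by default, fail-loud on a mismatch, repair only when --self-heal is on, recording updated only on success) can be sketched as a small control loop. RecordedStep, Executor, Healer, and replay are hypothetical names; in this model the healer stands in for whichever agent proposes a new target from the step's natural-language text.

```typescript
// Hypothetical recorded step: the natural-language intent plus the recorded target.
interface RecordedStep {
  step: string; // the --step text, e.g. "Tap sign in"
  selector: string; // recorded target; may drift as the app changes
}

// Hypothetical seams; not Trailblaze APIs.
type Executor = (selector: string) => boolean; // true if the action succeeded
type Healer = (step: string) => string | null; // new selector from intent, or null

interface ReplayResult {
  ok: boolean;
  steps: RecordedStep[]; // original recording on failure, patched one on success
  failedStep?: string;
}

function replay(
  steps: RecordedStep[],
  execute: Executor,
  selfHeal: boolean,
  heal: Healer,
): ReplayResult {
  const updated: RecordedStep[] = [];
  for (const s of steps) {
    if (execute(s.selector)) {
      updated.push(s); // deterministic path: no LLM involved
      continue;
    }
    // Default is fail-loud: without opt-in, stop at the first mismatch.
    if (!selfHeal) return { ok: false, steps, failedStep: s.step };
    // Repair the "how" from the recorded "what".
    const patched = heal(s.step);
    if (patched === null || !execute(patched)) {
      return { ok: false, steps, failedStep: s.step };
    }
    updated.push({ step: s.step, selector: patched });
  }
  return { ok: true, steps: updated };
}
```

Keeping the step text next to the selector is what gives the healer something to work from; a bare selector alone would leave nothing to re-derive intent from.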

Core Capabilities

  • CLI any agent can drive: snapshot to read the screen, tool to act, run to replay a .trail.yaml, session save to persist a recording. Every capability is a shell subcommand; your coding agent invokes them the same way you do.
  • --step on every tool call: capture why alongside what. When the UI drifts, recorded trails can self-heal (opt in with --self-heal) against the recorded step text instead of breaking on a brittle selector. (--objective / -o remain accepted as deprecated aliases.)
  • Self-heal — built-in for small drift, your coding agent for larger cases via the trace.
  • Trails — drop a .trail.yaml anywhere in your project. No trails/ directory required. Run by path or shell glob; auto-discovered.
  • Trace Viewer — every run produces a rich session: per-step screenshots, hierarchies, recorded tool calls, LLM transcripts, video replay. CI exposes it inline; the desktop app shows the same UI for local sessions.
  • External config bundles — layer app targets, YAML toolsets, and TypeScript scripted tools on top of the binary without rebuilding Trailblaze.
  • Multi-device CLI sessions — drive Android + iOS + web from the same shell, in parallel, each with its own bound device.

Active Prototypes

Trailblaze is moving fast. These are landing now and are worth knowing about even if they’re not stable yet.

Trailmaps

A trailmap is a reusable bundle of target-aware capabilities — tools, waypoints, navigation routes, and recorded trails — that an app team publishes once and that both human authors and live agents consume. Think of it as the Robot Pattern, generalized and shippable: not just a bag of helper methods inside one test suite, but a published library plus a navigation model plus runnable proof that the model still works.

See: the Trailmaps guide for the manifest schema, per-file scripted tools, and the workspace-vs-classpath precedence rule. Background: npm Distribution for Trailmaps, Target Trailmaps: Local-First Packaging, Trailblaze as the Robot Pattern — and More.

Scripted Tools (TypeScript)

Start here: Your First Trailmap walks one tool from an empty directory to a passing run. Once you’re past the first run, the per-tool Scripted Tools (TypeScript) reference covers the authoring details.

What scripted tools give you: custom tools, written in TypeScript, that drop into a trailmap with no Kotlin code, no Gradle build, and no per-tool YAML descriptor. Declare inputs as a TypeScript interface, write the handler against the typed ctx.tools.<name>(args) composition surface, and the framework derives the schema, the LLM-facing description, and the IDE bindings from the .ts file itself. Tools execute in a QuickJS sandbox on-device by default, or in a host subprocess when they need Node-compatible APIs.
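As a rough illustration of the typed authoring shape, the sketch below declares a tool's inputs as an interface and composes lower-level primitives through ctx.tools. The Tools and ToolContext interfaces and the tapOnText / inputText primitives are hypothetical stand-ins defined inline so the example is self-contained; the actual types, primitive names, and export conventions come from @trailblaze/scripting and the Scripted Tools (TypeScript) reference.

```typescript
// Hypothetical primitives on ctx.tools; real names and signatures may differ.
interface Tools {
  tapOnText(args: { text: string }): Promise<void>;
  inputText(args: { text: string }): Promise<void>;
}

interface ToolContext {
  tools: Tools;
}

// Inputs declared as a TypeScript interface. In the real framework the schema
// and LLM-facing description are derived from the .ts file itself.
interface LoginArgs {
  /** Test account username. */
  username: string;
  /** Test account password. */
  password: string;
}

// A composite "login" tool built only from lower-level primitives.
async function login(ctx: ToolContext, args: LoginArgs): Promise<void> {
  await ctx.tools.tapOnText({ text: "Username" });
  await ctx.tools.inputText({ text: args.username });
  await ctx.tools.tapOnText({ text: "Password" });
  await ctx.tools.inputText({ text: args.password });
  await ctx.tools.tapOnText({ text: "Sign in" });
}
```

Because the handler depends only on the ctx interface, a fake context that records calls is enough to check the composition order in isolation.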

Two worked target trailmaps live in the OSS tree as full-shape references to copy — examples/ios-contacts (iOS, host driver) and examples/wikipedia (web, Playwright Native). Each ships ~9 scripted tools, a target-scoped system prompt, and ~20 trails exercising them.

The older export async function + full-YAML-descriptor authoring shape stays documented as a Legacy Reference; existing legacy tools keep working unmodified — new authoring should use the typed shape. Background: @trailblaze/scripting Authoring Vision.

Waypoints

A waypoint is a named, assertable location in the app — defined structurally (element identity, stable labels), never by content. Waypoints power the agent’s mental map of an app: it can ask “am I on the Inbox?”, land on a waypoint after a step, or use waypoints as checkpoints for trails. The matchWaypoint tool runs against captured session state and returns clean matches plus near-misses (off by one assertion), so authors iterate without staged pipelines.

See: Waypoints and App Navigation Graphs, Waypoint Discovery via matchWaypoint.
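The clean-match and near-miss behavior of matchWaypoint can be modeled as: run every structural assertion of each waypoint against one captured screen, report waypoints where all assertions hold, and report waypoints that fail exactly one assertion as near-misses. This is a hypothetical TypeScript model; ScreenElement, Assertion, matchWaypoints, and hasId are illustrative names, not the matchWaypoint tool's actual inputs or output format.

```typescript
// Hypothetical captured screen: a flat list of elements with structural attributes.
interface ScreenElement {
  id?: string;
  className: string;
  text?: string;
}

// A structural assertion about the screen, with a label for reporting.
interface Assertion {
  description: string;
  holds: (elements: ScreenElement[]) => boolean;
}

interface Waypoint {
  name: string;
  assertions: Assertion[];
}

interface WaypointReport {
  matches: string[]; // waypoints where every assertion held
  nearMisses: { waypoint: string; failed: string }[]; // exactly one assertion failed
}

function matchWaypoints(elements: ScreenElement[], waypoints: Waypoint[]): WaypointReport {
  const report: WaypointReport = { matches: [], nearMisses: [] };
  for (const wp of waypoints) {
    const failed = wp.assertions.filter((a) => !a.holds(elements));
    if (failed.length === 0) {
      report.matches.push(wp.name);
    } else if (failed.length === 1) {
      report.nearMisses.push({ waypoint: wp.name, failed: failed[0].description });
    }
  }
  return report;
}

// Structural assertion on element identity, never on displayed content.
const hasId = (id: string): Assertion => ({
  description: `has element #${id}`,
  holds: (elements) => elements.some((e) => e.id === id),
});
```

Reporting the single failing assertion for a near-miss tells the author exactly which part of the waypoint definition to revisit against the captured state.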

Trail-as-Tool

A trail can itself be exposed as a tool, so an agent (or a higher-level trail) can call it like any other capability. This makes flows composable: a loginAsTestUser trail becomes a one-line setup step inside any other test.

See: runTrail Trail-as-Tool Primitive.
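Trail-as-tool composition can be sketched by wrapping a whole trail in the same shape as a single tool, so a higher-level trail lists it as one step. Device, Tool, Trail, trailAsTool, and tap are hypothetical names for illustration only; the runTrail primitive defines the real interface.

```typescript
// Hypothetical device that just records what was done to it.
interface Device {
  log: string[];
}

// A tool is a named action on the device; a trail is a named list of tools.
type Tool = { name: string; run: (device: Device) => void };

interface Trail {
  name: string;
  steps: Tool[];
}

// Wrap a whole trail so it can be called like any single tool.
function trailAsTool(trail: Trail): Tool {
  return {
    name: trail.name,
    run: (device) => {
      for (const step of trail.steps) step.run(device);
    },
  };
}

// Illustrative primitive tool.
const tap = (label: string): Tool => ({
  name: `tap ${label}`,
  run: (device) => {
    device.log.push(`tap:${label}`);
  },
});

// loginAsTestUser becomes a one-line setup step inside another trail.
const loginAsTestUser: Trail = {
  name: "loginAsTestUser",
  steps: [tap("Username"), tap("Password"), tap("Sign in")],
};
const checkoutTrail: Trail = {
  name: "checkout",
  steps: [trailAsTool(loginAsTestUser), tap("Cart"), tap("Checkout")],
};
```

Since a wrapped trail has the same shape as any other tool, nesting works to any depth: the checkout trail could itself be wrapped and reused.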

Built for an evolving ecosystem

The AI agent ecosystem is moving fast. Whatever it looks like in a year — or five, or ten — your natural-language trails will come with you. Trailblaze captures what you’re testing as portable prose; the how (selectors, recordings, agent harness, framework version) adapts as the landscape changes.

Built-in Agent (Fallback)

Trailblaze ships a built-in agent — trailblaze step, plus the vision primitives trailblaze ask and trailblaze verify — for cases where you don’t have a coding agent in the loop. It’s the same agent that powers --self-heal and the CI fallback path. (trailblaze blaze remains accepted as a deprecated alias of trailblaze step.)

These commands appear under Built-in agent: at the bottom of trailblaze --help, below the recommended deterministic primitives. They require an LLM:

```shell
trailblaze config llm anthropic/claude-sonnet-4-20250514
```

The built-in agent implements features from the Mobile-Agent-v3 research line: exception handling for popups and stuck states, reflection and self-correction, task decomposition, cross-app memory, and enhanced recording for robust replay. See Architecture / Multi-Agent V3 for the details.

For serious authoring work, you want a real coding agent (Claude Code, Cursor, Codex) driving the Trailblaze primitives instead — those bring your codebase, log inspection, and project context to the loop, which the built-in agent can’t.

Where to Go Next

License

Trailblaze is licensed under the Apache License 2.0.