🥾 Trailblaze

Natural-language device control for your coding agent — across iOS, Android, and web. Every session is a replayable trail you can run as a test.

Trailblaze gives your coding agent a single, typed, replayable way to drive any device. Built-in primitives plus your own typed tools, with a natural-language source of truth that travels across platforms. The artifact your agent leaves behind — a portable .trail.yaml — is both what the flow does (prose your team reads) and how it runs (recorded steps your CI replays deterministically with no LLM at replay time).

Trailblaze is not its own coding agent. Claude Code, Cursor, Codex, Goose, Aider — your editor’s agent — does the planning. Trailblaze handles the device. (A built-in agent ships in the box for cases where you don’t have a coding agent in the loop — see Built-in Agent (Fallback).)

See a real run

Every trail produces a rich, replayable report. These runs are generated by CI straight from the Android, iOS, and web trails in this repo — no mockups. Click through for the full interactive reports, or browse the Report Gallery for more.

Quickstart

Install Trailblaze first (Getting Started walks through it), then:

trailblaze device list

# Pin this terminal to a device — subsequent calls inherit it.
# Trailblaze remembers per-terminal, so other terminals stay independent.
trailblaze device connect android

# Read the screen — returns a UI tree with refs (e.g. ab42) your agent can target
trailblaze snapshot

# Act on a referenced element. Every action takes --step (short form -s, used below) so self-heal can recover.
trailblaze tool tap ref=ab42 -s "Tap sign in"

Paste those into a Claude Code, Codex, Cursor, or Goose session and your agent is already authoring trails. Every device-acting command (snapshot, tool, step, ask, verify, session start/stop, run) also accepts -d <platform> as a per-call override. The commands that support it (tool, step, session start, mcp) also accept --target <app>. Both flags are useful in CI and scripts that prefer determinism over shell state. A longer walkthrough lives in Getting Started.

How Trailblaze grows with you

Three rungs. You can stop at any of them.

  1. Drive a device. Point your coding agent at the trailblaze CLI. Natural-language device control across iOS, Android, and web — through built-in primitives (snapshot, tool, toolbox) plus any custom tools your team has shipped.
  2. Save and replay. Any session becomes a .trail.yaml via trailblaze session save. Replay ad-hoc with trailblaze run, commit it to your repo as a CI regression test, or open it in the Trace Viewer — same artifact, three uses, no LLM at replay.
  3. Compose your own agent surface. Give your agent first-class commands like login or addToCart, named waypoints for your screens, and trailmaps shared across teams. Curate exactly what your agent sees: surface your login, hide the low-level taps, pick four of twenty primitives if that’s what your tests need. Custom commands are typed and replayable; every call — yours, the built-ins, or third-party — is a first-class command.

Native fidelity on every platform

Trailblaze does not flatten platforms into a single lowest-common-denominator abstraction. Each driver speaks its host platform’s native vocabulary:

| Platform | Driver | Hierarchy |
|---|---|---|
| Android | UiAutomator / Compose / on-device instrumentation | Button, EditText, RecyclerView, Switch |
| iOS | Native Accessibility / XCUITest | UIButton, UITextField, UITableView |
| Web | Playwright | ARIA roles, full DOM, network, console |

The agent picks elements semantically — “the Sign in button” — from the native hierarchy. Trailblaze computes the platform-specific selector behind the scenes. The natural-language test stays the same; the execution uses each platform’s full power.

This only works because an agent is driving. Exposing twenty platform-specific selector strategies per element to a human is no one’s idea of a good testing SDK. Exposing it to an LLM is the point.
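
As a rough illustration of the idea above, the sketch below maps one semantic target ("the Sign in button") onto each platform's native vocabulary. It is a simplified model, not Trailblaze's resolver: the types `NativeNode` and `SemanticTarget`, the `nativeKinds` table, and the `resolve` function are all hypothetical names invented for this example. Only the native class names and ARIA roles are taken from the table above or from the platforms themselves.

```typescript
// Hypothetical model: none of these types or functions are Trailblaze APIs.
type Platform = "android" | "ios" | "web";

interface NativeNode {
  ref: string;   // snapshot ref, e.g. "ab42"
  kind: string;  // native class or ARIA role, e.g. "Button", "UIButton", "button"
  label: string; // accessible text
}

// What an agent means by "the Sign in button".
interface SemanticTarget {
  role: "button" | "textField";
  label: string;
}

// Each platform's native vocabulary for the same semantic role.
const nativeKinds: Record<Platform, Record<SemanticTarget["role"], string[]>> = {
  android: { button: ["Button"], textField: ["EditText"] },
  ios: { button: ["UIButton"], textField: ["UITextField"] },
  web: { button: ["button"], textField: ["textbox"] },
};

// Find the first node whose native kind and label match the semantic target.
function resolve(
  platform: Platform,
  target: SemanticTarget,
  tree: NativeNode[],
): NativeNode | undefined {
  const kinds = nativeKinds[platform][target.role];
  const wanted = target.label.toLowerCase();
  return tree.find(
    (n) => kinds.some((k) => k === n.kind) && n.label.toLowerCase() === wanted,
  );
}

const signIn: SemanticTarget = { role: "button", label: "Sign in" };
const androidTree: NativeNode[] = [
  { ref: "aa01", kind: "EditText", label: "Email" },
  { ref: "ab42", kind: "Button", label: "Sign in" },
];

console.log(resolve("android", signIn, androidTree)?.ref); // prints "ab42"
```

The same `signIn` target would be resolved against a `UIButton` on iOS or a `button` role on the web, which is the sense in which the natural-language step stays fixed while the selector varies per platform.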

Trace Viewer

Every run — driven by you, your coding agent, the CLI, or CI — produces a rich session you can inspect: per-step screenshots, recorded tool calls, view-hierarchy snapshots, the full LLM transcript (when an LLM was involved), and video replay when capture is on.

Same viewer surface, three ways:

  • Desktop app: trailblaze app opens the Sessions list across every device and run, with live updates while a session is running, one-click “show me the trail YAML” to copy back into your project, and inline trail editing.
  • Inline on every CI build: share a URL, open it in a browser, no Trailblaze install required.
  • On disk under ~/.trailblaze/logs/<sessionId>/ if you ever need to grep raw artifacts.

When you want a different selector than the one Trailblaze auto-picked for a step, the viewer lets you choose from generated alternatives computed against the same captured hierarchy — human judgment, no re-recording. Same viewer for iOS, Android, and web.

Self-heal

Recorded trails replay deterministically by default — no LLM in the loop, no flake. When a recorded step genuinely doesn’t match the screen anymore, there are two repair paths:

  • Built-in self-heal handles small drift — text changes, an unexpected popup, a minor reorder. Opt in with --self-heal and Trailblaze’s built-in agent patches the failing step against the live screen and updates the recording on success.
  • Your coding agent handles the larger cases — anything that needs project context, log inspection, or judgment about intent. The trace session is the diagnosis surface; Claude Code, Cursor, or Codex read the trace, compare what the step intended (its natural-language step text) to what the app now does, and propose a fix.

The natural-language step text is what makes this work. It captures what the step was trying to do, so repair is a matter of updating the how against the current app — not re-deriving intent from a broken selector. Default is fail-loud; self-heal is opt-in so real flakes don’t get silently masked.
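
The replay policy described above (deterministic by default, fail-loud on mismatch, repair only when opted in) can be sketched as a small loop. This is a toy model under stated assumptions, not Trailblaze's implementation: `RecordedStep`, `LiveScreen`, `Healer`, and `replay` are hypothetical names, the real .trail.yaml schema is not reproduced here, and the healer is a stub standing in for the built-in agent.

```typescript
// Hypothetical model of a recorded step; not the real .trail.yaml schema.
interface RecordedStep {
  step: string;     // natural-language intent, e.g. "Tap sign in"
  selector: string; // the recorded "how"
}

// Selectors currently present on the live screen (a deliberate simplification).
type LiveScreen = Set<string>;

// A healer proposes a replacement selector from the step text, or gives up.
type Healer = (stepText: string, live: LiveScreen) => string | undefined;

interface ReplayResult {
  ok: boolean;
  healed: string[];  // step texts whose selector was rewritten
  failedAt?: string; // step text of the first unrecoverable step
}

function replay(trail: RecordedStep[], live: LiveScreen, healer?: Healer): ReplayResult {
  const healed: string[] = [];
  for (const s of trail) {
    if (live.has(s.selector)) continue;    // deterministic path, no healer consulted
    const fix = healer?.(s.step, live);    // undefined unless self-heal was opted in
    if (fix === undefined || !live.has(fix)) {
      return { ok: false, healed, failedAt: s.step }; // default: fail loud
    }
    s.selector = fix;                      // update the recording on success
    healed.push(s.step);
  }
  return { ok: true, healed };
}

// The button text drifted from "Sign in" to "Log in".
const trail: RecordedStep[] = [{ step: "Tap sign in", selector: "text=Sign in" }];
const liveScreen: LiveScreen = new Set(["text=Log in"]);

// Stub healer: pretends an agent mapped the intent onto the new label.
const stubHealer: Healer = (text, live) =>
  text === "Tap sign in" && live.has("text=Log in") ? "text=Log in" : undefined;

const strictRun = replay(trail.map((s) => ({ ...s })), liveScreen);
const healedTrail = trail.map((s) => ({ ...s }));
const healedRun = replay(healedTrail, liveScreen, stubHealer);

console.log(strictRun.ok, strictRun.failedAt); // prints: false Tap sign in
console.log(healedRun.ok, healedRun.healed);   // prints: true [ 'Tap sign in' ]
```

Note that the healer receives the step text rather than the broken selector, which mirrors the point above: the intent is preserved, so only the selector needs to be recomputed.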

Core Capabilities

  • CLI any agent can drive: snapshot to read the screen, tool to act, run to replay a .trail.yaml, session save to persist a recording. Every capability is a shell subcommand; your coding agent invokes them the same way you do.
  • --step on every tool call — capture why alongside what. When the UI drifts, recorded trails self-heal against the recorded step text instead of breaking on a brittle selector. (--objective / -o remain accepted as deprecated aliases.)
  • Self-heal — built-in for small drift, your coding agent for larger cases via the trace.
  • Trails — drop a .trail.yaml anywhere in your project. No trails/ directory required. Run by path or shell glob; auto-discovered.
  • Trace Viewer — every run produces a rich session: per-step screenshots, hierarchies, recorded tool calls, LLM transcripts, video replay. CI exposes it inline; the desktop app shows the same UI for local sessions.
  • External config bundles — layer app targets, YAML toolsets, and TypeScript scripted tools on top of the binary without rebuilding Trailblaze.
  • Multi-device CLI sessions — drive Android + iOS + web from the same shell, in parallel, each with its own bound device.

Trailmaps

A trailmap is everything one app needs to be testable, in one self-contained directory: the custom tools your agent uses to drive the app (login, addToCart), the framework toolsets you want exposed (core_interaction, verification), named locations in the app, and the recorded trails that exercise it. An app team publishes the trailmap once; other teams (and your CI, and live agents) consume it. Concretely it’s a folder with one trailmap.yaml manifest plus a few .ts files for the custom tools — see Your First Trailmap for the full walkthrough, and Publishing a Trailmap for the distribution tiers (local workspace, GitHub repo, npm package).

If you’ve used Page Object (web testing) or Robot Pattern (Android), a trailmap is the same instinct made shippable. Those patterns are reusable helper layers that wrap raw UI primitives into business-level actions like login and addToCart, so individual tests read like “log in, add to cart, check out” instead of “tap 4, type ‘foo’, tap 12”. A trailmap packages that layer as a published library plus a navigation model plus runnable proof, rather than a bag of helper methods buried inside one test suite. (If you haven’t used either, you can ignore the comparison; the description above is self-contained.)

See the Trailmaps guide for the manifest schema and the workspace-vs-classpath precedence rule. Background: npm Distribution for Trailmaps, Target Trailmaps: Local-First Packaging.

Scripted Tools (TypeScript)

Start here: Your First Trailmap walks one tool from an empty directory to a passing run. Once you’re past the first run, the per-tool Scripted Tools (TypeScript) reference covers the authoring details.

What scripted tools give you: custom tools, written in TypeScript, that drop into a trailmap with no Kotlin code, no Gradle build, and no per-tool YAML descriptor. Declare inputs as a TypeScript interface, write the handler against the typed ctx.tools.<name>(args) composition surface, and the framework derives the schema, the LLM-facing description, and the IDE bindings from the .ts file itself. Tools execute in an embedded JavaScript sandbox shipped inside Trailblaze (no separate Node install, no node_modules); tools that need full Node APIs opt into a Bun subprocess with one flag.
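
To make the authoring shape concrete, here is a minimal sketch of a login tool: inputs declared as a TypeScript interface, and a handler that composes lower-level tools through `ctx.tools`. Everything beyond that general shape is an assumption. The `ToolContext` type, the `tap` and `inputText` tool names and their argument shapes, and the `fakeContext` helper are hypothetical stand-ins so the example runs on its own, and the way a tool is actually exported or registered is not shown. The Scripted Tools (TypeScript) reference is the authority on the real surface.

```typescript
// Hypothetical stand-in for the typed composition surface; the real ctx type
// and tool names come from Trailblaze and may differ.
interface ToolContext {
  tools: {
    tap(args: { text: string }): Promise<void>;
    inputText(args: { text: string }): Promise<void>;
  };
}

/** Inputs for the login tool. */
interface LoginArgs {
  /** Account email address. */
  email: string;
  /** Account password. */
  password: string;
}

// A business-level action composed from lower-level tools.
async function login(args: LoginArgs, ctx: ToolContext): Promise<void> {
  await ctx.tools.tap({ text: "Email" });
  await ctx.tools.inputText({ text: args.email });
  await ctx.tools.tap({ text: "Password" });
  await ctx.tools.inputText({ text: args.password });
  await ctx.tools.tap({ text: "Sign in" });
}

// Recording fake so the sketch runs without a device.
function fakeContext(log: string[]): ToolContext {
  return {
    tools: {
      tap: async ({ text }) => {
        log.push(`tap ${text}`);
      },
      inputText: async ({ text }) => {
        log.push(`inputText ${text}`);
      },
    },
  };
}

const calls: string[] = [];
const done = login({ email: "a@example.com", password: "hunter2" }, fakeContext(calls));
done.then(() => console.log(calls.join(" | ")));
```

Because the handler only talks to `ctx.tools`, swapping the fake for a real context changes nothing in the tool body, which is what makes a tool like this easy to exercise before a device is attached.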

Two worked target trailmaps live in the OSS tree as full-shape references to copy — examples/ios-contacts (iOS, host driver) and examples/wikipedia (web, Playwright Native). Each ships ~9 scripted tools, a target-scoped system prompt, and ~20 trails exercising them.

The older export async function + full-YAML-descriptor authoring shape stays documented as a Legacy Reference; existing legacy tools keep working unmodified — new authoring should use the typed shape. Background: @trailblaze/scripting Authoring Vision.

Active Prototype: Waypoints

Trailblaze is moving fast. Waypoints are landing now and worth knowing about even though they’re not yet stable: named, assertable locations in the app, defined structurally (element identity, stable labels), never by content. Waypoints power the agent’s mental map of an app: it can ask “am I on the Inbox?”, land on a waypoint after a step, or use waypoints as trail checkpoints. The matchWaypoint tool runs against captured session state and returns clean matches plus near-misses (off by one assertion), so authors iterate without staged pipelines.
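
The match and near-miss behaviour described above can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the matchWaypoint tool: the `UiNode`, `Assertion`, `Waypoint`, and `MatchReport` shapes and the `matchWaypoints` function are hypothetical, and the real waypoint schema may look quite different. It treats a waypoint as a list of structural assertions, reports a clean match when all of them hold, and reports a near-miss when exactly one fails.

```typescript
// Hypothetical shapes; the real waypoint schema and tool output may differ.
interface UiNode {
  id: string;    // element identity
  label: string; // stable label
}

// Structural assertion: a node with this id (and, if given, this label) exists.
interface Assertion {
  id: string;
  label?: string;
}

interface Waypoint {
  name: string;
  assertions: Assertion[];
}

interface MatchReport {
  matches: string[];
  nearMisses: { name: string; failed: Assertion }[];
}

function holds(a: Assertion, nodes: UiNode[]): boolean {
  return nodes.some((n) => n.id === a.id && (a.label === undefined || n.label === a.label));
}

function matchWaypoints(waypoints: Waypoint[], nodes: UiNode[]): MatchReport {
  const report: MatchReport = { matches: [], nearMisses: [] };
  for (const w of waypoints) {
    const [first, ...rest] = w.assertions.filter((a) => !holds(a, nodes));
    if (first === undefined) {
      report.matches.push(w.name);                          // every assertion held
    } else if (rest.length === 0) {
      report.nearMisses.push({ name: w.name, failed: first }); // off by one assertion
    }
    // Two or more failures: not reported at all.
  }
  return report;
}

const captured: UiNode[] = [
  { id: "inbox_list", label: "Messages" },
  { id: "compose_button", label: "New message" },
];

const waypoints: Waypoint[] = [
  { name: "MailShell", assertions: [{ id: "inbox_list" }] },
  { name: "Inbox", assertions: [{ id: "inbox_list" }, { id: "compose_button", label: "Compose" }] },
  { name: "Settings", assertions: [{ id: "settings_list" }, { id: "account_row" }] },
];

const report = matchWaypoints(waypoints, captured);
console.log(report.matches);                       // prints: [ 'MailShell' ]
console.log(report.nearMisses.map((m) => m.name)); // prints: [ 'Inbox' ]
```

The near-miss entry carries the single failing assertion, which is the piece an author would adjust when iterating on a waypoint definition against captured state.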

See: Waypoints and App Navigation Graphs, Waypoint Discovery via matchWaypoint.

Built for an evolving ecosystem

The AI agent ecosystem is moving fast. Whatever it looks like in a year — or five, or ten — your natural-language trails will come with you. Trailblaze captures what you’re testing as portable prose; the how (selectors, recordings, agent harness, framework version) adapts as the landscape changes.

Built-in Agent (Fallback)

Trailblaze ships a built-in agent — trailblaze step, plus the vision primitives trailblaze ask and trailblaze verify — for cases where you don’t have a coding agent in the loop. It’s the same agent that powers --self-heal and the CI fallback path. (trailblaze blaze remains accepted as a deprecated alias of trailblaze step.)

These commands appear under Built-in agent: at the bottom of trailblaze --help, below the recommended deterministic primitives. They require an LLM:

trailblaze config llm anthropic/claude-sonnet-4-20250514

The built-in agent implements features from the Mobile-Agent-v3 research line: exception handling for popups and stuck states, reflection and self-correction, task decomposition, cross-app memory, and enhanced recording for robust replay.

For serious authoring work, you want a real coding agent (Claude Code, Cursor, Codex) driving the Trailblaze primitives instead — those bring your codebase, log inspection, and project context to the loop, which the built-in agent can’t.

License

Trailblaze is licensed under the Apache License 2.0.