🥾 Trailblaze
Natural-language device control for your coding agent — across iOS, Android, and web. Every session is a replayable trail you can run as a test.
Trailblaze gives your coding agent a single,
typed, replayable way to drive any device. Built-in primitives plus your own typed
tools, with a natural-language source of truth that travels across platforms. The
artifact your agent leaves behind — a portable .trail.yaml — is both what the flow
does (prose your team reads) and how it runs (recorded steps your CI replays
deterministically with no LLM at replay time).
Trailblaze is not its own coding agent. Claude Code, Cursor, Codex, Goose, Aider — your editor’s agent — does the planning. Trailblaze handles the device. (A built-in agent ships in the box for cases where you don’t have a coding agent in the loop — see Built-in Agent (Fallback).)
See a real run
Every trail produces a rich, replayable report. These runs are generated by CI straight from the Android, iOS, and web trails in this repo — no mockups. Click through for the full interactive reports, or browse the Report Gallery for more.
Quickstart
Install Trailblaze first (Getting Started walks through it), then:
```shell
trailblaze device list
# Pin this terminal to a device; subsequent calls inherit it.
# Trailblaze remembers per-terminal, so other terminals stay independent.
trailblaze device connect android
# Read the screen: returns a UI tree with refs (e.g. ab42) your agent can target
trailblaze snapshot
# Act on a referenced element. Every action takes --step so self-heal can recover.
trailblaze tool tap ref=ab42 -s "Tap sign in"
```
Paste those into a Claude Code, Codex, Cursor, or Goose session and your agent is already
authoring trails. Every device-acting command (snapshot, tool, step, ask, verify,
session start/stop, run) also accepts -d <platform> as a per-call override. Where
supported (tool, step, session start, mcp), commands also accept --target <app>. Both
are useful in CI and scripts that prefer determinism over shell state. A longer
walkthrough lives in Getting Started.
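In a CI script, the per-call override might look like the sketch below. It is an illustration under assumptions, not a documented recipe: it assumes -d can follow the subcommand and takes the same platform name used with device connect, the function name tap_sign_in is made up here, and ab42 is the placeholder ref from the quickstart (real refs come from the snapshot output).

```shell
# Sketch: name the device on every call instead of relying on the
# terminal's pinned device. Flag placement is an assumption.
tap_sign_in() {
  local platform="$1"   # e.g. android
  local ref="$2"        # element ref taken from a prior snapshot
  # Read the screen on the named platform.
  trailblaze snapshot -d "$platform" || return 1
  # Act on the element, recording the intent with -s.
  trailblaze tool tap "ref=$ref" -d "$platform" -s "Tap sign in"
}
```

Calling tap_sign_in android ab42 reads the screen and taps without depending on which device the current terminal happens to be pinned to.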
How Trailblaze grows with you
Three rungs. You can stop at any of them.
- Drive a device. Point your coding agent at the trailblaze CLI. Natural-language device control across iOS, Android, and web, through built-in primitives (snapshot, tool, toolbox) plus any custom tools your team has shipped.
- Save and replay. Any session becomes a .trail.yaml via trailblaze session save. Replay ad-hoc with trailblaze run, commit it to your repo as a CI regression test, or open it in the Trace Viewer: same artifact, three uses, no LLM at replay.
- Compose your own agent surface. Give your agent first-class commands like login or addToCart, named waypoints for your screens, and trailmaps shared across teams. Curate exactly what your agent sees: surface your login, hide the low-level taps, pick four of twenty primitives if that’s what your tests need. Custom commands are typed and replayable; every call (yours, the built-ins, or third-party) is a first-class command.
Native fidelity on every platform
Trailblaze does not flatten platforms into a single lowest-common-denominator abstraction. Each driver speaks its host platform’s native vocabulary:
| Platform | Driver | Hierarchy |
|---|---|---|
| Android | UiAutomator / Compose / on-device instrumentation | Button, EditText, RecyclerView, Switch |
| iOS | Native Accessibility / XCUITest | UIButton, UITextField, UITableView |
| Web | Playwright | ARIA roles, full DOM, network, console |
The agent picks elements semantically — “the Sign in button” — from the native hierarchy. Trailblaze computes the platform-specific selector behind the scenes. The natural-language test stays the same; the execution uses each platform’s full power.
This only works because an agent is driving. Exposing twenty platform-specific selector strategies per element to a human is no one’s idea of a good testing SDK. Exposing them to an LLM is the point.
Trace Viewer
Every run — driven by you, your coding agent, the CLI, or CI — produces a rich session you can inspect: per-step screenshots, recorded tool calls, view-hierarchy snapshots, the full LLM transcript (when an LLM was involved), and video replay when capture is on.
Same viewer surface, three ways:
- Desktop app: trailblaze app opens the Sessions list across every device and run, with live updates while a session is running, one-click “show me the trail YAML” to copy back into your project, and inline trail editing.
- Inline on every CI build: share a URL, open in a browser, no Trailblaze install required.
- On disk under ~/.trailblaze/logs/<sessionId>/ if you ever need to grep raw artifacts.
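If you do end up grepping, a small helper along these lines can list the files in one session that mention a string. It assumes only the directory shape named above; the function name search_session and the optional third argument (an alternate logs root) are illustrative, and nothing here depends on which files Trailblaze actually writes inside a session directory.

```shell
# Sketch: list files in one session's log directory that contain a pattern.
# Usage: search_session <sessionId> <pattern> [logs_root]
search_session() {
  local session_id="$1"
  local pattern="$2"
  # Default to the on-disk location named above.
  local root="${3:-$HOME/.trailblaze/logs}"
  # -r: recurse, -I: skip binary files, -l: print matching file names only.
  grep -rIl -- "$pattern" "$root/$session_id"
}
```

The function returns grep's exit status, so it is non-zero when nothing in the session matches.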
When you want a different selector than the one Trailblaze auto-picked for a step, the viewer lets you choose from generated alternatives computed against the same captured hierarchy — human judgment, no re-recording. Same viewer for iOS, Android, and web.
Self-heal
Recorded trails replay deterministically by default — no LLM in the loop, no flake. When a recorded step genuinely doesn’t match the screen anymore, there are two repair paths:
- Built-in self-heal handles small drift: text changes, an unexpected popup, a minor reorder. Opt in with --self-heal and Trailblaze’s built-in agent patches the failing step against the live screen and updates the recording on success.
- Your coding agent handles the larger cases: anything that needs project context, log inspection, or judgment about intent. The trace session is the diagnosis surface; Claude Code, Cursor, or Codex read the trace, compare what the step intended (its natural-language step text) to what the app now does, and propose a fix.
The natural-language step text is what makes this work. It captures what the step was trying to do, so repair is a matter of updating the how against the current app — not re-deriving intent from a broken selector. Default is fail-loud; self-heal is opt-in so real flakes don’t get silently masked.
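One way to keep the fail-loud default while still allowing an opt-in repair pass is sketched below. It assumes trailblaze run exits non-zero when a replay fails and that --self-heal can be appended after the trail path; the ALLOW_SELF_HEAL variable and the replay_trail function are names invented for this example.

```shell
# Sketch: replay deterministically first; retry with self-heal only when
# the caller has explicitly opted in. Exit codes and flag placement are assumed.
replay_trail() {
  local trail="$1"
  # Default path: plain replay, no LLM involved.
  if trailblaze run "$trail"; then
    return 0
  fi
  # Opt-in path: let the built-in agent try to patch the failing step.
  if [ "${ALLOW_SELF_HEAL:-0}" = "1" ]; then
    echo "replay failed; retrying with --self-heal: $trail" >&2
    trailblaze run "$trail" --self-heal
    return $?
  fi
  echo "replay failed (self-heal not enabled): $trail" >&2
  return 1
}
```

Leaving ALLOW_SELF_HEAL unset keeps a genuine regression visible as a failed build instead of a silently patched recording.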
Core Capabilities
- CLI any agent can drive: snapshot to read the screen, tool to act, run to replay a .trail.yaml, session save to persist a recording. Every capability is a shell subcommand; your coding agent invokes them the same way you do.
- --step on every tool call: capture why alongside what. When the UI drifts, recorded trails self-heal against the recorded step text instead of breaking on a brittle selector. (--objective / -o remain accepted as deprecated aliases.)
- Self-heal: built-in for small drift, your coding agent for larger cases via the trace.
- Trails: drop a .trail.yaml anywhere in your project. No trails/ directory required. Run by path or shell glob; auto-discovered.
- Trace Viewer: every run produces a rich session: per-step screenshots, hierarchies, recorded tool calls, LLM transcripts, video replay. CI exposes it inline; the desktop app shows the same UI for local sessions.
- External config bundles: layer app targets, YAML toolsets, and TypeScript scripted tools on top of the binary without rebuilding Trailblaze.
- Multi-device CLI sessions: drive Android + iOS + web from the same shell, in parallel, each with its own bound device.
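As a sketch of the multi-device idea, the function below replays one trail on several platforms in parallel using the per-call -d override rather than per-terminal pinning. The platform names ios and web are assumptions (only android appears in the quickstart), as is the non-zero exit status on a failed replay; run_on_platforms is an illustrative name.

```shell
# Sketch: replay one trail on several platforms at once.
# Usage: run_on_platforms <trail> <platform>...
run_on_platforms() {
  local trail="$1"; shift
  local pids=() failed=0 platform pid
  for platform in "$@"; do
    # Start each replay in the background, bound to its own platform.
    trailblaze run "$trail" -d "$platform" &
    pids+=("$!")
  done
  # Wait for every replay; remember whether any of them failed.
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
  done
  return "$failed"
}
```

For example, run_on_platforms login.trail.yaml android ios web would return non-zero if any one of the three replays fails.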
Trailmaps
A trailmap is everything one app needs to be testable, in one self-contained
directory: the custom tools your agent uses to drive the app (login, addToCart),
the framework toolsets you want exposed (core_interaction, verification), named
locations in the app, and the recorded trails that exercise it. An app team publishes
the trailmap once; other teams (and your CI, and live agents) consume it. Concretely
it’s a folder with one trailmap.yaml manifest plus a few .ts files for the custom
tools — see Your First Trailmap for the full walkthrough,
and Publishing a Trailmap for the distribution tiers (local
workspace, GitHub repo, npm package).
If you’ve used Page Object (web testing) or Robot Pattern (Android) — reusable
helper layers that wrap raw UI primitives into business-level actions like login and
addToCart, so individual tests read like “log in, add to cart, check out” instead of
“tap 4, type ‘foo’, tap 12” — a trailmap is the same instinct made shippable: a
published library plus a navigation model plus runnable proof, rather than a bag of
helper methods buried inside one test suite. (If you haven’t used either, you can
ignore the comparison — the description above is self-contained.)
See the Trailmaps guide for the manifest schema and the workspace-vs-classpath precedence rule. Background: npm Distribution for Trailmaps, Target Trailmaps: Local-First Packaging.
Scripted Tools (TypeScript)
Start here: Your First Trailmap walks one tool from an empty directory to a passing run. Once you’re past the first run, the per-tool Scripted Tools (TypeScript) reference covers the authoring details.
What scripted tools give you: custom tools, written in TypeScript, that drop into a
trailmap with no Kotlin code, no Gradle build, and no per-tool YAML descriptor. Declare
inputs as a TypeScript interface, write the handler against the typed
ctx.tools.<name>(args) composition surface, and the framework derives the schema, the
LLM-facing description, and the IDE bindings from the .ts file itself. Tools execute
in an embedded JavaScript sandbox shipped inside Trailblaze (no separate Node install,
no node_modules); tools that need full Node APIs opt into a Bun subprocess with one
flag.
Two worked target trailmaps live in the OSS tree as full-shape references to copy —
examples/ios-contacts (iOS, host driver) and
examples/wikipedia (web, Playwright Native). Each ships
~9 scripted tools, a target-scoped system prompt, and ~20 trails exercising them.
The older export async function + full-YAML-descriptor authoring shape stays
documented as a Legacy Reference; existing legacy tools keep
working unmodified — new authoring should use the typed shape. Background:
@trailblaze/scripting Authoring Vision.
Active Prototype: Waypoints
Trailblaze is moving fast. Waypoints are landing now and worth knowing about even
though they’re not yet stable: named, assertable locations in the app, defined
structurally (element identity, stable labels), never by content. Waypoints power the
agent’s mental map of an app: it can ask “am I on the Inbox?”, land on a waypoint after
a step, or use waypoints as trail checkpoints. The matchWaypoint tool runs against
captured session state and returns clean matches plus near-misses (off by one
assertion), so authors iterate without staged pipelines.
See: Waypoints and App Navigation Graphs, Waypoint Discovery via matchWaypoint.
Built for an evolving ecosystem
The AI agent ecosystem is moving fast. Whatever it looks like in a year — or five, or ten — your natural-language trails will come with you. Trailblaze captures what you’re testing as portable prose; the how (selectors, recordings, agent harness, framework version) adapts as the landscape changes.
Built-in Agent (Fallback)
Trailblaze ships a built-in agent — trailblaze step, plus the vision primitives
trailblaze ask and trailblaze verify — for cases where you don’t have a coding agent
in the loop. It’s the same agent that powers --self-heal and the CI fallback path.
(trailblaze blaze remains accepted as a deprecated alias of trailblaze step.)
These commands appear under Built-in agent: at the bottom of trailblaze --help,
below the recommended deterministic primitives. They require an LLM:
```shell
trailblaze config llm anthropic/claude-sonnet-4-20250514
```
The built-in agent implements features from the Mobile-Agent-v3 research line: exception handling for popups and stuck states, reflection and self-correction, task decomposition, cross-app memory, and enhanced recording for robust replay.
For serious authoring work, you want a real coding agent (Claude Code, Cursor, Codex) driving the Trailblaze primitives instead — those bring your codebase, log inspection, and project context to the loop, which the built-in agent can’t.
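A minimal session with the built-in agent might look like this. The argument shapes are assumptions: the sketch supposes step and verify each take their prose as a single quoted argument and that each command exits non-zero on failure. The step and verify texts are made-up examples, and builtin_agent_flow is an illustrative name; trailblaze --help is the authority on the real syntax.

```shell
# Sketch: configure an LLM once, then drive and check the app in prose.
builtin_agent_flow() {
  local model="$1"   # e.g. anthropic/claude-sonnet-4-20250514
  trailblaze config llm "$model" || return 1
  # Let the built-in agent plan and perform the step.
  trailblaze step "Sign in with the test account" || return 1
  # Vision-based check of the resulting screen.
  trailblaze verify "The inbox is visible"
}
```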
Where to Go Next
- New here? Start with Getting Started.
- Wiring a coding agent over the CLI? See the CLI reference and the README.
- Authoring trails? See Project Layout and Configuration.
- Composing your own surface? Start with Your First Trailmap — it’s the workspace-to-passing-run walkthrough and links onward to the per-tool Scripted Tools (TypeScript) reference, the Trailmaps manifest schema, and the Trailblaze Tools catalog of scripted / pure-YAML / Kotlin flavors.
- Customizing the LLM? See LLM Configuration and Built-in Models.
- Going deep? See Architecture and the devlog.
License
Trailblaze is licensed under the Apache License 2.0.