Getting Started
Natural-language device control for your coding agent — across iOS, Android, and web. Every session is a replayable trail you can run as a test.
Trailblaze gives your coding agent — Claude Code, Cursor, Codex, Goose, Aider, anything
that can run a shell command — a single, typed way to drive any device. The agent reads
the screen, picks elements semantically, and acts through Trailblaze primitives plus
whatever custom commands your team has shipped. Every action records its natural-language
objective. The resulting .trail.yaml is both the source of truth — what the flow
does — and the deterministic execution artifact — how it runs. CI replays the trail
with no LLM in the loop.
Trailblaze is not its own coding agent. Your editor’s agent does the planning and the reading; Trailblaze handles the device.
What you’ll do in this guide
- Install Trailblaze and point it at a device.
- Drive the device through primitives (snapshot, tool) from a shell, or from your coding agent shelling out to the same commands.
- Save what you just did as a .trail.yaml.
- Replay it deterministically with trailblaze run.
- Inspect any session in the Trace Viewer.
This mirrors the adoption ladder Trailblaze is designed around — drive → save and replay → compose your own agent surface. You can stop at any rung.
System Requirements
| | macOS | Linux |
|---|---|---|
| Desktop App (GUI) | Supported | Not supported |
| Headless / CLI | Supported | Supported |
- JDK 17+ on all platforms
- Android SDK with adb on your PATH for Android devices and emulators
- Xcode + simctl for iOS simulators
- A Playwright-compatible Chromium (auto-installed) for web
Install
brew install block/tap/trailblaze
Or install from the GitHub release:
curl -fsSL https://raw.githubusercontent.com/block/trailblaze/main/install.sh | bash
Or clone and run from source:
git clone https://github.com/block/trailblaze.git
cd trailblaze
./trailblaze --help # CLI usage
./trailblaze app # Start the daemon (also opens the desktop app on macOS)
Bundled with the Homebrew install:
- bun: the JavaScript runtime Trailblaze uses to type-check and analyze your TypeScript scripted tools. brew install block/tap/trailblaze pulls it in as a dependency, so authoring typed scripted tools works out of the box, with no bun install and no node_modules (the SDK and analyzer ship inside the CLI). Installing from source or install.sh instead? Just put bun on your PATH (bun.sh).
Optional, via Homebrew (brew install esbuild ffmpeg):
- esbuild: needed only when running trailmap-defined scripted tools written in TypeScript.
- ffmpeg: needed only for trail video capture and sprite extraction. Trails still run without it; only the rendered video and sprite-strip outputs are missing.
Connect a Device
List what’s connected, then pin this terminal to one of them:
trailblaze device list
# Pin this terminal to a device + target. Subsequent calls inherit both,
# so you don't have to repeat -d / --target on every command.
trailblaze device connect android --target default
device list shows Android emulators (android/emulator-5554), iOS simulators
(ios/<simulator-id>), and any web targets. For device connect, the short form
android works when only one Android device is connected; with two or more, pass
the fully-qualified android/<id> shown by device list (same for ios/<udid>).
web is always unambiguous.
That’s it — you’re ready to drive. To swap target without disconnecting, use
trailblaze device rebind --target <new>; to release, trailblaze device disconnect.
Device pinning: reference details
You can skip this on first read. It covers what the pin actually does, which commands respect it, and how to override it in CI.
After device connect, every device-acting CLI call (snapshot, tool, step, ask,
verify, session start/stop, run) picks up the pinned device automatically — no
-d flag needed. Workspace and setup commands (config, app, device list) don’t
take -d. mcp takes --device / --target at startup to pre-bind the MCP session
so the agent’s first tool call already has a device.
Multiple terminals stay independent. The pin is per-shell-PID, recorded in
~/.trailblaze/shell-device-pins-<port>.json. Pinning device A in one terminal doesn’t
leak into another, and the pin survives daemon restarts. For CI scripts (each call is a
fresh shell), pass --device <id> on every command — the per-shell pin won’t carry.
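For CI, where the per-shell pin does not carry between calls, one option is a thin wrapper that adds the flag for you. The sketch below is an assumption-laden illustration: tb_ci and CI_DEVICE are hypothetical names, not part of Trailblaze, and it assumes the CLI accepts --device at the end of a device-acting command.

```shell
# Hypothetical helper: tb_ci is NOT a Trailblaze command.
# It appends --device to a device-acting call so a CI job, where every
# call is a fresh shell, never depends on a per-shell pin.
# Assumption: CI_DEVICE holds an id as printed by `trailblaze device list`.
tb_ci() {
  if [ -z "${CI_DEVICE:-}" ]; then
    echo "tb_ci: CI_DEVICE is not set" >&2
    return 2
  fi
  trailblaze "$@" --device "$CI_DEVICE"
}
```

A CI step could then call `CI_DEVICE=android/emulator-5554 tb_ci run flows/login.trail.yaml`. Use it only for device-acting commands; workspace commands such as config or device list do not take a device flag.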
Resolution order, highest priority first:
- Explicit --device <id> flag on the command.
- TRAILBLAZE_DEVICE env var (manual override, mostly for CI).
- This terminal’s file-pin (from trailblaze device connect).
- Autodetect, used when exactly one device is connected.
If your pinned device goes away (emulator killed, USB unplugged), the next call
fails with Device bind failed and self-evicts the pin; the call after that falls
through to autodetect.
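The resolution order can be read as a simple fall-through. The function below is only a model of the documented priority, not Trailblaze's implementation: resolve_device and its three arguments are hypothetical, and only the TRAILBLAZE_DEVICE variable comes from the text above.

```shell
# Illustrative model of the documented priority order (not Trailblaze code).
# $1 = value of an explicit --device flag (empty if absent)
# $2 = this terminal's pinned device (empty if none)
# $3 = newline-separated ids of connected devices
resolve_device() {
  local flag="$1" pin="$2" connected="$3" count
  if [ -n "$flag" ]; then echo "$flag"; return 0; fi          # 1. explicit flag
  if [ -n "${TRAILBLAZE_DEVICE:-}" ]; then                     # 2. env var
    echo "$TRAILBLAZE_DEVICE"; return 0
  fi
  if [ -n "$pin" ]; then echo "$pin"; return 0; fi            # 3. terminal pin
  # 4. autodetect: only when exactly one device is connected
  count=$(printf '%s\n' "$connected" | grep -c . || true)
  if [ "$count" -eq 1 ]; then echo "$connected"; return 0; fi
  return 1                                                     # ambiguous or none
}
```

With no flag, no env var, and no pin, the function succeeds only when the connected list has exactly one entry, which matches the autodetect rule.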
Drive the Device
Two primitives are enough to start: snapshot to read the screen, tool to act.
Wrap them in session start / session stop so the steps you take are tracked as a
single recording you can save as a trail.
# Start a tracked session bound to the pinned device — captures video + logs and
# groups the steps you take so `session save` has something to write out.
trailblaze session start --title "login_flow"
# Read the screen — returns a UI tree with refs (e.g. ab42) the agent can target.
trailblaze snapshot
# Act on a referenced element. Every action takes --step.
trailblaze tool tap ref=ab42 -s "Tap sign in"
trailblaze tool inputText text="test@example.com" -s "Enter email"
Why --step is mandatory
--step (-s) is the natural-language description of what the step is doing —
not how. "Tap sign in" survives a redesign; "tap button at 200,400" does not.
(--objective / -o remain accepted as deprecated aliases — no runtime warning, but
new code should write --step.)
Step text is what makes agent-authored trails durable. When the UI later drifts, self-heal patches the failing step against the new screen by re-deriving the how from the recorded what. No step text, no self-heal.
From your coding agent
If you’d rather have your coding agent (Claude Code, Cursor, Codex, Goose, Aider) do the driving — useful when you’re already mid-task in your editor — point it at the CLI. Drop this into the agent’s session:
You have access to the `trailblaze` CLI. Use it to drive the connected device. First
pin this terminal to a device + target so subsequent calls don't have to repeat the flags:
- `trailblaze device connect <platform> --target <app>` — pin once at start
- `trailblaze session start --title "<short_name>"` — start a tracked session
(captures video/logs, groups the steps for later save)
- `trailblaze snapshot` — see what's on screen (UI tree with refs)
- `trailblaze tool <name> <args> -s "<why>"` — take an action
- `trailblaze toolbox` — list available tools (uses the pinned device automatically)
- When done, `trailblaze session save` to write the recording out as a `.trail.yaml`,
then `trailblaze session stop` to end the session. Optionally
`trailblaze device disconnect` to release the device.
If your agent can run a shell command, it can drive a device. No SDK to install, no protocol to negotiate, no provider keys to wire on the agent’s side.
Save the Session as a Trail
While the session was running, Trailblaze recorded every step. Persist the recording
as a .trail.yaml and end the session:
trailblaze session save # uses the title from `session start`
trailblaze session stop
The resulting file is a list of natural-language steps with recorded tool sequences for
deterministic replay. Drop it anywhere in your project (a trails/ directory is
conventional but not required — see Project Layout for discovery
rules).
A minimal example:
- prompts:
    - verify: the "Sign in" screen is visible
    - step: Sign in as the demo user
    - verify: the home tab is selected
Replay Deterministically
trailblaze run flows/login.trail.yaml -d android
trailblaze run "flows/**/*.trail.yaml" -d android # batch via shell glob
trailblaze run flows/login.trail.yaml -d android --self-heal
By default trailblaze run replays the recorded tool sequence with no LLM in the
loop — fast, deterministic, cheap. This is the path CI takes.
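If you want per-trail pass/fail lines in a CI log rather than one batch invocation, a small loop is enough. This is a sketch under one stated assumption: that trailblaze run exits non-zero when a trail fails. run_trails is a hypothetical helper, not a Trailblaze command.

```shell
# Hypothetical CI helper (not part of Trailblaze).
# Assumption: `trailblaze run` exits non-zero when a trail fails.
# Usage: run_trails <device> <trail>...
run_trails() {
  local device="$1" failed=0 trail
  shift
  for trail in "$@"; do
    if trailblaze run "$trail" -d "$device"; then
      echo "PASS $trail"
    else
      echo "FAIL $trail"
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every trail passed, so the CI step fails otherwise.
  [ "$failed" -eq 0 ]
}
```

For example, `run_trails android flows/*.trail.yaml` replays each file in turn and returns a failing status if any of them failed.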
When does the LLM actually run?
A reasonable question for a cost-conscious team. The short answer:
| Step / mode | LLM called? |
|---|---|
| A step with a recording: block (replay) | No: deterministic, replays the recorded tool calls. |
| A bare step: (no recording:) | Yes: the agent picks the tools and selectors live. |
| verify: (vision assertion) | Yes: an LLM judges the screenshot against the prose claim. |
| --self-heal (any step that drifted) | Yes: only on the failing step, only when --self-heal is set. |
| Authoring mode: trailblaze step "…", trailblaze ask "…", trailblaze verify "…" | Yes: these are the built-in-agent surface. |
So a fully-recorded trail with no verify: steps and no --self-heal is 100% LLM-free
at replay. Add verify: steps when you want vision-grade assertions that survive
selector drift; opt in to --self-heal in CI when you’d rather have an agent try to patch a
drifted step than fail the build. The Ollama path is keyless if you want to stand this
up without paying anything.
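A cost-conscious team may want to know up front which trails will call an LLM at replay. The helper below is a rough, hypothetical pre-check: it only looks for verify: list items in the shape shown in the minimal example, and it does not detect bare steps that lack a recording.

```shell
# Hypothetical pre-check (not a Trailblaze feature).
# Prints each trail file that contains a `- verify:` item and so will
# call an LLM at replay. Limitation: bare steps without a recording are
# NOT detected, so an empty result is not proof of an LLM-free run.
trails_with_verify() {
  local trail
  for trail in "$@"; do
    if grep -Eq '^[[:space:]]*-[[:space:]]*verify:' "$trail"; then
      echo "$trail"
    fi
  done
}
```

Running `trails_with_verify flows/*.trail.yaml` lists the files worth reviewing before you enable them in a keyless CI job.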
--self-heal opts in to small-drift recovery: if a recorded step doesn’t match the
screen anymore, Trailblaze’s built-in agent patches the failing step against the live
screen and updates the recording on success. Self-heal is opt-in by design; the default
is fail-loud so real flakes don’t get silently masked.
Self-heal needs an LLM (it’s the LLM that figures out the new selector against the
drifted screen). Set a provider key once (OPENAI_API_KEY, ANTHROPIC_API_KEY,
GOOGLE_API_KEY, or OPENROUTER_API_KEY) — or run a local model via Ollama with no
key — and you’re set. See LLM Configuration for the full list
of supported providers and how teams configure shared gateways. Plain trailblaze run
without --self-heal doesn’t touch an LLM at all.
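Because --self-heal needs a provider, a CI job can check for a key before opting in. The function below is a hypothetical preflight, not something Trailblaze ships; it checks only the four environment variables named above and cannot see a keyless Ollama setup.

```shell
# Hypothetical preflight (not part of Trailblaze).
# Succeeds if any provider key named in this guide is set and non-empty.
# A keyless local Ollama setup is NOT detected and will report "no key".
has_llm_key() {
  [ -n "${OPENAI_API_KEY:-}" ] || [ -n "${ANTHROPIC_API_KEY:-}" ] ||
    [ -n "${GOOGLE_API_KEY:-}" ] || [ -n "${OPENROUTER_API_KEY:-}" ]
}
```

One way to use it: `has_llm_key && trailblaze run flows/login.trail.yaml -d android --self-heal`, falling back to a plain run when it fails.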
When drift is larger than self-heal can handle — anything that needs project context, log inspection, or judgment about intent — your coding agent does the repair. It reads the trace session (view hierarchies, screenshots, video, platform logs, the trail YAML itself), compares what the step intended to what the app now does, and proposes a fix. You review and commit.
Inspect Any Session in the Trace Viewer
Every run — driven by you, your coding agent, the CLI, or CI — produces a rich session: per-step screenshots, recorded tool calls, view-hierarchy snapshots, the full LLM transcript (when an LLM was involved), and video replay when capture is enabled.
The same Trace Viewer surface is available three ways:
- Desktop app: trailblaze app opens the Sessions list across every device and run, with live updates while a session is running, one-click “show me the trail YAML” to copy back into your project, and inline trail editing and re-running.
- Inline on every CI build: share a URL, open in a browser, no Trailblaze install required.
- On disk under ~/.trailblaze/logs/<sessionId>/ if you ever need to grep raw artifacts.
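For the on-disk route, it helps to find the most recent session directory before grepping. This is a sketch with assumptions spelled out: latest_session_dir is a hypothetical helper, and it assumes each session is a directory directly under the logs root whose modification time reflects recency.

```shell
# Hypothetical helper (not part of Trailblaze).
# Assumptions: sessions are directories directly under the logs root, and
# directory modification time is a good proxy for "most recent session".
# Usage: latest_session_dir [logs-root]
latest_session_dir() {
  local root="${1:-$HOME/.trailblaze/logs}"
  # List directories only, newest first, and keep the first entry.
  ls -td "$root"/*/ 2>/dev/null | head -n 1
}
```

You could then search raw artifacts with something like `grep -rn "sign in" "$(latest_session_dir)"`. The file names and formats inside a session directory are not specified here, so treat any pattern as exploratory.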
When you want a different selector than the one Trailblaze auto-picked for a step, the viewer lets you choose from generated alternatives computed against the same captured hierarchy — human judgment, no re-recording. Same viewer for iOS, Android, and web.
Compose Your Own Agent Surface
The third rung of the adoption ladder. Tools you add to your agent’s surface become first-class commands the next time your agent drives a device.
- Custom commands (Scripted Tools) like login or addToCart, written in TypeScript with type-safe bindings, with LLM-facing descriptions you write for the tool and each parameter. Your agent reads those descriptions to decide when and how to call them. Every call (yours, the built-ins, or third-party) is recordable and replayable. Tools run in an embedded JavaScript sandbox shipped inside Trailblaze (no separate Node install, no node_modules); the rare tool that needs full Node APIs opts into a Bun subprocess with one flag.
- Named waypoints for your screens, so the agent can ask “am I on the Inbox?”, land on a waypoint after a step, or use waypoints as trail checkpoints. Waypoints are an active prototype; see the devlog.
- Trailmaps to bundle tools, waypoints, and recorded trails per app, shared across teams. Trailmaps are the unit of authoring going forward.
Built-in Agent (Fallback)
Trailblaze ships a built-in agent — trailblaze step, plus the vision primitives
trailblaze ask and trailblaze verify — for cases where you don’t have a coding agent
in the loop. It’s the same agent that powers --self-heal and the recommended CI
workflow. (trailblaze blaze remains accepted as a deprecated alias of
trailblaze step.)
These commands appear under Built-in agent: at the bottom of trailblaze --help,
below the recommended deterministic primitives. They require an LLM:
trailblaze config llm anthropic/claude-sonnet-4-20250514
trailblaze config models # list everything available
Set your provider key in your shell:
export ANTHROPIC_API_KEY="sk-ant-…"
Built-in support: OpenAI (OPENAI_API_KEY), Anthropic (ANTHROPIC_API_KEY), Google
(GOOGLE_API_KEY), OpenRouter (OPENROUTER_API_KEY), Ollama (no key required). For
custom endpoints, enterprise gateways, or workspace-level overrides, see
LLM Configuration.
The built-in agent focuses on the natural-language step it’s currently executing, with vision into the current screen and past steps — fine for many flows. For serious authoring work, you want a real coding agent (Claude Code, Cursor, Codex) driving the Trailblaze primitives instead — those bring your codebase, log inspection, and project context to the loop, which the built-in agent can’t.
If you’re shelling out from a coding agent and only using deterministic primitives
(snapshot, tool, run), you don’t need a Trailblaze-side LLM at all. The agent does
the thinking; Trailblaze does the doing.
Next Steps
- CLI Reference — every command and flag
- Tool Authoring — add your own tools
- Configuration — providers, devices, target apps
- Project Layout — where Trailblaze looks for trails and configs
- Architecture — how drivers, the recording pipeline, and the built-in agent loop fit together
- Android On-Device Testing — instrumentation tests on real Android devices
- Host JVM Unit Tests — running trails from JUnit