Trail YAML v2 Syntax
Evolving our YAML syntax based on months of real-world trail authoring.
Background
The current .trail.yaml syntax uses generic keywords (prompts, tools, config) that don’t convey Trailblaze’s identity and create unnecessary nesting. Key pain points:
- prompts → recording → tools is deeply nested for what’s conceptually “here’s an objective and the tools used to achieve it.”
- config is generic and doesn’t communicate what this block represents in the Trailblaze mental model.
- step is ambiguous: it doesn’t communicate intent. A step could be “tapOnElement ref=e23” (mechanical) or “Tap the Sign In button” (intent). The keyword doesn’t nudge authors toward the durable, self-healing form.
- verify is a separate keyword from step, but verification is really just a type of objective. The distinction is better expressed by the tools used (e.g., assertVisibleBySelector).
- Bare - tools: blocks let authors skip natural language intent entirely, producing fragile trails that can’t self-heal when the UI changes.
- recording is an odd name for “the tools used to achieve this objective”: it implies a process (recording) rather than what it contains (tools).
- No support for pre-seeded variables. Test data like emails, card numbers, and credentials must be hardcoded in step text or the context string rather than declared as structured, referenceable values.
- No test setup concept. Setup steps (launch, sign in, navigate) are mixed in with test steps. There’s no checkpoint to replay when iterating, and no way to distinguish “couldn’t reach the starting point” from “test failed.”
- File is a list of items, [config, prompts, tools], when there’s exactly one of each. A document with named properties is simpler.
- Maestro was a YAML primitive, now replaced by MaestroTrailblazeTool, but the broader syntax should be updated to match.
What we decided
v2 Structure
The file is a YAML mapping (not a list) with two named sections:
| Section | Purpose | Contains |
|---|---|---|
| trailhead | Trail identity, configuration, and setup | id, title, context, memory, target, platform, metadata, setup |
| trail | The test itself | NL objectives with optional tool recordings |
The trailhead is everything about the starting point: what this trail is, how it’s configured, and the objectives to get there. The trail is the test itself.
Keyword Changes
| v1 Keyword | v2 Keyword | Rationale |
|---|---|---|
| config | trailhead | The trailhead is where the trail begins: identity, configuration, and setup all live here. |
| prompts | trail | The test objectives, the path you walk. Whether blazing (AI) or following a tool recording, it’s the trail. |
| step | objective | Communicates intent (“what to achieve”) rather than mechanics (“what to do next”). Nudges authors toward natural language that enables self-healing trails. |
| recording | tools (under each objective) | Honest about what it contains. Freed up by removing bare tools blocks; tools now only appears subordinate to an objective. |
| - tools: (bare block) | removed | Every entry must have an objective. No escape hatch that lets authors skip intent. This is the same principle as requiring --objective on the CLI. |
| context | context | Unchanged, still the right word. It’s context for the trail, injected into the LLM system prompt at runtime. |
| verify | removed | Use objective for everything. Verification intent is expressed by the tools used. |
| config fields | trailhead fields | id, title, context, memory, target, platform, metadata move into trailhead. |
| (none) | setup | Setup objectives nested under trailhead: a checkpoint for recording iteration and deterministic replay. |
v2 Syntax: Full Example
```yaml
# ── Trailhead: identity, configuration, and setup ──────────────────
trailhead:
  id: testrail/suite_71172/section_838052/case_4837714
  title: Verify user cannot load more than $2,000 onto a Gift Card within 24 hours
  priority: P0

  # Optional: unlocks app-specific custom tools (e.g. launchAppSignedIn, deeplinks)
  # Without a target, trails run with generic tools only.
  target: myapp
  platform: ios

  # Injected into the LLM system prompt for this trail
  context: >
    The gift card number to use is {{giftCardNumber}}.
    Always dismiss any promotional dialogs before proceeding.

  # Pre-seeded runtime variables, available as {{varName}} in objectives and tool params
  memory:
    giftCardNumber: "7783 3224 0646 3436"
    email: testuser+giftcards@example.com
    password: "12345678"

  # Informational: never used at runtime, only for reporting/traceability
  metadata:
    caseId: "4837714"
    sectionId: "838052"
    testRailUrl: https://testrail.example.com/index.php?/cases/view/12345

  # Setup objectives (checkpoint for recording iteration)
  setup:
    - objective: Launch the app with email {{email}} and password {{password}}
      tools:
        - myapp_ios_launchAppSignedIn:
            email: "{{email}}"
            password: "{{password}}"
    - objective: Navigate to Gift Cards
      tools:
        - tap: "Gift cards"

# ── Trail: the test objectives ───────────────────────────────────────
trail:
  - objective: Tap Reload card or check balance
    tools:
      - tap: "Check balance or reload card"
  - objective: Enter the gift card number
    tools:
      - tap: "0000 0000 0000 0000"
      - inputText: "{{giftCardNumber}}"
  - objective: Tap Next
    tools:
      - tap: "Next"
  - objective: Tap Add Value
    tools:
      - tap: "Add value"
  - objective: Select $50 option
    tools:
      - tap: "$50"
  - objective: Wait and tap Review sale
    tools:
      - tap: "Review sale 1 item"
  - objective: Tap Charge $50.00
    tools:
      - tap: "Charge $50.00"
  - objective: Tap on $50 amount
    tools:
      - tap: "$50"
  # Non-recordable objective: AI always handles this, tools are never overwritten
  - objective: Dismiss any payment confirmation dialogs
    recordable: false
  - objective: Verify the payment was declined
    tools:
      - assertVisible: "Amount exceeds gift card balance limit."
      - assertVisible: "Declined"
      - assertVisible: "Cancel Payment"
```
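The {{varName}} placeholders in the example above are filled from the memory block at runtime. The following is a minimal Python sketch of how such a substitution could work, not Trailblaze's actual implementation. The function name `interpolate`, the identifier pattern, and the choice to raise on an undefined variable are all assumptions made for illustration.

```python
import re

# Matches {{varName}}; the allowed identifier characters are an assumption.
_VAR = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(value, memory):
    """Recursively replace {{varName}} in strings, lists, and dicts."""
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in memory:
                # Assumption: an undefined variable is an authoring error.
                raise KeyError(f"undefined memory variable: {name}")
            return str(memory[name])
        return _VAR.sub(lookup, value)
    if isinstance(value, list):
        return [interpolate(item, memory) for item in value]
    if isinstance(value, dict):
        return {key: interpolate(item, memory) for key, item in value.items()}
    return value  # numbers, booleans, None pass through unchanged


memory = {"email": "testuser+giftcards@example.com", "password": "12345678"}
entry = {
    "objective": "Launch the app with email {{email}} and password {{password}}",
    "tools": [
        {"myapp_ios_launchAppSignedIn": {"email": "{{email}}", "password": "{{password}}"}}
    ],
}
resolved = interpolate(entry, memory)
print(resolved["objective"])
```

Because the same function walks both the objective text and the tool parameters, a value declared once in memory stays consistent across the NL intent and its recorded tools.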
blaze.yaml: NL Definition (Cross-Platform)
The blaze file is purely NL objectives, with no tool recordings. Platform-specific tool recordings live in .trail.yaml files.
```yaml
trailhead:
  id: suite/71172/section/838052/case/4837714
  title: Verify gift card load limit
  memory:
    giftCardNumber: "7783 3224 0646 3436"
    email: testuser+giftcards@example.com
  setup:
    - objective: Launch the app and sign in with {{email}}
    - objective: Navigate to Gift Cards

trail:
  - objective: Tap Reload card or check balance
  - objective: Enter the gift card number
  - objective: Tap Next
  - objective: Tap Add Value
  - objective: Select $50 option
  - objective: Wait and tap Review sale
  - objective: Tap Charge $50.00
  - objective: Tap on $50 amount
  - objective: Dismiss any payment confirmation dialogs
    recordable: false
  - objective: >
      Verify the message "Amount exceeds gift card balance limit" appears.
      Verify the message "Declined" appears.
      Verify "Cancel Payment" button is visible.
```
Key Design Principles
1. Two sections, each with one job. trailhead is where the trail begins — identity, configuration, and setup. trail is what you’re testing — the test itself.
2. Trailhead is the starting point. Everything about getting ready lives here: what this trail is (id, title), how it’s configured (context, memory, target), and the objectives to reach the starting state (setup). The trailhead is a complete description of where the trail begins.
3. Setup is a checkpoint. During recording, setup is a save point. Mess up the test? Replay setup instantly, re-record. This is the primary motivation — it serves the recording and iteration workflow.
4. Every entry is an objective. Both setup and trail are lists of objectives. Each objective has an NL description (the durable intent) with optional tools (the derived implementation). There are no bare tool blocks — every tool sequence must be subordinate to an objective. This is the same principle as requiring --objective on the CLI: intent is not optional.
5. objective communicates intent. The word “objective” nudges authors toward writing what to achieve (“Navigate to the Money tab”) rather than how to do it (“Tap the Money button”). This is the foundation of self-healing trails — when the UI changes, the objective is still clear, so the agent can re-solve it. Compare: step: tapOnElement ref=e23 feels valid; objective: tapOnElement ref=e23 feels obviously wrong. The name does free work.
6. tools is honest. The tool list under an objective is called tools because that’s what it is — the tools used to achieve the objective. The old name recording implied a process; tools describes the content. Removing bare - tools: blocks freed up the word to be used in its natural place: subordinate to an objective.
7. memory is active, metadata is passive. Memory variables are interpolated at runtime via {{varName}}. Metadata is never touched by the framework — purely for reporting and traceability.
8. context is the right word. It’s context for the trail — background information, constraints, and instructions injected into the LLM system prompt at runtime. The word communicates what the author is providing (context about this test) rather than an implementation detail (where the text is injected).
9. verify is just an objective. Any objective can perform verification — the intent is expressed by the tools used, not by a separate keyword.
10. recordable: false remains per-objective. This flag means “never overwrite this objective’s tools during re-recording” — useful for objectives that should always be handled by the AI.
11. File is a mapping, not a list. Since there’s exactly one of each section, named properties are simpler than an anonymous list of items.
12. No top-level interleaving. v1 allowed multiple prompts and tools blocks interleaved at the top level. v2 has exactly one trailhead and one trail.
13. objective is source of truth, tools is ephemeral cache. The semantic boundary is clear: objective (NL intent) is the durable, authoritative description. tools is a derived materialization — replaceable, rebuildable, secondary. The hierarchy itself communicates the relationship: tools is subordinate to objective.
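Several of the principles above are structural rules that a parser could check mechanically: the file is a mapping (principle 11), it has one trailhead and one trail (principle 12), and every entry carries an objective (principle 4). The Python sketch below illustrates such a check on already-parsed YAML data. The function name `validate_v2` and the wording of the messages are hypothetical; the real parser may enforce these rules differently.

```python
def validate_v2(doc):
    """Return a list of problems; an empty list means the shape looks like v2."""
    if not isinstance(doc, dict):
        return ["document must be a mapping, not a list"]

    problems = []
    for section in ("trailhead", "trail"):
        if section not in doc:
            problems.append(f"missing section: {section}")

    trailhead = doc.get("trailhead")
    setup = trailhead.get("setup") if isinstance(trailhead, dict) else None
    trail = doc.get("trail")

    # setup and trail share one authoring model: lists of objectives.
    for where, entries in (("trailhead.setup", setup or []), ("trail", trail or [])):
        for index, entry in enumerate(entries):
            if not isinstance(entry, dict) or not entry.get("objective"):
                problems.append(f"{where}[{index}]: every entry needs an objective")
    return problems


bare_tools = {"trailhead": {}, "trail": [{"tools": [{"tap": "Next"}]}]}
print(validate_v2(bare_tools))
```

An entry that has tools but no objective is reported rather than silently accepted, which mirrors the "no bare tool blocks" rule.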
Setup Behavior
Execution policy:
1. If tools exist → replay deterministically (no AI, instant)
2. If no tools → blaze via AI (first run), then save tools
3. If tools fail → re-blaze from NL description, save new tools

Failure semantics:
- Setup failure = “couldn’t reach the starting point” → test is skipped/retried, not failed
- Trail failure = “the test ran and something didn’t work” → test is failed
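The execution policy and failure semantics above can be sketched as a small Python control flow. This is an illustration under stated assumptions, not the framework's code: `replay`, `blaze`, and `run_trail` are hypothetical callables supplied by the caller, and "skipped" stands in for the skipped/retried outcome.

```python
def run_setup_objective(entry, replay, blaze):
    """Apply the three-rule policy to one setup objective.

    replay(tools) -> bool     hypothetical: deterministic replay, True on success
    blaze(objective) -> list  hypothetical: AI exploration, returns the tools used
    """
    tools = entry.get("tools")
    if tools and replay(tools):
        return "replayed"  # rule 1: tools exist and still work
    # Rule 2 (no tools yet) or rule 3 (replay failed): solve from the
    # NL description and save the resulting tools back onto the entry.
    entry["tools"] = blaze(entry["objective"])
    return "re-blazed" if tools else "blazed"


def run_test(doc, replay, blaze, run_trail):
    """Classify the outcome: setup problems are not test failures."""
    try:
        for entry in doc["trailhead"].get("setup", []):
            run_setup_objective(entry, replay, blaze)
    except Exception:
        return "skipped"  # couldn't reach the starting point
    # run_trail(entries) -> bool is a hypothetical stand-in for the test body.
    return "passed" if run_trail(doc["trail"]) else "failed"
```

Keeping the two phases in separate branches is what lets a report distinguish an environment problem from a genuine test failure.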
Reuse via custom tools:
Setup is shared across tests through custom tools. A recorded setup sequence can be promoted to a custom tool (e.g., setupMoneyTab), then referenced by NL in other tests’ setup.
CLI Integration
The objective keyword aligns directly with the blaze CLI’s --objective / -o flag:
```bash
# Each tool invocation declares its objective; calls group into a single trail objective on save
blaze tool tapOnElement ref="Email" -o "Enter login credentials"
blaze tool typeText text="test@example.com" -o "Enter login credentials"
blaze tool tapOnElement ref="Sign In" -o "Enter login credentials"

# Goal mode: the objective is the command itself
blaze "Enter login credentials"
```
When an external agent (Claude Code, Cursor, etc.) uses blaze tool with -o, consecutive calls sharing the same objective string are grouped into a single trail objective with a multi-tool recording. This gives external agents playwright-cli speed while producing repairable trails — they provide the intent they already have, and Trailblaze handles the recording infrastructure.
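The grouping rule described above (consecutive calls with the same objective string collapse into one objective) can be sketched in Python with `itertools.groupby`, which groups only adjacent items with equal keys. The function name `group_by_objective` and the dictionary shape used for each tool call are assumptions for illustration.

```python
from itertools import groupby


def group_by_objective(calls):
    """Collapse consecutive (objective, tool) pairs that share an objective."""
    return [
        {"objective": objective, "tools": [tool for _, tool in run]}
        for objective, run in groupby(calls, key=lambda call: call[0])
    ]


calls = [
    ("Enter login credentials", {"tapOnElement": {"ref": "Email"}}),
    ("Enter login credentials", {"typeText": {"text": "test@example.com"}}),
    ("Enter login credentials", {"tapOnElement": {"ref": "Sign In"}}),
]
trail = group_by_objective(calls)
print(len(trail), len(trail[0]["tools"]))
```

Because only adjacent calls are merged, an objective string that reappears later in the session starts a new trail entry instead of being folded into the earlier one.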
Fast Mode (--fast)
Fast mode is a runtime flag, not a trail property. It’s the agent saying “run as fast as you can while you complete these tool calls.” The trail file is the same either way; --fast only controls execution overhead.
```bash
# Full mode: screenshots, logging, timeline view
blaze -o "Enter login credentials" tool tapOnElement ref="Email"

# Fast mode: text-only, skip screenshots/logging, maximum speed
blaze --fast -o "Enter login credentials" tool tapOnElement ref="Email"

# Fast mode with goal (inner agent uses text-only compact element list)
blaze --fast "Enter login credentials"

# Environment variable for CI pipelines
BLAZE_FAST=1 blaze "Enter login credentials"
```
What --fast skips: screenshots in LLM prompts (no vision tokens — text-only analysis using compact element lists), post-action screen recapture (the next command captures fresh state anyway). What it keeps: all logging (objective start/complete, tool calls), session recording, trail assembly — the full trail is still produced with all tool calls recorded. An agent can still explicitly request a screenshot via ask(includeScreenshot=true) when it needs to disambiguate visually. The trail produced is identical and repairable — it just skips the per-step screenshot overhead during execution.
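The skip/keep split described above can be summarized as a small Python sketch. It is only an illustration: the function names are hypothetical, and treating any BLAZE_FAST value other than "1" as off is an assumption, since the text shows only `BLAZE_FAST=1`.

```python
import os


def is_fast(cli_fast, environ=None):
    """Fast mode is on if --fast was passed or BLAZE_FAST=1 is set."""
    environ = os.environ if environ is None else environ
    return bool(cli_fast) or environ.get("BLAZE_FAST") == "1"


def execution_overhead(fast):
    """What runs in each mode, per the description of --fast."""
    return {
        "screenshots_in_prompts": not fast,  # skipped in fast mode
        "post_action_recapture": not fast,   # skipped in fast mode
        "logging": True,                     # always kept
        "session_recording": True,           # always kept
        "trail_assembly": True,              # always kept
    }


print(execution_overhead(is_fast(False, {"BLAZE_FAST": "1"})))
```

Nothing in the returned mapping touches the trail itself, which matches the point that the trail produced is identical in both modes.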
Naming Glossary
| Term | What it is |
|---|---|
| trailhead | Trail identity, configuration, and setup: where the trail begins |
| setup | Setup objectives within the trailhead (checkpoint for recording iteration) |
| trail | Test objectives: the path you walk (the test) |
| objective | Individual intent within setup or trail: “what to achieve”, not “how to do it” |
| tools | Tool list under an objective: the derived implementation (ephemeral, replaceable) |
| blazing | AI exploration when no tools exist for an objective (verb, not keyword) |
| blaze.yaml | NL definition file: the plan before you go |
| *.trail.yaml | Platform recording file: the trail left behind |
| memory | Pre-seeded variables for template interpolation |
| context | Trail context, injected into the LLM system prompt at runtime |
Migration Strategy
- Build a new v2 parser alongside the existing one in trailblaze-models/commonMain.
- Try-catch fallback: attempt v2 parsing first, fall back to v1 on failure.
- Bulk migrate all .trail.yaml and blaze.yaml files once v2 is stable.
- Delete v1 parser after migration is complete.
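The try-catch fallback in the list above can be sketched as follows. The real parser lives in trailblaze-models/commonMain; this Python version only illustrates the control flow on already-parsed YAML data, and every name in it (`TrailFormatError`, `parse_v2`, `parse_v1`, `parse_trail`) is hypothetical.

```python
class TrailFormatError(ValueError):
    """Raised when data does not match the format a parser expects."""


def parse_v2(data):
    # v2 is a mapping with the two named sections.
    if not isinstance(data, dict) or "trailhead" not in data or "trail" not in data:
        raise TrailFormatError("not a v2 document")
    return {"version": 2, "trailhead": data["trailhead"], "trail": data["trail"]}


def parse_v1(data):
    # v1 is a list of items such as config, prompts, and tools.
    if not isinstance(data, list):
        raise TrailFormatError("not a v1 document")
    return {"version": 1, "items": data}


def parse_trail(data):
    """Attempt v2 first; fall back to v1 only when v2 rejects the data."""
    try:
        return parse_v2(data)
    except TrailFormatError:
        return parse_v1(data)


print(parse_trail([{"config": {"id": "case_1"}}])["version"])
```

Catching only the format error, rather than every exception, keeps genuine bugs in the v2 parser from being masked by the fallback during the coexistence period.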
v1 → v2 Mapping
| v1 | v2 |
|---|---|
| - config: (list item) | trailhead: (mapping key) |
| - prompts: (list item, multiple allowed) | trail: (mapping key, exactly one) |
| - tools: (list item, standalone top-level) | removed; wrap in - objective: with NL intent |
| step: + recording: tools: | objective: + tools: |
| verify: | objective: (with assertion tools) |
| context: (in config) | context: (in trailhead) |
| multiple interleaved blocks | single trailhead + trail |
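The mapping table can be read as a mechanical conversion. The Python sketch below applies it to parsed data. The v1 shape used here (a list of single-key items, with each prompt holding step or verify plus an optional recording.tools) is inferred from the table and is an assumption, as is the function name `migrate_v1`. Bare tools blocks are rejected rather than converted, because the NL intent they lack has to be written by a person.

```python
def migrate_v1(items):
    """Convert an assumed v1 item list into a v2 mapping."""
    trailhead, trail = {}, []
    for item in items:
        if "config" in item:
            # config fields move into trailhead unchanged.
            trailhead.update(item["config"])
        elif "prompts" in item:
            # Multiple interleaved prompts blocks merge into one trail.
            for prompt in item["prompts"]:
                entry = {"objective": prompt.get("step") or prompt.get("verify")}
                tools = (prompt.get("recording") or {}).get("tools")
                if tools:
                    entry["tools"] = tools
                trail.append(entry)
        elif "tools" in item:
            raise ValueError("bare tools block: wrap it in an objective with NL intent")
    return {"trailhead": trailhead, "trail": trail}


v1 = [
    {"config": {"id": "case_1", "context": "Dismiss dialogs"}},
    {"prompts": [
        {"step": "Tap Next", "recording": {"tools": [{"tap": "Next"}]}},
        {"verify": "Payment was declined"},
    ]},
]
v2 = migrate_v1(v1)
print(v2["trail"])
```

A real bulk migration would also need to decide which leading objectives belong under setup, which this sketch does not attempt.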
What changed
Positive:
- Two clearly distinct sections — trailhead (starting point) and trail (the test)
- Setup as a checkpoint within trailhead enables recording iteration and deterministic setup replay
- trailhead semantically groups identity + config + setup as “everything about the starting point”
- objective communicates intent — nudges authors toward “what to achieve” over “how to do it”, enabling self-healing trails
- objective aligns with CLI --objective / -o flag, giving external agents a natural way to declare intent while using direct tools
- tools under each objective is honest about what it contains — no more recording indirection
- No bare tools blocks — every entry has intent, same principle as requiring --objective on the CLI
- File is a mapping — simpler than a list when there’s one of each section
- Clear semantic boundary: objective is source of truth, tools is ephemeral cache — hierarchy communicates the relationship
- setup and trail share the same authoring model (lists of objectives)
- Structured variable support via memory
- context is unchanged from v1 — already the right word, no migration needed for this field
- Removing verify simplifies the model — one fewer concept to learn
- Foundation for future mode-based execution configuration
- Setup failure vs trail failure distinction improves test reporting
Negative:
- All existing .trail.yaml and blaze.yaml files must be migrated (mitigated by try-catch fallback period)
- External tools/scripts that parse trail files need updating
- Two parsers coexist temporarily during migration
- Authors who previously used bare tools blocks must now provide an objective — a small tax that produces repairable trails