Trail YAML v2 Syntax¶
Evolving our YAML syntax based on months of real-world trail authoring.
Background¶
The current .trail.yaml syntax uses generic keywords (prompts, tools, config) that don’t convey Trailblaze’s identity and create unnecessary nesting. Key pain points:
prompts→recording→toolsis deeply nested for what’s conceptually “here’s a step and what was recorded.”configis generic and doesn’t communicate what this block represents in the Trailblaze mental model.verifyis a separate keyword fromstep, but verification is really just a type of step — the distinction is better expressed by the tools used (e.g.,assertVisibleBySelector).contextdoesn’t communicate that the value is injected into the LLM system prompt.- No support for pre-seeded variables — test data like emails, card numbers, and credentials must be hardcoded in step text or the
contextstring rather than declared as structured, referenceable values. - No test setup concept — setup steps (launch, sign in, navigate) are mixed in with test steps. There’s no checkpoint to replay when iterating, and no way to distinguish “couldn’t reach the starting point” from “test failed.”
- File is a list of items —
[config, prompts, tools]when there’s exactly one of each. A document with named properties is simpler. - Maestro was a YAML primitive — now replaced by
MaestroTrailblazeTool(see PR #1944), but the broader syntax should be updated to match.
What we decided¶
v2 Structure¶
The file is a YAML mapping (not a list) with two named sections:
| Section | Purpose | Contains |
|---|---|---|
trailhead |
Trail identity, configuration, and setup | id, title, systemPrompt, memory, target, platform, metadata, setup |
trail |
The test itself | NL steps with optional recordings |
The trailhead is everything about the starting point: what this trail is, how it’s configured, and the steps to get there. The trail is the test itself.
Keyword Changes¶
| v1 Keyword | v2 Keyword | Rationale |
|---|---|---|
config |
trailhead |
The trailhead is where the trail begins — identity, configuration, and setup all live here. |
prompts |
trail |
The test steps — the path you walk. Whether blazing (AI) or following a recording, it’s the trail. |
recording.tools |
recording (under each step) |
Flattened from 2 levels to 1. The tools wrapper is removed. |
tools (top-level) |
tools (in step lists) |
No longer a standalone top-level block. Now a directly authored deterministic primitive alongside step entries in setup and trail. |
context |
systemPrompt |
No ambiguity — this text is injected into the LLM system prompt. |
verify |
removed | Use step for everything. Verification intent is expressed by the tools used. |
config fields |
trailhead fields |
id, title, systemPrompt, memory, target, platform, metadata move into trailhead. |
| (none) | setup |
Setup steps nested under trailhead — a checkpoint for recording iteration and deterministic replay. |
v2 Syntax — Full Example¶
# ── Trailhead: identity, configuration, and setup ──────────────────
trailhead:
id: testrail/suite_71172/section_838052/case_4837714
title: Verify user cannot load more than $2,000 onto a Gift Card within 24 hours
priority: P0
# Optional — unlocks app-specific custom tools (e.g. launchAppSignedIn, deeplinks)
# Without a target, trails run with generic tools only.
target: myapp
platform: ios
# Injected into the LLM system prompt for this trail
systemPrompt: >
The gift card number to use is {{giftCardNumber}}.
Always dismiss any promotional dialogs before proceeding.
# Pre-seeded runtime variables — available as {{varName}} in steps and tool params
memory:
giftCardNumber: "7783 3224 0646 3436"
email: testuser+giftcards@example.com
password: "12345678"
# Informational — never used at runtime, only for reporting/traceability
metadata:
caseId: "4837714"
sectionId: "838052"
testRailUrl: https://testrail.example.com/index.php?/cases/view/12345
# Setup steps (checkpoint for recording iteration)
setup:
- step: Launch the app with email {{email}} and password {{password}}
recording:
- myapp_ios_launchAppSignedIn:
email: "{{email}}"
password: "{{password}}"
- tools:
- tap: "Gift cards"
# ── Trail: the test steps ────────────────────────────────────────────
trail:
- step: Tap Reload card or check balance
recording:
- tap: "Check balance or reload card"
- step: Enter the gift card number
recording:
- tap: "0000 0000 0000 0000"
- inputText: "{{giftCardNumber}}"
- step: Tap Next
recording:
- tap: "Next"
- step: Tap Add Value
recording:
- tap: "Add value"
- step: Select $50 option
recording:
- tap: "$50"
- step: Wait and tap Review sale
recording:
- tap: "Review sale 1 item"
- step: Tap Charge $50.00
recording:
- tap: "Charge $50.00"
- step: Tap on $50 amount
recording:
- tap: "$50"
# Non-recordable step — AI always handles this, recording is never overwritten
- step: Dismiss any payment confirmation dialogs
recordable: false
# Direct tools block — hand-authored deterministic sequence, no NL step needed
- tools:
- assertVisible: "Amount exceeds gift card balance limit."
- assertVisible: "Declined"
- assertVisible: "Cancel Payment"
blaze.yaml — NL Definition (Cross-Platform)¶
The blaze file is primarily NL — no recordings. tools blocks are allowed for platform-agnostic deterministic sequences, but platform-specific recordings live in .trail.yaml files.
trailhead:
id: suite/71172/section/838052/case/4837714
title: Verify gift card load limit
memory:
giftCardNumber: "7783 3224 0646 3436"
email: testuser+giftcards@example.com
setup:
- step: Launch the app and sign in with {{email}}
- step: Navigate to Gift Cards
trail:
- step: Tap Reload card or check balance
- step: Enter the gift card number
- step: Tap Next
- step: Tap Add Value
- step: Select $50 option
- step: Wait and tap Review sale
- step: Tap Charge $50.00
- step: Tap on $50 amount
- step: Dismiss any payment confirmation dialogs
recordable: false
- step: >
Verify the message "Amount exceeds gift card balance limit" appears.
Verify the message "Declined" appears.
Verify "Cancel Payment" button is visible.
Key Design Principles¶
1. Two sections, each with one job. trailhead is where the trail begins — identity, configuration, and setup. trail is what you’re testing — the test itself.
2. Trailhead is the starting point. Everything about getting ready lives here: what this trail is (id, title), how it’s configured (systemPrompt, memory, target), and the steps to reach the starting state (setup). The trailhead is a complete description of where the trail begins.
3. Setup is a checkpoint. During recording, setup is a save point. Mess up the test? Replay setup instantly, re-record. This is the primary motivation — it serves the recording and iteration workflow.
4. Two kinds of entries: step and tools. Both setup and trail are lists that can contain either kind. A step has an NL description (the durable intent) with an optional recording (ephemeral derived cache). A tools block is a directly authored deterministic sequence — no NL, no recording, hand-written by the author. “Blazing” (AI exploration) is a process/verb, not a keyword.
5. recording is flat. recording.tools becomes just recording — one level of nesting removed.
6. memory is active, metadata is passive. Memory variables are interpolated at runtime via {{varName}}. Metadata is never touched by the framework — purely for reporting and traceability.
7. systemPrompt is honest. Calling it what it is removes all ambiguity about where this text ends up.
8. verify is just a step (or tools). Any step can perform verification — the intent is expressed by the tools used, not by a separate keyword. Pure assertion sequences can also be written as direct tools blocks.
9. recordable: false remains per-step. This flag means “never overwrite this step’s recording during re-recording” — useful for steps that should always be handled by the AI.
10. File is a mapping, not a list. Since there’s exactly one of each section, named properties are simpler than an anonymous list of items.
11. No top-level interleaving. v1 allowed multiple prompts and tools blocks interleaved at the top level. v2 has exactly one trailhead and one trail. Within each list, step and tools entries can be freely mixed — but the top-level structure is fixed.
12. step is source of truth, recording is ephemeral cache. The semantic boundary is clear: step (NL intent) is the durable, authoritative description. recording is a derived materialization — replaceable, rebuildable, secondary. tools blocks are different: they are hand-authored and authoritative in their own right.
Setup Behavior¶
Execution policy: 1. If recording exists → replay deterministically (no AI, instant) 2. If no recording → blaze via AI (first run), then save recording 3. If recording fails → re-blaze from NL description, save new recording
Failure semantics: - Setup failure = “couldn’t reach the starting point” → test is skipped/retried, not failed - Trail failure = “the test ran and something didn’t work” → test is failed
Reuse via custom tools:
Setup is shared across tests through custom tools. A recorded setup sequence can be promoted to a custom tool (e.g., setupMoneyTab), then referenced by NL in other tests’ setup.
Naming Glossary¶
| Term | What it is |
|---|---|
trailhead |
Trail identity, configuration, and setup — where the trail begins |
setup |
Setup steps within the trailhead (checkpoint for recording iteration) |
trail |
Test steps — the path you walk (the test) |
step |
Individual action within setup or trail |
recording |
Ephemeral derived cache for a step (deterministic replay, replaceable) |
tools |
Directly authored deterministic block — hand-written, not derived from a step |
| blazing | AI exploration when no recording exists (verb, not keyword) |
blaze.yaml |
NL definition file — the plan before you go |
*.trail.yaml |
Platform recording file — the trail left behind |
memory |
Pre-seeded variables for template interpolation |
systemPrompt |
Text injected into LLM system prompt |
Future: Execution Mode¶
A future enhancement will add a mode property to trailhead that controls the speed/accuracy tradeoff:
trailhead:
mode: fast # fast | accurate | custom
| Mode | View Hierarchy | Tool Sets | Target |
|---|---|---|---|
fast |
Filtered, minimal nodes | Core tools only | Local LLMs, small context windows |
accurate |
Full hierarchy, all nodes/bounds | All available tools | Frontier models, max reliability |
custom |
Explicit per-trail config | Explicit per-trail config | Fine-tuned control |
This is intentionally deferred — the right API will emerge from real usage with local vs. frontier models.
Migration Strategy¶
- Build a new v2 parser alongside the existing one in
trailblaze-models/commonMain. - Try-catch fallback: attempt v2 parsing first, fall back to v1 on failure.
- Bulk migrate all
.trail.yamlandblaze.yamlfiles once v2 is stable. - Delete v1 parser after migration is complete.
v1 → v2 Mapping¶
| v1 | v2 |
|---|---|
- config: (list item) |
trailhead: (mapping key) |
- prompts: (list item, multiple allowed) |
trail: (mapping key, exactly one) |
- tools: (list item, standalone top-level) |
- tools: (entry in setup/trail lists) |
step: + recording: tools: |
step: + recording: |
verify: |
step: (with assertion tools) |
context: (in config) |
systemPrompt: (in trailhead) |
| multiple interleaved blocks | single trailhead + trail |
What changed¶
Positive:
- Two clearly distinct sections — trailhead (starting point) and trail (the test)
- Setup as a checkpoint within trailhead enables recording iteration and deterministic setup replay
- trailhead semantically groups identity + config + setup as “everything about the starting point”
- Flat recording syntax (one fewer nesting level)
- File is a mapping — simpler than a list when there’s one of each section
- Clear semantic boundary: step is source of truth, recording is ephemeral cache
- tools blocks preserved as a hand-authored deterministic primitive alongside step entries
- setup and trail share the same authoring model (mixed step and tools)
- Structured variable support via memory
- systemPrompt removes confusion about where context text is used
- Removing verify simplifies the model — one fewer concept to learn
- tools as a first-class primitive gives authors an escape hatch for deterministic sequences without forcing NL wrapping
- Foundation for future mode-based execution configuration
- Setup failure vs trail failure distinction improves test reporting
Negative:
- All existing .trail.yaml and blaze.yaml files must be migrated (mitigated by try-catch fallback period)
- External tools/scripts that parse trail files need updating
- Two parsers coexist temporarily during migration