Trailblaze Decision 008: Trailblaze MCP
Context
Trailblaze provides LLM-driven UI automation for mobile applications.
Historically, single-agent approaches to UI automation required the agent to maintain screen state (view hierarchies, screenshots) within its own conversation. This caused two problems:
- Context window bloat: Each step added more screen state to the conversation, eventually exhausting the context limit
- LLM confusion: Multiple screen states in the same conversation led to the model reasoning about outdated UI or conflating different screens
Trailblaze addresses this with a subagent architecture: each step is handled by a fresh agent conversation that only receives the current screen state. The orchestrating layer maintains continuity while subagents operate statelessly on the latest UI.
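The stateless-subagent loop described above can be sketched roughly as follows. This is a minimal illustration, not Trailblaze's implementation: `ScreenState`, `StepResult`, `Subagent`, and `runSteps` are hypothetical names, and the subagent is a plain function standing in for a fresh LLM conversation.

```kotlin
// Hypothetical types for illustration; not Trailblaze's actual API.
data class ScreenState(val description: String)
data class StepResult(val action: String, val nextScreen: ScreenState)

// A subagent sees only the current step and the current screen, never earlier screens.
fun interface Subagent {
    fun perform(step: String, screen: ScreenState): StepResult
}

// The orchestrator keeps continuity (the list of actions taken) while each step
// gets a fresh subagent that receives only the latest screen state.
fun runSteps(
    steps: List<String>,
    initial: ScreenState,
    newSubagent: () -> Subagent,
): List<String> {
    var screen = initial
    val actions = mutableListOf<String>()
    for (step in steps) {
        val result = newSubagent().perform(step, screen) // fresh conversation per step
        actions += result.action
        screen = result.nextScreen // only the newest screen is carried forward
    }
    return actions
}
```

Because the orchestrator passes a single `ScreenState` per step, the size of each subagent's input does not grow with the number of steps already taken.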
External and internal teams have expressed interest in integrating with Trailblaze via MCP for device control:
- Block mobile engineers and Firebender: Automate mobile UI interactions to remove human-in-the-loop friction during development—typically throwaway trails for quick validation of a flow
- Test authoring, execution, and infrastructure: Enable developers and QE to create, run, and manage persistent UI tests that run continuously
- General device control: Provide MCP-based mobile device control for any agent or tool that needs to interact with mobile applications
A key principle: author once, run deterministically. While the subagent approach is used during initial authoring (exploring the UI, figuring out the right steps), the result is a recorded trail. Subsequent runs use the trail deterministically without LLM reasoning—fast, predictable, and cost-free.
Trail recording works through sessions: a new session starts automatically when interactions begin, and everything within that session is recorded. Users explicitly indicate when they want to finalize a trail from their actions, allowing them to review in the Trailblaze desktop app before sending it for automated execution.
Trail storage: Trails are persisted as trail.yaml files on disk. At Block, trails are stored in a dedicated directory and referenced by path. For the internal test infrastructure, if a trail doesn’t exist on disk, it can be generated from natural language via the TestTrail system.
AI fallback can recover from trail failures due to UI changes, but is disabled by default. This preserves determinism and avoids LLM costs. When a trail step fails, Trailblaze reports the failure to the MCP client, which can then decide whether to invoke AI-assisted recovery using natural language prompts.
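A minimal sketch of deterministic playback with AI fallback left off, assuming a trail is just an ordered list of recorded actions. `TrailStep`, `PlaybackResult`, and `playTrail` are hypothetical names chosen for this example; the real trail format and executor may differ.

```kotlin
// Hypothetical model of deterministic playback; names are illustrative only.
data class TrailStep(val description: String, val action: () -> Boolean)

sealed interface PlaybackResult {
    object Success : PlaybackResult
    data class Failed(val stepIndex: Int, val description: String) : PlaybackResult
}

// Replays recorded steps in order with no LLM involvement. On the first failing
// step it stops and reports the failure; the caller (the MCP client in the text
// above) decides whether to attempt AI-assisted recovery.
fun playTrail(steps: List<TrailStep>): PlaybackResult {
    steps.forEachIndexed { index, step ->
        if (!step.action()) return PlaybackResult.Failed(index, step.description)
    }
    return PlaybackResult.Success
}
```

Returning the failing step instead of recovering automatically keeps the default path free of LLM calls and leaves the recovery decision with the client.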
Custom tools are a key benefit of Trailblaze. By specifying a target app, teams get access to app-specific tools that expose functionality beyond standard UI interactions. For example, an app target can provide a tool for quickly logging into staging or test accounts, providing the same access as debug menus without navigating through the UI.
The Model Context Protocol (MCP) provides a standardized interface for exposing Trailblaze capabilities to external AI systems. Trailblaze uses the Streamable HTTP transport, which allows MCP clients to connect via HTTP POST requests to a session-based endpoint. See the MCP setup guide for connection details.
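For orientation, the first request an MCP client sends over Streamable HTTP is a JSON-RPC `initialize` call in an HTTP POST. The sketch below only builds such a request with the JDK's HTTP client and does not send it. The endpoint URL, client name, and function name are placeholder assumptions; the actual address is given in the MCP setup guide.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// Builds (but does not send) an MCP "initialize" request for the Streamable HTTP
// transport. The endpoint URL is a placeholder assumption, not Trailblaze's real address.
fun buildInitializeRequest(endpoint: String = "http://localhost:8080/mcp"): HttpRequest {
    val body = """
        {"jsonrpc":"2.0","id":1,"method":"initialize","params":{
          "protocolVersion":"2025-03-26",
          "capabilities":{},
          "clientInfo":{"name":"example-client","version":"0.1.0"}}}
    """.trimIndent()
    return HttpRequest.newBuilder(URI.create(endpoint))
        .header("Content-Type", "application/json")
        // Streamable HTTP clients accept either a JSON response or an SSE stream.
        .header("Accept", "application/json, text/event-stream")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
}
```

In the MCP specification, a server that uses sessions returns an `Mcp-Session-Id` header on the initialize response, and the client repeats that header on later requests.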
Decision
Introduce a Trailblaze MCP server with multiple modes that support different integration patterns. Tools are dynamically registered based on the current mode, and clients can switch modes during a session.
Operating Modes
The modes are defined by two questions:
1. Who is the agent? (Who decides what actions to take)
2. Where does the LLM come from? (Who provides the “brain”)
Mode 1: MCP_CLIENT_LIKE_GOOSE_AS_AGENT (Dumb Tools)
| Aspect | Value |
|---|---|
| Who’s the agent | MCP client (e.g., Goose, Firebender) |
| LLM source | MCP client’s LLM |
| Trailblaze exposes | Primitive tools only (tap, swipe, inputText, getScreenshot, viewHierarchy) |
| Trailblaze role | Dumb tool executor - no reasoning |
Goose: "I see login button" → tap(150, 300)
Goose: "I see text field" → inputText("username")
Goose: "I see password field" → inputText("password")
Trailblaze does no reasoning in this mode. It just executes what the MCP client tells it.
Mode 2: TRAILBLAZE_AGENT_WHILE_LOOP (Local LLM)
| Aspect | Value |
|---|---|
| Who’s the agent | Trailblaze |
| LLM source | Trailblaze’s local LLM (configured provider) |
| Trailblaze exposes | runPrompt() only |
| Trailblaze role | Full agent - does all reasoning and execution |
Goose: runPrompt("login to the app")
Trailblaze: *thinks using configured LLM* → tap → type → tap → done
Goose: *waits, gets result*
The MCP client just kicks off the task. Trailblaze does everything internally.
Mode 3: MCP_CLIENT_LIKE_GOOSE_WITH_SAMPLING (Tunneled LLM)
| Aspect | Value |
|---|---|
| Who’s the agent | MCP client (high-level) + Trailblaze (low-level execution) |
| LLM source | MCP client’s LLM (tunneled via MCP Sampling) |
| Trailblaze exposes | High-level tools (runPrompt, switchToolSet) |
| Trailblaze role | Sub-agent that borrows MCP client’s brain |
Goose: runPrompt("tap the login button") ← Goose decides WHAT task
Trailblaze: *needs to think* → asks Goose via sampling: "where is login button?"
Goose's LLM: "it's at (150, 300)"
Trailblaze: *taps* → returns result
Goose: runPrompt("enter username sam") ← Goose decides NEXT task
Goose drives the conversation (decides what tasks to do next). Trailblaze borrows Goose’s brain for the low-level “how” decisions via MCP Sampling.
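One way to picture the "borrowed brain" is as a small abstraction over where an LLM completion comes from. The sketch below reuses names that appear later in this document (`SamplingSource`, `LlmCallStrategy`), but every signature is an assumption, and the automatic fallback to a local LLM is a design choice made for this example only.

```kotlin
// Sketch only: the names echo components listed later in this document, but every
// signature here is an assumption made for illustration.
enum class LlmCallStrategy { DIRECT, MCP_SAMPLING }

fun interface SamplingSource {
    fun complete(prompt: String): String
}

// Picks where "thinking" happens: Trailblaze's own LLM, or the MCP client's LLM
// reached through MCP Sampling. Falls back to the local source when the client
// cannot sample, and fails loudly when neither is available.
fun resolveSamplingSource(
    strategy: LlmCallStrategy,
    local: SamplingSource?,
    clientSampling: SamplingSource?,
): SamplingSource {
    val chosen = when (strategy) {
        LlmCallStrategy.DIRECT -> local
        LlmCallStrategy.MCP_SAMPLING -> clientSampling ?: local
    }
    return chosen ?: error("No LLM available for strategy $strategy")
}
```

The code that asks "where is the login button?" only sees a `SamplingSource`, so it does not need to know whether the answer came from Goose or from a locally configured provider.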
Mode 4: TRAILBLAZE_AGENT_RECURSIVE_MCP (Future - Self-Connection)
| Aspect | Value |
|---|---|
| Who’s the agent | Trailblaze |
| LLM source | Trailblaze’s local LLM (configured provider) |
| Trailblaze exposes | runPrompt() only |
| Trailblaze role | Full agent that calls its OWN MCP tools |
Goose: runPrompt("login to the app")
Trailblaze Agent: *thinks using local LLM*
Trailblaze Agent: → calls tap() via MCP (to itself!)
Trailblaze Agent: → calls inputText() via MCP (to itself!)
Trailblaze Agent: → done, returns to Goose
Same external interface as Mode 2, but internally the agent uses MCP for tool execution (self-connection). This creates architectural symmetry - external MCP clients and internal agent use the exact same tool interface.
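The symmetry argument can be illustrated with a single tool-execution interface that both callers share. `McpToolExecutor` and `DirectMcpToolExecutor` are names from this document's implementation summary, but the shapes below are assumed, and `login` is a hypothetical caller.

```kotlin
// Illustrative only: the interface name matches a component listed in this document,
// but its shape here is assumed.
fun interface McpToolExecutor {
    fun execute(toolName: String, args: Map<String, String>): String
}

// In-process executor: looks the tool up in a registry and calls it directly.
class DirectMcpToolExecutor(
    private val tools: Map<String, (Map<String, String>) -> String>,
) : McpToolExecutor {
    override fun execute(toolName: String, args: Map<String, String>): String {
        val tool = tools[toolName] ?: return "error: unknown tool '$toolName'"
        return tool(args)
    }
}

// The agent depends only on the interface, so an external MCP client and the
// internal agent can drive the same tools through the same call shape.
fun login(executor: McpToolExecutor): List<String> = listOf(
    executor.execute("tap", mapOf("x" to "150", "y" to "300")),
    executor.execute("inputText", mapOf("text" to "username")),
)
```

A self-connecting executor that sends the same calls over MCP could implement the same interface without changing `login`.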
Mode Summary Table
| Mode | Agent | LLM Source | Trailblaze Role | Status |
|---|---|---|---|---|
| MCP_CLIENT_LIKE_GOOSE_AS_AGENT | MCP client | MCP client | Dumb tool executor | ✅ Implemented |
| TRAILBLAZE_AGENT_WHILE_LOOP | Trailblaze | Local (configured LLM) | Full agent | ✅ Implemented |
| MCP_CLIENT_LIKE_GOOSE_WITH_SAMPLING | MCP client + Trailblaze | MCP client (tunneled) | Sub-agent, borrows brain | ✅ Implemented |
| TRAILBLAZE_AGENT_RECURSIVE_MCP | Trailblaze | Local (configured LLM) | Full agent via self-MCP | 🔮 Future |
Note on deterministic execution: Trail recording/playback is orthogonal to these modes. If a trail has recordings, it runs deterministically without LLM calls regardless of mode.
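The two defining questions map naturally onto enum properties. In the sketch below the four constant names and their tool lists come from the mode descriptions above, while `Agent`, `LlmSource`, `Mode`, and `needsLocalLlmConfig` are hypothetical names introduced for illustration.

```kotlin
// Hypothetical modelling of the four modes; the enum constant names come from this
// document, while the properties and helper are illustrative assumptions.
enum class Agent { MCP_CLIENT, TRAILBLAZE, BOTH }
enum class LlmSource { MCP_CLIENT, LOCAL, MCP_CLIENT_TUNNELED }

enum class Mode(
    val agent: Agent,
    val llm: LlmSource,
    val exposedTools: Set<String>,
    val implemented: Boolean,
) {
    MCP_CLIENT_LIKE_GOOSE_AS_AGENT(
        Agent.MCP_CLIENT, LlmSource.MCP_CLIENT,
        setOf("tap", "swipe", "inputText", "getScreenshot", "viewHierarchy"), true,
    ),
    TRAILBLAZE_AGENT_WHILE_LOOP(Agent.TRAILBLAZE, LlmSource.LOCAL, setOf("runPrompt"), true),
    MCP_CLIENT_LIKE_GOOSE_WITH_SAMPLING(
        Agent.BOTH, LlmSource.MCP_CLIENT_TUNNELED, setOf("runPrompt", "switchToolSet"), true,
    ),
    TRAILBLAZE_AGENT_RECURSIVE_MCP(Agent.TRAILBLAZE, LlmSource.LOCAL, setOf("runPrompt"), false);

    // A mode needs a Trailblaze-side LLM configuration only when the LLM is local.
    val needsLocalLlmConfig: Boolean get() = llm == LlmSource.LOCAL
}
```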
Session State
The MCP server is single-tenant—one session controls one device at a time. Settings like target device, target app, and platform are retained within a session and persist across reconnections.
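A minimal sketch of settings that survive a reconnect, assuming they are kept in memory and keyed by session id. `SessionSettings` and `SessionStore` are hypothetical names; the document's own type for this is `TrailblazeMcpSessionContext`, whose fields are not shown here.

```kotlin
// Illustrative sketch; the setting names mirror the text, the store is hypothetical.
data class SessionSettings(
    val targetDevice: String? = null,
    val targetApp: String? = null,
    val platform: String? = null,
)

// Keeps settings keyed by session id in memory, so a client that reconnects with
// the same id gets its earlier settings back instead of defaults.
class SessionStore {
    private val settings = java.util.concurrent.ConcurrentHashMap<String, SessionSettings>()

    fun connect(sessionId: String): SessionSettings =
        settings.getOrPut(sessionId) { SessionSettings() }

    fun update(sessionId: String, change: (SessionSettings) -> SessionSettings): SessionSettings =
        settings.compute(sessionId) { _, current -> change(current ?: SessionSettings()) }!!
}
```

An in-memory store like this keeps settings across reconnections but not across a server restart.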
Dynamic Tool Management
Tools change based on multiple dimensions:
- Mode: Switching between modes changes available tools
- Target app: App-specific tools for your configured app targets
- Target platform: iOS vs Android may expose different capabilities
- Tool categories: Subagents can dynamically swap toolsets to reduce context window usage
Users can configure settings via the Trailblaze desktop app or via MCP tools (e.g., setMode, setTargetApp).
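To make the dimensions concrete, the sketch below derives a visible tool list from a mode flag, a platform, and an optional target app. Everything here is hypothetical, including the `pressBack` and `loginToStaging` tool names; it only illustrates how the inputs might combine, not how Trailblaze registers tools.

```kotlin
// Hypothetical names throughout; a sketch of resolving the visible tool list from
// the dimensions listed above, not Trailblaze's actual registration logic.
enum class Platform { ANDROID, IOS }

data class ToolContext(
    val agentIsMcpClient: Boolean,
    val platform: Platform,
    val targetApp: String? = null,
)

fun visibleTools(ctx: ToolContext, appTools: Map<String, List<String>>): List<String> {
    // Mode decides between primitive tools and the single high-level entry point.
    val base = if (ctx.agentIsMcpClient) listOf("tap", "swipe", "inputText") else listOf("runPrompt")
    // Example of a platform-specific capability (assumed for illustration).
    val platformTools =
        if (ctx.agentIsMcpClient && ctx.platform == Platform.ANDROID) listOf("pressBack") else emptyList()
    // App-specific tools appear only once a target app is configured.
    val custom = ctx.targetApp?.let { appTools[it] }.orEmpty()
    return base + platformTools + custom
}
```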
Scope
This design assumes local device control: the MCP server runs on the same machine as the MCP client, with devices connected directly via ADB (Android) or as physical/simulated iOS devices. Remote device farms and cloud-based device provisioning are out of scope.
Consequences
Positive:
- Single MCP server supports multiple integration patterns
- Client Agent mode requires no Trailblaze LLM configuration
- Dynamic mode switching enables seamless transitions between execution and authoring
- MCP Sampling enables the subagent pattern, preventing context window exhaustion
- Deterministic trail execution by default keeps costs low and behavior predictable

Negative:
- TRAILBLAZE_AS_AGENT mode requires LLM configuration
- MCP_CLIENT_AS_AGENT mode with subagent orchestration requires clients that support MCP Sampling
- Single-tenant design limits each session to one device
Implementation Summary
What Was Built
| Component | Description |
|---|---|
| Session Configuration | TrailblazeMcpMode, ScreenshotFormat, ViewHierarchyVerbosity, LlmCallStrategy enums and configurable TrailblazeMcpSessionContext |
| Session Config Tools | getSessionConfig, setMode, setScreenshotFormat, setAutoIncludeScreenshot, setViewHierarchyVerbosity, setLlmCallStrategy, configureSession - all use enum parameters directly for type safety |
| Dynamic Tool Categories | ToolSetCategory enum with DynamicToolSetManager for per-session tool state |
| Tool Management Tools | listToolCategories, enableToolCategories, addToolCategory, removeToolCategory, focusOnCategory, plus presets (useMinimalTools, useStandardTools, useTestingTools) |
| MCP Sampling Support | McpSamplingClient using MCP Kotlin SDK’s ServerSession.createMessage() and SubagentOrchestrator for multi-step automation |
| Progress Notifications | McpProgressNotifier bridges LogsRepo events to MCP progress notifications |
| Multi-Session Support | New transport + MCP server instance per client, allowing simultaneous connections |
| Bridge Entry Point | runYamlBlocking() method encapsulates MCP-specific blocking execution with progress callbacks |
| Cancellation Propagation | MCP session lifecycle wired to automation cancellation |
| MCP Tool Executor | McpToolExecutor interface with DirectMcpToolExecutor for in-process tool execution |
| Dual Sampling Source | SamplingSource interface with LocalLlmSamplingSource, McpClientSamplingSource, and SamplingSourceResolver |
| Koog MCP Agent | KoogMcpAgent using Koog’s native AIAgent with MCP tools via self-connection |
| LLM Call Strategy | LlmCallStrategy enum (DIRECT/MCP_SAMPLING) for selecting how LLM API calls are made |
| Agent Metrics | AgentMetricsCollector tracking success/failure rates, getAgentMetrics and clearAgentMetrics tools |
| LLM Wiring | Optional llmClientProvider and llmModelProvider in TrailblazeMcpServer for local LLM fallback |
Type-Safe Enum Parameters
MCP tool parameters use enum types directly instead of strings. Koog and the MCP SDK serialize enums automatically via kotlinx.serialization.
Benefits:
- LLM visibility: Enum values are enumerated in the tool schema, so LLMs see all valid options
- Type safety: No runtime parsing errors from invalid string values
- Cleaner code: No fromString() boilerplate in enum companions
Example: setMode(mode: TrailblazeMcpMode) instead of setMode(mode: String). The LLM sees the schema includes MCP_CLIENT_AS_AGENT and TRAILBLAZE_AS_AGENT as valid values.
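The effect on the tool schema can be shown without any serialization library. The sketch below hand-builds the kind of JSON Schema fragment an enum parameter produces. `enumSchema` is a hypothetical helper written for this example and is not part of Koog, the MCP SDK, or kotlinx.serialization.

```kotlin
// The two constants are the ones named in the text; the schema builder is a
// hand-rolled illustration of what a serialization library would generate.
enum class TrailblazeMcpMode { MCP_CLIENT_AS_AGENT, TRAILBLAZE_AS_AGENT }

// Emits a JSON Schema fragment whose "enum" array lists every constant.
inline fun <reified E : Enum<E>> enumSchema(): String =
    enumValues<E>().joinToString(
        separator = ",",
        prefix = "{\"type\":\"string\",\"enum\":[",
        postfix = "]}",
    ) { "\"${it.name}\"" }

// With an enum parameter, an invalid value cannot reach the function body.
fun setMode(mode: TrailblazeMcpMode): String = "mode=${mode.name}"
```

Because the schema is derived from the enum itself, adding a constant updates the values the LLM sees without any separate string list to maintain.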
Two-Tier Tool Management Pattern
To reduce context window usage, tool selection happens in two tiers:
- Parent LLM selects initial tool categories based on the high-level task
- Subagent can swap categories as it discovers what it needs
This reduces context window usage by 50-80% compared to exposing all tools.
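A rough sketch of the two tiers, assuming categories are an enum and a per-session manager tracks which ones are active. `ToolSetCategory`, `DynamicToolSetManager`, and the three operation names appear in this document; the category constants, their tool lists, and all signatures are invented for illustration.

```kotlin
// Sketch under assumptions: the category names and tool lists are invented, and the
// class only mirrors the add/remove/focus operations listed in this document.
enum class ToolSetCategory(val tools: List<String>) {
    CORE(listOf("tap", "inputText")),
    NAVIGATION(listOf("swipe", "pressBack")),
    VERIFICATION(listOf("assertVisible")),
}

class DynamicToolSetManager(initial: Set<ToolSetCategory>) {
    private val active = initial.toMutableSet()

    fun addToolCategory(category: ToolSetCategory) { active += category }
    fun removeToolCategory(category: ToolSetCategory) { active -= category }

    // Focus drops everything except the one category the subagent currently needs.
    fun focusOnCategory(category: ToolSetCategory) {
        active.clear()
        active += category
    }

    // Only tools from active categories are placed in the LLM's context.
    fun exposedTools(): List<String> =
        ToolSetCategory.values().filter { it in active }.flatMap { it.tools }
}
```

In this picture the parent LLM supplies the `initial` set, and the subagent calls the add, remove, and focus operations as it learns what the task needs.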
MCP Logging Infrastructure
MCP agent operations emit structured TrailblazeLog events, which makes them visible in the Trailblaze desktop app and easier to debug:
| Log Type | Purpose |
|---|---|
| McpAgentRunLog | Full agent run lifecycle - objective, transport mode, iteration count, final result |
| McpAgentIterationLog | Per-iteration details - iteration number, LLM completion, tool called, result |
| McpSamplingLog | LLM completion requests - messages, model, tokens, duration, strategy |
| McpAgentToolLog | Tool execution - tool name, arguments, result, duration, transport mode |
All log types use enum types (AgentToolTransport, LlmCallStrategy) for type safety, defined in trailblaze-models so logs can reference them.
Known Limitations
- MCP Sampling: Most MCP clients (Cursor, Firebender) don’t support sampling/createMessage. Goose does support it. Use TRAILBLAZE_AS_AGENT mode with DIRECT LLM strategy as the recommended fallback.
- Manual Refresh Required After Server Restart: Sessions are in-memory only. Trailblaze returns HTTP 404 per MCP spec, but Cursor/Firebender don’t auto-reconnect (known client bug). Manual refresh is required.
- Single Device Per Session: Each MCP session controls one device at a time.
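For the restart limitation above, the behavior a well-behaved client would need is small: treat HTTP 404 as "session gone", initialize again, and retry. The sketch below models that with plain functions in place of real HTTP calls. `ReconnectingClient`, `initialize`, and `post` are hypothetical names, not part of any MCP SDK.

```kotlin
// Client-side sketch with hypothetical names; `post` stands in for an HTTP call
// and returns the status code, `initialize` opens a new session and returns its id.
class ReconnectingClient(
    private val initialize: () -> String,
    private val post: (sessionId: String, body: String) -> Int,
) {
    private var sessionId: String = initialize()

    // A 404 means the server no longer knows this session (for example after a
    // restart), so start a new session and retry the request once.
    fun send(body: String): Int {
        val status = post(sessionId, body)
        if (status != 404) return status
        sessionId = initialize()
        return post(sessionId, body)
    }
}
```

Since sessions are in-memory only, the new session would start with default settings, so a client taking this approach would also need to reapply values such as target device and target app.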
Future Direction: Two-Tier Agent Architecture
See Decision 025: Two-Tier Agent Architecture for the next evolution of agent design.
The two-tier architecture separates concerns:
- Outer Agent (MCP client like Goose, or Koog in standalone): Planning, replanning, cross-system orchestration
- Inner Agent (Trailblaze): Screen understanding, action recommendation, device execution
This enables model specialization (cheap vision model for screen analysis, expensive reasoning model for planning) and cross-system testing where the outer agent coordinates mobile UI + filesystem + database + API verification.
Architecture: TrailblazeMcpBridge
TrailblazeMcpBridgeImpl is the primary entry point for all MCP-specific operations, bridging MCP’s request/response model and Trailblaze’s internal async architecture.
| Aspect | Desktop UI | MCP |
|---|---|---|
| Execution model | Fire-and-forget | Must block until completion |
| Progress | Shown in UI | Streamed as MCP notifications |
| Session continuity | UI maintains state | Bridge manages per-device sessions |
| Cancellation | User clicks Stop | MCP session close triggers cancellation |
Bridge Responsibilities:
- Device selection and session management
- YAML execution (runYaml() fire-and-forget, runYamlBlocking() for MCP)
- Screen state access and tool execution
- Cancellation propagation
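The difference between the two execution models, and how session close can stop a run, can be sketched with JDK futures. The method names `runYaml` and `runYamlBlocking` follow the text; `BridgeSketch`, `onSessionClosed`, the step list, and the bodies are assumptions, and the real bridge builds on Trailblaze's own async machinery instead of `CompletableFuture`.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical bridge: the method names follow the text, the bodies are assumptions.
class BridgeSketch {
    private val cancelled = AtomicBoolean(false)

    // Fire-and-forget: returns immediately with a handle, as a desktop UI would want.
    // Completes with true when every step ran, false when cancellation was observed.
    fun runYaml(steps: List<String>, onProgress: (String) -> Unit = {}): CompletableFuture<Boolean> =
        CompletableFuture.supplyAsync {
            steps.all { step ->
                val proceed = !cancelled.get() // checked before each step
                if (proceed) onProgress(step)
                proceed
            }
        }

    // Blocking: an MCP tool call must not return until the run has finished.
    fun runYamlBlocking(steps: List<String>, onProgress: (String) -> Unit = {}): Boolean =
        runYaml(steps, onProgress).join()

    // Called when the MCP session closes so in-flight automation stops.
    fun onSessionClosed() = cancelled.set(true)
}
```

The `onProgress` callback is where a progress notifier would forward each step to the MCP client while the blocking call is still in flight.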