Trailblaze Decision 008: Trailblaze MCP

Context

Trailblaze provides LLM-driven UI automation for mobile applications.

Historically, single-agent approaches to UI automation required the agent to maintain screen state (view hierarchies, screenshots) within its own conversation. This caused two problems:

Context window bloat: Each step added more screen state to the conversation, eventually exhausting the context limit
LLM confusion: Multiple screen states in the same conversation led to the model reasoning about outdated UI or conflating different screens

Trailblaze addresses this with a subagent architecture: each step is handled by a fresh agent conversation that only receives the current screen state. The orchestrating layer maintains continuity while subagents operate statelessly on the latest UI.
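
The per-step subagent pattern can be sketched roughly as follows. This is a minimal illustration, not Trailblaze's implementation: `ScreenState`, `StepResult`, `Subagent`, and `Orchestrator` are hypothetical names, and the "LLM" is whatever function the caller supplies.

```kotlin
// Hypothetical types for illustration; not Trailblaze's actual API.
data class ScreenState(val viewHierarchy: String)
data class StepResult(val step: String, val action: String)

// A subagent sees only the current step and the current screen, never earlier screens.
fun interface Subagent {
    fun decide(step: String, screen: ScreenState): String
}

class Orchestrator(
    private val captureScreen: () -> ScreenState,
    private val newSubagent: () -> Subagent,
) {
    // The orchestrator keeps a compact history of results, not of screen states.
    private val history = mutableListOf<StepResult>()

    fun run(steps: List<String>): List<StepResult> {
        for (step in steps) {
            val screen = captureScreen()                     // latest UI only
            val action = newSubagent().decide(step, screen)  // fresh conversation per step
            history += StepResult(step, action)
        }
        return history.toList()
    }
}
```

Because each `Subagent` is created per step and receives a single `ScreenState`, stale screens cannot accumulate in any one conversation.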

External and internal teams have expressed interest in integrating with Trailblaze via MCP for device control:

Block mobile engineers and Firebender: Automate mobile UI interactions to remove human-in-the-loop friction during development—typically throwaway trails for quick validation of a flow
Test authoring, execution, and infrastructure: Enable developers and QE to create, run, and manage persistent UI tests that run continuously
General device control: Provide MCP-based mobile device control for any agent or tool that needs to interact with mobile applications

A key principle: author once, run deterministically. The subagent approach is used during initial authoring (exploring the UI and figuring out the right steps), and the result is a recorded trail. Subsequent runs replay the trail deterministically without LLM reasoning, which makes them fast, predictable, and free of LLM cost.

Trail recording works through sessions: a new session starts automatically when interactions begin, and everything within that session is recorded. Users explicitly indicate when they want to finalize a trail from their actions, allowing them to review in the Trailblaze desktop app before sending it for automated execution.
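
As a rough model of this session behavior, the sketch below starts recording on the first interaction and only produces a trail when finalization is requested explicitly. `TrailRecorder` and `RecordedAction` are hypothetical names chosen for illustration; the real recording types are not described in this document.

```kotlin
// Hypothetical model of session-based recording; names are illustrative only.
data class RecordedAction(val tool: String, val args: Map<String, String>)

class TrailRecorder {
    private var session: MutableList<RecordedAction>? = null

    val isRecording: Boolean get() = session != null

    // A session starts automatically on the first interaction,
    // and every interaction within it is recorded.
    fun onInteraction(tool: String, args: Map<String, String> = emptyMap()) {
        val actions = session ?: mutableListOf<RecordedAction>().also { session = it }
        actions += RecordedAction(tool, args)
    }

    // Finalizing is explicit: nothing becomes a trail until the user asks for it.
    fun finalizeTrail(): List<RecordedAction> {
        val trail = session?.toList() ?: emptyList()
        session = null
        return trail
    }
}
```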

Trail storage: Trails are persisted as trail.yaml files on disk. At Block, trails are stored in a dedicated directory and referenced by path. For the internal test infrastructure, if a trail doesn’t exist on disk, it can be generated from natural language via the TestTrail system.

AI fallback can recover from trail failures due to UI changes, but is disabled by default. This preserves determinism and avoids LLM costs. When a trail step fails, Trailblaze reports the failure to the MCP client, which can then decide whether to invoke AI-assisted recovery using natural language prompts.
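
The division of responsibility described here might look like the following sketch, in which replay never calls an LLM and recovery is an opt-in decision made by the client. All names (`ReplayResult`, `replay`, `clientHandle`) and the prompt wording are assumptions made for illustration.

```kotlin
// Hypothetical types; the real trail format and result types may differ.
sealed interface ReplayResult {
    object Success : ReplayResult
    data class StepFailed(val index: Int, val step: String) : ReplayResult
}

// Server side: deterministic replay. A failing step is reported, not repaired.
fun replay(steps: List<String>, execute: (String) -> Boolean): ReplayResult {
    steps.forEachIndexed { index, step ->
        if (!execute(step)) return ReplayResult.StepFailed(index, step)
    }
    return ReplayResult.Success
}

// Client side: AI-assisted recovery is opt-in and driven by a natural-language prompt.
fun clientHandle(
    result: ReplayResult,
    allowAiRecovery: Boolean,
    runPrompt: (String) -> Boolean,
): Boolean = when (result) {
    ReplayResult.Success -> true
    is ReplayResult.StepFailed ->
        allowAiRecovery && runPrompt("Recover and complete step: ${result.step}")
}
```

With `allowAiRecovery = false`, a failed step costs nothing beyond the failure report, which matches the default described above.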

Custom tools are a key benefit of Trailblaze. By specifying a target app, teams get access to app-specific tools that expose functionality beyond standard UI interactions. For example, an app target can provide a tool for quickly logging into staging or test accounts, providing the same access as debug menus without navigating through the UI.

The Model Context Protocol (MCP) provides a standardized interface for exposing Trailblaze capabilities to external AI systems. Trailblaze uses the Streamable HTTP transport, which allows MCP clients to connect via HTTP POST requests to a session-based endpoint. See the MCP setup guide for connection details.
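
For orientation, a Streamable HTTP request is an ordinary HTTP POST carrying a JSON-RPC message. The sketch below only builds such a request with the JDK's `java.net.http` API and does not send it. The function name `buildMcpPost` is hypothetical, and the endpoint is left to the caller because the actual URL is given in the MCP setup guide, not here.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// Builds (but does not send) an MCP JSON-RPC request for the Streamable HTTP transport.
// The endpoint is supplied by the caller; this document does not state the real URL.
fun buildMcpPost(endpoint: URI, jsonRpcBody: String, sessionId: String? = null): HttpRequest {
    val builder = HttpRequest.newBuilder(endpoint)
        .header("Content-Type", "application/json")
        // The client must accept both a plain JSON reply and an SSE stream.
        .header("Accept", "application/json, text/event-stream")
        .POST(HttpRequest.BodyPublishers.ofString(jsonRpcBody))
    // Once the server has assigned a session ID, the client sends it on every request.
    if (sessionId != null) builder.header("Mcp-Session-Id", sessionId)
    return builder.build()
}

// Example JSON-RPC body asking the server which tools are currently registered.
val toolsListBody = """{"jsonrpc":"2.0","id":1,"method":"tools/list"}"""
```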

Decision

Introduce a Trailblaze MCP server with multiple modes that support different integration patterns. Tools are dynamically registered based on the current mode, and clients can switch modes during a session.

Operating Modes

The modes are defined by two questions:

1. Who is the agent? (Who decides what actions to take)
2. Where does the LLM come from? (Who provides the “brain”)

Mode 1: `MCP_CLIENT_LIKE_GOOSE_AS_AGENT` (Dumb Tools)

Aspect	Value
Who’s the agent	MCP client (e.g., Goose, Firebender)
LLM source	MCP client’s LLM
Trailblaze exposes	Primitive tools only (`tap`, `swipe`, `inputText`, `getScreenshot`, `viewHierarchy`)
Trailblaze role	Dumb tool executor - no reasoning

Goose: "I see login button" → tap(150, 300)
Goose: "I see text field" → inputText("username")
Goose: "I see password field" → inputText("password")

Trailblaze does no reasoning in this mode. It just executes what the MCP client tells it.

Mode 2: `TRAILBLAZE_AGENT_WHILE_LOOP` (Local LLM)

Aspect	Value
Who’s the agent	Trailblaze
LLM source	Trailblaze’s local LLM (configured provider)
Trailblaze exposes	`runPrompt()` only
Trailblaze role	Full agent - does all reasoning and execution

Goose: runPrompt("login to the app")
Trailblaze: *thinks using configured LLM* → tap → type → tap → done
Goose: *waits, gets result*

The MCP client just kicks off the task. Trailblaze does everything internally.

Mode 3: `MCP_CLIENT_LIKE_GOOSE_WITH_SAMPLING` (Tunneled LLM)

Aspect	Value
Who’s the agent	MCP client (high-level) + Trailblaze (low-level execution)
LLM source	MCP client’s LLM (tunneled via MCP Sampling)
Trailblaze exposes	High-level tools (`runPrompt`, `switchToolSet`)
Trailblaze role	Sub-agent that borrows MCP client’s brain

Goose: runPrompt("tap the login button")  ← Goose decides WHAT task
Trailblaze: *needs to think* → asks Goose via sampling: "where is login button?"
Goose's LLM: "it's at (150, 300)"
Trailblaze: *taps* → returns result
Goose: runPrompt("enter username sam")    ← Goose decides NEXT task

Goose drives the conversation (decides what tasks to do next). Trailblaze borrows Goose’s brain for the low-level “how” decisions via MCP Sampling.
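
The "borrowed brain" flow can be approximated as below. This is a sketch under stated assumptions: the document names a `SamplingSource` interface, but the `SamplingSourceSketch` signature, the `runPromptWithSampling` function, and the prompt format are invented here for illustration.

```kotlin
// Assumed shape only: a stand-in for "ask some LLM for a completion".
// In Mode 3 the implementation would forward the prompt to the MCP client via sampling.
fun interface SamplingSourceSketch {
    fun complete(prompt: String): String
}

// Trailblaze receives WHAT to do from the client, then asks the client's LLM HOW,
// giving it only the current screen, and finally executes the chosen action itself.
fun runPromptWithSampling(
    task: String,
    currentScreen: String,
    llm: SamplingSourceSketch,
    tap: (String) -> Unit,
): String {
    val target = llm.complete("Task: $task\nScreen: $currentScreen\nReply with the element to tap.")
    tap(target)
    return "tapped $target"
}
```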

Mode 4: `TRAILBLAZE_AGENT_RECURSIVE_MCP` (Future - Self-Connection)

Aspect	Value
Who’s the agent	Trailblaze
LLM source	Trailblaze’s local LLM (configured provider)
Trailblaze exposes	`runPrompt()` only
Trailblaze role	Full agent that calls its OWN MCP tools

Goose: runPrompt("login to the app")
Trailblaze Agent: *thinks using local LLM*
Trailblaze Agent: → calls tap() via MCP (to itself!)
Trailblaze Agent: → calls inputText() via MCP (to itself!)
Trailblaze Agent: → done, returns to Goose

Same external interface as Mode 2, but internally the agent uses MCP for tool execution (self-connection). This creates architectural symmetry: external MCP clients and the internal agent use the exact same tool interface.

Mode Summary Table

Mode	Agent	LLM Source	Trailblaze Role	Status
`MCP_CLIENT_LIKE_GOOSE_AS_AGENT`	MCP client	MCP client	Dumb tool executor	✅ Implemented
`TRAILBLAZE_AGENT_WHILE_LOOP`	Trailblaze	Local (configured LLM)	Full agent	✅ Implemented
`MCP_CLIENT_LIKE_GOOSE_WITH_SAMPLING`	MCP client + Trailblaze	MCP client (tunneled)	Sub-agent, borrows brain	✅ Implemented
`TRAILBLAZE_AGENT_RECURSIVE_MCP`	Trailblaze	Local (configured LLM)	Full agent via self-MCP	🔮 Future
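
One way to read the table is as an enum whose constants answer the two defining questions. The constant names below come from this document, but `McpModeSketch`, `Decider`, `LlmSource`, and the properties are assumptions and do not describe the real `TrailblazeMcpMode`.

```kotlin
// Illustrative only: the properties and helper enums are assumptions.
enum class Decider { MCP_CLIENT, TRAILBLAZE, BOTH }
enum class LlmSource { MCP_CLIENT, LOCAL, MCP_CLIENT_VIA_SAMPLING }

enum class McpModeSketch(val agent: Decider, val llm: LlmSource, val implemented: Boolean) {
    MCP_CLIENT_LIKE_GOOSE_AS_AGENT(Decider.MCP_CLIENT, LlmSource.MCP_CLIENT, true),
    TRAILBLAZE_AGENT_WHILE_LOOP(Decider.TRAILBLAZE, LlmSource.LOCAL, true),
    MCP_CLIENT_LIKE_GOOSE_WITH_SAMPLING(Decider.BOTH, LlmSource.MCP_CLIENT_VIA_SAMPLING, true),
    TRAILBLAZE_AGENT_RECURSIVE_MCP(Decider.TRAILBLAZE, LlmSource.LOCAL, false);

    // Only modes that use Trailblaze's own LLM need a configured provider.
    val needsLocalLlmConfig: Boolean get() = llm == LlmSource.LOCAL
}
```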

Note on deterministic execution: Trail recording/playback is orthogonal to these modes. If a trail has recordings, it runs deterministically without LLM calls regardless of mode.

Session State

The MCP server is single-tenant—one session controls one device at a time. Settings like target device, target app, and platform are retained within a session and persist across reconnections.

Dynamic Tool Management

Tools change based on multiple dimensions:

- Mode: Switching between modes changes available tools
- Target app: App-specific tools for your configured app targets
- Target platform: iOS vs Android may expose different capabilities
- Tool categories: Subagents can dynamically swap toolsets to reduce context window usage
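
A simplified resolver for the first two dimensions might look like this. The per-mode tool names are taken from the mode descriptions earlier in this document; `Mode`, `SessionSettings`, `availableTools`, and the rule that app-specific tools are added in every mode are assumptions for the sake of the sketch.

```kotlin
// Hypothetical resolver: recomputes the exposed tool set from the session settings.
enum class Mode {
    MCP_CLIENT_LIKE_GOOSE_AS_AGENT,
    TRAILBLAZE_AGENT_WHILE_LOOP,
    MCP_CLIENT_LIKE_GOOSE_WITH_SAMPLING,
}

data class SessionSettings(
    val mode: Mode,
    val targetAppTools: Set<String> = emptySet(), // tools contributed by the configured target app
)

fun availableTools(settings: SessionSettings): Set<String> {
    val modeTools = when (settings.mode) {
        Mode.MCP_CLIENT_LIKE_GOOSE_AS_AGENT ->
            setOf("tap", "swipe", "inputText", "getScreenshot", "viewHierarchy")
        Mode.TRAILBLAZE_AGENT_WHILE_LOOP -> setOf("runPrompt")
        Mode.MCP_CLIENT_LIKE_GOOSE_WITH_SAMPLING -> setOf("runPrompt", "switchToolSet")
    }
    // Assumption: app-specific tools are layered on top of the mode's tools.
    return modeTools + settings.targetAppTools
}
```

Calling `availableTools` again after a mode or target-app change yields the new tool list, which is the essence of dynamic registration.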

Users can configure settings via the Trailblaze desktop app or via MCP tools (e.g., `setMode`, `setTargetApp`).

Scope

This design assumes local device control: the MCP server runs on the same machine as the MCP client, with devices connected directly via ADB (Android) or as physical/simulated iOS devices. Remote device farms and cloud-based device provisioning are out of scope.

Consequences

Positive:

- Single MCP server supports multiple integration patterns
- Client Agent mode requires no Trailblaze LLM configuration
- Dynamic mode switching enables seamless transitions between execution and authoring
- MCP Sampling enables the subagent pattern, preventing context window exhaustion
- Deterministic trail execution by default keeps costs low and behavior predictable

Negative:

- `TRAILBLAZE_AS_AGENT` mode requires LLM configuration
- `MCP_CLIENT_AS_AGENT` mode with subagent orchestration requires clients that support MCP Sampling
- Single-tenant design limits each session to one device

Implementation Summary

What Was Built

Component	Description
Session Configuration	`TrailblazeMcpMode`, `ScreenshotFormat`, `ViewHierarchyVerbosity`, `LlmCallStrategy` enums and configurable `TrailblazeMcpSessionContext`
Session Config Tools	`getSessionConfig`, `setMode`, `setScreenshotFormat`, `setAutoIncludeScreenshot`, `setViewHierarchyVerbosity`, `setLlmCallStrategy`, `configureSession` - all use enum parameters directly for type safety
Dynamic Tool Categories	`ToolSetCategory` enum with `DynamicToolSetManager` for per-session tool state
Tool Management Tools	`listToolCategories`, `enableToolCategories`, `addToolCategory`, `removeToolCategory`, `focusOnCategory`, plus presets (`useMinimalTools`, `useStandardTools`, `useTestingTools`)
MCP Sampling Support	`McpSamplingClient` using MCP Kotlin SDK’s `ServerSession.createMessage()` and `SubagentOrchestrator` for multi-step automation
Progress Notifications	`McpProgressNotifier` bridges LogsRepo events to MCP progress notifications
Multi-Session Support	New transport + MCP server instance per client, allowing simultaneous connections
Bridge Entry Point	`runYamlBlocking()` method encapsulates MCP-specific blocking execution with progress callbacks
Cancellation Propagation	MCP session lifecycle wired to automation cancellation
MCP Tool Executor	`McpToolExecutor` interface with `DirectMcpToolExecutor` for in-process tool execution
Dual Sampling Source	`SamplingSource` interface with `LocalLlmSamplingSource`, `McpClientSamplingSource`, and `SamplingSourceResolver`
Koog MCP Agent	`KoogMcpAgent` using Koog’s native `AIAgent` with MCP tools via self-connection
LLM Call Strategy	`LlmCallStrategy` enum (DIRECT/MCP_SAMPLING) for selecting how LLM API calls are made
Agent Metrics	`AgentMetricsCollector` tracking success/failure rates, `getAgentMetrics` and `clearAgentMetrics` tools
LLM Wiring	Optional `llmClientProvider` and `llmModelProvider` in `TrailblazeMcpServer` for local LLM fallback

Type-Safe Enum Parameters

MCP tool parameters use enum types directly instead of strings. Koog and the MCP SDK serialize enums automatically via kotlinx.serialization.

Benefits:

- LLM visibility: Enum values are enumerated in the tool schema, so LLMs see all valid options
- Type safety: No runtime parsing errors from invalid string values
- Cleaner code: No `fromString()` boilerplate in enum companions

Example: `setMode(mode: TrailblazeMcpMode)` instead of `setMode(mode: String)`. The LLM sees that the schema includes `MCP_CLIENT_AS_AGENT` and `TRAILBLAZE_AS_AGENT` as valid values.
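
The difference can be illustrated with the standard library alone. This sketch does not use kotlinx.serialization, Koog, or the MCP SDK; `ModeParam`, `schemaValues`, and `parseModeFromString` are hypothetical names, and the two constants are the ones quoted in the example above.

```kotlin
// Illustration only: shows why an enum parameter is easier to expose than a String.
enum class ModeParam { MCP_CLIENT_AS_AGENT, TRAILBLAZE_AS_AGENT }

// With an enum type, the full set of valid values can be listed for a tool schema.
inline fun <reified E : Enum<E>> schemaValues(): List<String> =
    enumValues<E>().map { it.name }

// With a String parameter, parsing is hand-written and can fail at runtime.
fun parseModeFromString(raw: String): ModeParam? =
    ModeParam.values().firstOrNull { it.name == raw }
```

A caller that passes an unknown string to `parseModeFromString` gets `null` and has to handle it, which is the runtime failure path the enum-typed parameter avoids.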

Two-Tier Tool Management Pattern

For subagents to reduce context window usage:

Parent LLM selects initial tool categories based on the high-level task
Subagent can swap categories as it discovers what it needs

This reduces context window usage by 50-80% compared to exposing all tools.
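
The two tiers could be modeled as per-session state that starts from the parent's selection and is then adjusted by the subagent. The categories and tool names below are invented for illustration; the actual `ToolSetCategory` values and the `DynamicToolSetManager` API are not listed in this document.

```kotlin
// Hypothetical categories and tools, for illustration only.
enum class Category(val tools: Set<String>) {
    NAVIGATION(setOf("tap", "swipe")),
    TEXT_INPUT(setOf("inputText")),
    VERIFICATION(setOf("assertVisible")),
}

// Tier 1: the parent LLM picks the initial categories via the constructor.
class ToolSetState(initial: Set<Category>) {
    private val enabled = initial.toMutableSet()

    // Only tools from enabled categories are exposed, keeping the tool list small.
    val tools: Set<String> get() = enabled.flatMap { it.tools }.toSet()

    // Tier 2: the subagent adds or swaps categories as it discovers what it needs.
    fun add(category: Category) {
        enabled += category
    }

    fun focusOn(category: Category) {
        enabled.clear()
        enabled += category
    }
}
```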

MCP Logging Infrastructure

Structured TrailblazeLog events for MCP agent operations, enabling visibility in the Trailblaze desktop app and debugging:

Log Type	Purpose
`McpAgentRunLog`	Full agent run lifecycle - objective, transport mode, iteration count, final result
`McpAgentIterationLog`	Per-iteration details - iteration number, LLM completion, tool called, result
`McpSamplingLog`	LLM completion requests - messages, model, tokens, duration, strategy
`McpAgentToolLog`	Tool execution - tool name, arguments, result, duration, transport mode

All log types use enum types (AgentToolTransport, LlmCallStrategy) for type safety, defined in trailblaze-models so logs can reference them.

Known Limitations

MCP Sampling: Most MCP clients (Cursor, Firebender) don’t support `sampling/createMessage`; Goose does. The recommended fallback is `TRAILBLAZE_AS_AGENT` mode with the `DIRECT` LLM strategy.
Manual Refresh Required After Server Restart: Sessions are held in memory only, so a server restart invalidates them. Trailblaze returns HTTP 404 for the stale session, as the MCP spec requires, but Cursor and Firebender don’t auto-reconnect (a known client bug), so the connection must be refreshed manually.
Single Device Per Session: Each MCP session controls one device at a time.
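
For comparison, the reconnect behavior that would make the manual refresh unnecessary is small. The sketch below shows a client that starts a new session when it receives a 404; `ReconnectingClient` and its two function parameters are hypothetical stand-ins for a real MCP transport, not part of Trailblaze or any named client.

```kotlin
// Hypothetical client-side handling of a stale session.
class ReconnectingClient(
    private val initialize: () -> String,                        // returns a fresh session ID
    private val post: (sessionId: String, body: String) -> Int,  // returns the HTTP status
) {
    private var sessionId: String? = null

    fun send(body: String): Int {
        val id = sessionId ?: initialize().also { sessionId = it }
        val status = post(id, body)
        if (status != 404) return status
        // 404 means the server no longer knows this session: start a new one and retry once.
        val fresh = initialize().also { sessionId = it }
        return post(fresh, body)
    }
}
```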

Future Direction: Two-Tier Agent Architecture

See Decision 025: Two-Tier Agent Architecture for the next evolution of agent design.

The two-tier architecture separates concerns:

- Outer Agent (MCP client like Goose, or Koog in standalone): Planning, replanning, cross-system orchestration
- Inner Agent (Trailblaze): Screen understanding, action recommendation, device execution

This enables model specialization (cheap vision model for screen analysis, expensive reasoning model for planning) and cross-system testing where the outer agent coordinates mobile UI + filesystem + database + API verification.

Architecture: TrailblazeMcpBridge

TrailblazeMcpBridgeImpl is the primary entry point for all MCP-specific operations, bridging MCP’s request/response model and Trailblaze’s internal async architecture.

Aspect	Desktop UI	MCP
Execution model	Fire-and-forget	Must block until completion
Progress	Shown in UI	Streamed as MCP notifications
Session continuity	UI maintains state	Bridge manages per-device sessions
Cancellation	User clicks Stop	MCP session close triggers cancellation

Bridge Responsibilities:

- Device selection and session management
- YAML execution (`runYaml()` fire-and-forget, `runYamlBlocking()` for MCP)
- Screen state access and tool execution
- Cancellation propagation
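
The contrast between the two execution paths can be sketched with plain JDK futures. The method names `runYaml` and `runYamlBlocking` follow this document, but `BridgeSketch`, the signatures, the step representation, and the use of `CompletableFuture` are assumptions; the real bridge works on Trailblaze's internal async architecture.

```kotlin
import java.util.concurrent.CompletableFuture

// Hypothetical bridge: method names follow the document, signatures are assumed.
class BridgeSketch(private val runStep: (String) -> Unit) {

    // Desktop UI path: start the run and return immediately with a handle.
    fun runYaml(steps: List<String>, onProgress: (String) -> Unit = {}): CompletableFuture<Int> =
        CompletableFuture.supplyAsync {
            steps.forEach { step ->
                runStep(step)
                onProgress("done: $step") // could be forwarded as an MCP progress notification
            }
            steps.size
        }

    // MCP path: the tool call must not return until the run has finished.
    fun runYamlBlocking(steps: List<String>, onProgress: (String) -> Unit): Int =
        runYaml(steps, onProgress).join()
}
```

Building the blocking variant on top of the asynchronous one keeps a single execution path, so the MCP-specific code is limited to waiting and relaying progress.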