Trailblaze Decision 029: Custom Tool Architecture¶
Context¶
Decision 010: Custom Tool Authoring documented the current Kotlin-based approach and its limitations. Today, custom tools for Trailblaze (e.g., myapp_launchAppSignedIn, otherapp_scrollUntilTextIsVisible) must be:
- Written in Kotlin with access to TrailblazeToolExecutionContext
- Compiled into the Trailblaze distribution or test APK
- Forked if you’re an external team wanting custom behavior
This creates a barrier for external adoption:
- External teams (Acme, ExampleCorp, etc.) can’t extend Trailblaze without forking
- Non-Kotlin teams (Python, TypeScript) have no path to custom tools
- The current tool API is tightly coupled to internal implementation details
Requirements¶
Based on discussions with potential adopters and internal teams:
- Teams MUST be able to add tools without forking — non-negotiable
- Broad accessibility — not just Kotlin developers; Python, TypeScript, Go teams need a path
- Type safety and refactoring support — we don’t want to maintain an untyped API surface
- On-device support — device farms (Firebase Test Lab, AWS Device Farm) only allow app APK + test APK
- Future extensibility — web (Playwright), desktop control are on the roadmap
- Cross-platform tools — one tool should work across Android, iOS, Web where possible
- Tool set management — ability to filter/group tools to reduce LLM context window
Decision¶
Implement a multi-path custom tool architecture with:
- Wire/Proto-defined API — Kotlin-first with Wire, generates proto for external consumption
- MCP for tool discovery — external tools run their own MCP servers
- RPC for execution — host-driven tools call Trailblaze via generated clients
- Library dependency for on-device — Kotlin tools compiled into test APK
- Central tool registry — metadata exposed via MCP resources
- Layered command interfaces — core + platform-specific backends
Architecture Overview¶
┌─────────────────────────────────────────────────────────────────────┐
│                             Outer Agent                             │
│                (Claude, Goose, Cursor, Desktop App)                 │
└─────────────────────────────────────────────────────────────────────┘
        │                 │                 │                 │
        │ MCP             │ MCP             │ MCP             │ MCP
        ▼                 ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Trailblaze    │ │ MyApp Tools   │ │OtherApp Tools │ │ Acme Tools    │
│ (core)        │ │ (example)     │ │ (example)     │ │ (External)    │
│               │ │               │ │               │ │               │
│ Primitives:   │ │ App-specific: │ │ App-specific: │ │ App-specific: │
│ - tap         │ │ - login       │ │ - transfer    │ │ - login       │
│ - inputText   │ │ - checkout    │ │ - banking     │ │ - acceptRide  │
│ - launchApp   │ │ - cardReader  │ │               │ │               │
│               │ │               │ │               │ │               │
│ Registry: ✓   │ │ Registry: ✓   │ │ Registry: ✓   │ │ Registry: ✓   │
└───────────────┘ └───────────────┘ └───────────────┘ └───────────────┘
        │                 │                 │                 │
        └─────────────────┴────────┬────────┴─────────────────┘
                                   │
                       TrailblazeCommands (RPC)
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Trailblaze RPC Server                        │
│                          (localhost:52525)                          │
│                                                                     │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │ TrailblazeCommands (core interface)                         │   │
│   │ - tap(), inputText(), launchApp(), clearAppData()           │   │
│   │ - captureScreen(), waitUntilVisible(), assertVisible()      │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                  │                                  │
│            ┌─────────────────────┼─────────────────────┐            │
│            ▼                     ▼                     ▼            │
│      ┌───────────┐         ┌───────────┐         ┌───────────┐      │
│      │ Maestro   │         │ Playwright│         │ Desktop   │      │
│      │ Commands  │         │ Commands  │         │ Commands  │      │
│      │           │         │           │         │ (future)  │      │
│      │ - adb.*   │         │ - navigate│         │           │      │
│      │ - swipe   │         │ - fill    │         │           │      │
│      │ - scroll  │         │ - click   │         │           │      │
│      └───────────┘         └───────────┘         └───────────┘      │
│            │                     │                     │            │
└────────────┼─────────────────────┼─────────────────────┼────────────┘
             │                     │                     │
             ▼                     ▼                     ▼
       Mobile Device          Web Browser           Desktop App
Wire/Proto: Kotlin-First API Definition¶
We use Wire (a Kotlin-first proto library by Block) to define the API in Kotlin and generate proto for external consumption:
// Defined in Kotlin using Wire annotations
// Wire generates proto schema for interop with other languages
interface TrailblazeCommands {
    val platform: TrailblazeDevicePlatform
    // ═══════════════════════════════════════════════════════════════
    // ACTIONS (perform operations)
    // ═══════════════════════════════════════════════════════════════
    suspend fun tap(text: String? = null, id: String? = null, index: Int? = null): ActionResult
    suspend fun inputText(text: String): ActionResult
    suspend fun launchApp(packageId: String): ActionResult
    suspend fun clearAppData(packageId: String): ActionResult
    // ═══════════════════════════════════════════════════════════════
    // QUERIES (return values for conditionals - don't throw on "not found")
    // ═══════════════════════════════════════════════════════════════
    suspend fun isVisible(text: String? = null, id: String? = null, timeoutMs: Long? = null): Boolean
    suspend fun hasText(text: String): Boolean
    suspend fun getElementText(id: String): String?
    suspend fun getElementCount(text: String? = null, id: String? = null): Int
    suspend fun captureScreen(): ScreenState
    // ═══════════════════════════════════════════════════════════════
    // ASSERTIONS (throw/fail if condition not met)
    // ═══════════════════════════════════════════════════════════════
    suspend fun assertVisible(text: String? = null, id: String? = null, timeoutMs: Long? = null): ActionResult
    suspend fun waitUntilVisible(text: String, timeoutMs: Long): ActionResult
    // Platform-specific backends (nullable, check availability)
    val maestro: MaestroCommands?       // Mobile (Android/iOS)
    val playwright: PlaywrightCommands? // Web
    val desktop: DesktopCommands?       // Desktop (future)
}
// Mobile-specific (Maestro backend)
interface MaestroCommands {
    // Android
    suspend fun adbShell(command: String): ShellResult
    suspend fun grantPermission(packageId: String, permission: String): ActionResult
    suspend fun pressBack(): ActionResult
    suspend fun pressHome(): ActionResult
    // Gestures
    suspend fun swipe(direction: SwipeDirection, durationMs: Long? = null): ActionResult
    suspend fun scroll(direction: ScrollDirection, amount: Int? = null): ActionResult
    suspend fun pinch(scale: Float): ActionResult
}
// Web-specific (Playwright backend)
interface PlaywrightCommands {
    suspend fun navigate(url: String): ActionResult
    suspend fun click(selector: String): ActionResult
    suspend fun fill(selector: String, value: String): ActionResult
    suspend fun waitForSelector(selector: String, timeoutMs: Long? = null): ActionResult
    suspend fun evaluateJs(script: String): Any?
}
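The nullable backend properties suggest a check-before-use pattern in tool code. The sketch below illustrates that pattern in Python, using synchronous calls for brevity. `Commands`, `MaestroBackend`, and `dismiss_dialog` are hypothetical stand-ins that record calls; they are not the generated client.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MaestroBackend:
    """Hypothetical stand-in for MaestroCommands; records calls instead of driving a device."""
    calls: List[str] = field(default_factory=list)

    def press_back(self) -> str:
        self.calls.append("pressBack")
        return "ok"


@dataclass
class Commands:
    """Hypothetical stand-in for TrailblazeCommands with an optional mobile backend."""
    platform: str
    maestro: Optional[MaestroBackend] = None  # None when the platform is not mobile
    calls: List[str] = field(default_factory=list)

    def tap(self, text: str) -> str:
        self.calls.append(f"tap:{text}")
        return "ok"


def dismiss_dialog(commands: Commands) -> str:
    """Use the mobile-only back action when available, else a core primitive."""
    if commands.maestro is not None:
        return commands.maestro.press_back()
    return commands.tap(text="Close")
```

A tool written this way stays cross-platform: the core primitive is the fallback, and the platform-specific call is an optimization taken only when the backend exists.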
Code Generation¶
From the Wire/Kotlin definitions, we generate:
| Generated Artifact | Language | Usage |
|---|---|---|
| TrailblazeCommands interface | Kotlin | In-process (host + on-device) |
| trailblaze-api.proto | Proto | External language interop |
| TrailblazeClient | Python | External MCP servers |
| TrailblazeClient | TypeScript | External MCP servers |
| gRPC/Connect stubs | Multiple | RPC communication |
Deployment Paths¶
Path 1: Host-Driven (MCP + RPC)¶
Audience: Most external users (Python, TypeScript, Go teams)
External teams write their own MCP server that:
1. Exposes custom tools via MCP protocol
2. Uses generated RPC client to call Trailblaze commands
3. Runs as a separate process
# acme_tools_server.py
from mcp.server.fastmcp import FastMCP  # Official MCP Python SDK
from trailblaze_client import TrailblazeClient  # Generated from proto

tb = TrailblazeClient("localhost:52525")
server = FastMCP("acme-tools")


@server.tool("acme_driver_login")
async def login(email: str, password: str) -> dict:
    """Log in to Acme Driver app with credentials."""
    await tb.clear_app_data(package_id="com.acme.driver")
    await tb.launch_app(package_id="com.acme.driver")
    await tb.tap(text="Sign in")
    await tb.input_text(text=email)
    await tb.input_text(text=password)
    await tb.tap(text="Submit")
    await tb.wait_until_visible(text="Go Online", timeout_ms=10000)
    return {"success": True}


if __name__ == "__main__":
    server.run()  # Defaults to the stdio transport
Path 2: On-Device (Library + Custom APK)¶
Audience: Teams running tests on device farms (Firebase Test Lab, AWS Device Farm)
Teams depend on trailblaze-android-ondevice-mcp and compile their tools into a test APK:
// AcmeDriverLoginTool.kt
class AcmeDriverLoginTool(
    private val commands: TrailblazeCommands // Same interface as RPC!
) : TrailblazeTool {
    override suspend fun execute(args: Map<String, Any>): ToolResult {
        commands.clearAppData("com.acme.driver")
        commands.launchApp("com.acme.driver")
        commands.tap(text = "Sign in")
        commands.inputText(args["email"] as String)
        commands.inputText(args["password"] as String)
        commands.tap(text = "Submit")
        commands.waitUntilVisible(text = "Go Online", timeoutMs = 10000)
        return ToolResult.success()
    }
}
Key insight: The TrailblazeCommands interface is identical whether backed by RPC (host) or direct calls (on-device).
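To make the point concrete, here is a minimal Python sketch in which the same tool function runs against two interchangeable backends. `RecordingBackend` is a hypothetical test double that only logs calls; a real setup would substitute the generated RPC client or the on-device implementation.

```python
import asyncio
from typing import List, Protocol


class TrailblazeCommands(Protocol):
    """Subset of the core interface, reduced to three calls for illustration."""

    async def launch_app(self, package_id: str) -> None: ...
    async def tap(self, text: str) -> None: ...
    async def input_text(self, text: str) -> None: ...


class RecordingBackend:
    """Hypothetical backend that logs each call with a label ('rpc' or 'direct')."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.log: List[str] = []

    async def launch_app(self, package_id: str) -> None:
        self.log.append(f"{self.label}:launchApp({package_id})")

    async def tap(self, text: str) -> None:
        self.log.append(f"{self.label}:tap({text})")

    async def input_text(self, text: str) -> None:
        self.log.append(f"{self.label}:inputText({text})")


async def login(commands: TrailblazeCommands, email: str) -> None:
    """Tool logic depends only on the interface, not on how calls are delivered."""
    await commands.launch_app("com.acme.driver")
    await commands.tap("Sign in")
    await commands.input_text(email)


rpc, direct = RecordingBackend("rpc"), RecordingBackend("direct")
asyncio.run(login(rpc, "a@example.com"))
asyncio.run(login(direct, "a@example.com"))
```

Both backends receive the same call sequence, which is what allows one tool implementation to serve the host-driven and on-device paths.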
Path 3: Host-Driven Kotlin (In-Process)¶
Audience: Teams already using Kotlin who want in-process tools
Kotlin tools in the same JVM use the interface directly with no RPC overhead.
MCP Server Organization¶
Composable Tool Modules¶
Tools are organized as composable modules that can be combined into MCP servers flexibly:
Tool Modules (libraries):
├── shared-common-tools # Shared across all your apps
├── myapp-tools # MyApp-specific
├── otherapp-tools # OtherApp-specific
├── admin-tools # AdminPanel-specific
└── acme-tools # External: Acme's tools
These modules are code libraries, not servers. How they’re composed into servers is a deployment decision.
Deployment Options¶
Option A: One server per app (independence)
MCP Servers:
├── myapp-server → [shared-common-tools, myapp-tools]
├── otherapp-server → [shared-common-tools, otherapp-tools]
└── admin-server → [shared-common-tools, admin-tools]
Option B: One superset server (efficiency)
MCP Servers:
└── combined-tools-server → [shared-common-tools, myapp-tools, otherapp-tools, admin-tools]
Option C: Mix based on needs
MCP Servers:
├── mobile-server → [shared-common-tools, myapp-tools, otherapp-tools] # Mobile apps together
└── admin-server → [shared-common-tools, admin-tools] # Web separate
Trade-offs¶
| Approach | Ports | Processes | Deploy Independence | Failure Isolation |
|---|---|---|---|---|
| One per app | N | N | High | High |
| One superset | 1 | 1 | Low (coordinated) | Low |
The overhead difference is minimal: a few extra processes. The bigger question is organizational:
- Independence: Teams deploy their tools without coordinating
- Efficiency: Single server, shared caches/state, one port to manage
Recommendation: Multiple Servers (with Build Considerations)¶
Default to one MCP server per app/team. The process overhead is minimal, and the organizational benefits are significant:
- Ownership — Each team owns their server and tools
- Independence — Deploy, update, and scale independently
- Overlap handling — Teams can have similar tools without naming conflicts
- Failure isolation — One server’s issues don’t affect others
- Easier management — Clear boundaries for what tools live where
Startup time consideration: Multiple servers mean multiple process startups.
| Server Type | Build Time | Startup Time | N Servers Impact |
|---|---|---|---|
| Python/TypeScript | None | 1-3s (imports, init) | N × startup |
| Kotlin in-process | Part of Trailblaze | Zero (same JVM) | No impact |
| Kotlin out-of-process | Gradle compile | JVM startup + init | N × build + startup |
For Python/TypeScript: No build, but starting 3 servers = 3× interpreter + import time. Usually acceptable (a few seconds total), but consider combining if startup latency matters.
For Kotlin MCP servers: Combine into one pre-built artifact with all tools, to avoid N builds at startup. The single JAR can still organize tools by namespace (myapp_*, otherapp_*). This gives:
- Build once → start fast
- Logical separation via namespacing
- Single process or spawn multiple from same artifact
Tool Namespacing (Required)¶
Namespacing is critical regardless of deployment model. Even with one combined server, tools need clear namespaces to avoid conflicts and enable filtering.
Per ADR 005: Tool Naming Convention, we use underscores (not dots) because OpenAI function names don’t support dots:
myapp_login
myapp_checkout
otherapp_transfer
otherapp_requestMoney
admin_viewAnalytics
Benefits:
- No conflicts — Multiple apps can have a login tool
- Filtering — Agent can request only myapp_* tools for a MyApp test
- Discovery — Clear ownership in tool listings
- Composability — Same naming works whether tools are in one server or many
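The naming rule and prefix filtering can be sketched in a few lines of Python. The helper names `namespaced` and `tools_for` are hypothetical; the character set reflects OpenAI's documented function-name constraint (letters, digits, underscores, hyphens, at most 64 characters).

```python
import re
from typing import Iterable, List

# Letters, digits, underscores and hyphens, 1 to 64 characters. Dots are rejected.
OPENAI_NAME = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def namespaced(namespace: str, tool: str) -> str:
    """Join namespace and tool name with an underscore and validate the result."""
    name = f"{namespace}_{tool}"
    if OPENAI_NAME.fullmatch(name) is None:
        raise ValueError(f"invalid tool name: {name!r}")
    return name


def tools_for(namespace: str, names: Iterable[str]) -> List[str]:
    """Keep only the tools that belong to one namespace, e.g. 'myapp'."""
    prefix = f"{namespace}_"
    return [n for n in names if n.startswith(prefix)]
```

With this shape, two apps can each define `login` without collision, and an agent can narrow the tool list to a single app by prefix.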
Recommendation: One Combined Server¶
For your organization’s internal tools, consider using one combined MCP server with all app tools:
combined-tools-server
├── myapp_* (MyApp tools)
├── otherapp_* (OtherApp tools)
├── admin_* (AdminPanel tools)
└── shared_* (Shared utilities)
Why one server for your organization:
- Single build, fast startup
- Single process to manage
- Shared caches/state when useful
- Namespacing provides logical separation
- Simpler registry — one resources/read("trailblaze://registry") returns all tool metadata
Registry with One Combined Server¶
One combined server simplifies the registry endpoint:
| Deployment | Registry Calls | Aggregation |
|---|---|---|
| Multiple servers | N calls | Trailblaze aggregates |
| One combined server | 1 call | None needed |
The registry is a Map<String, ToolMetadata> where keys include the namespace:
// One registry, all tools, namespaced
val registry = mapOf(
    "myapp_login" to ToolMetadata(platforms = setOf(ANDROID, IOS), groups = setOf("auth")),
    "myapp_checkout" to ToolMetadata(platforms = setOf(ANDROID, IOS), groups = setOf("payment")),
    "otherapp_transfer" to ToolMetadata(platforms = setOf(ANDROID, IOS), groups = setOf("transfer")),
    "otherapp_requestMoney" to ToolMetadata(platforms = setOf(ANDROID, IOS), groups = setOf("transfer")),
)
Filtering by app is simple: registry.filter { it.key.startsWith("myapp_") }
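For the multiple-server case, the aggregation step might look like the following Python sketch. It merges per-server registries into one map and fails on duplicate names. Rejecting duplicates is an assumption made here for illustration; the decision record does not specify conflict handling, and `merge_registries` is a hypothetical name.

```python
from typing import Dict


def merge_registries(per_server: Dict[str, Dict[str, dict]]) -> Dict[str, dict]:
    """Merge {server: {tool_name: metadata}} into one map, rejecting duplicates."""
    merged: Dict[str, dict] = {}
    owner: Dict[str, str] = {}  # tool name -> server that first defined it
    for server, tools in per_server.items():
        for name, meta in tools.items():
            if name in merged:
                raise ValueError(
                    f"tool {name!r} is defined by both {owner[name]!r} and {server!r}"
                )
            merged[name] = meta
            owner[name] = server
    return merged
```

Because keys carry their namespace, the merged map supports the same prefix filter as a single combined registry.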
External Teams: Flexible Deployment¶
External teams can deploy however they prefer:
- Single-app server: One server for their app’s tools
- Combined server: Multiple apps in one (following the combined server pattern)
- Per-team servers: Organizational boundaries
The architecture supports all models — namespacing makes tools composable across any deployment.
Multiple MCP Servers is Standard¶
Connecting to multiple MCP servers is standard MCP usage:
- Claude Desktop, Goose, Cursor all support multiple servers
- Each server provides different capabilities
- This is the intended design pattern
Whether your team uses 1 server or 5, the architecture supports it.
MCP Server Registration¶
Transport Model: stdio vs HTTP¶
We use a simple two-tier model for MCP transport:
| Transport | Lifecycle | When to use |
|---|---|---|
| stdio | Trailblaze manages (start/stop) | Default for all managed servers |
| http | External (you manage) | Shared team servers, external infrastructure |
stdio is the default because:
- No port management: communication via stdin/stdout pipes, no port conflicts
- Parallel-safe: multiple instances don’t conflict (unlike ports)
- CI-friendly: parallel jobs just work, no port allocation needed
- Simple config: just specify the command to run
Project Configuration (trailblaze.yaml)¶
Teams configure their MCP servers in their test repository:
# trailblaze.yaml - in repo root
target: acme_driver
mcpServers:
  # stdio (managed) - Trailblaze starts and stops this
  - name: acme-tools
    command: ./gradlew
    args: [:acme-tools:runMcp]
  # http (external) - already running, we just connect
  - name: shared-team-tools
    transport: http
    url: http://team-tools.internal:8080
The “magic” experience:
$ cd acme-trailblaze-tests
$ trailblaze run
# 1. Reads trailblaze.yaml
# 2. Spawns acme-tools via Gradle (stdio)
# 3. Connects to shared-team-tools (http, already running)
# 4. Loads registries, runs tests
# 5. Stops acme-tools on exit
Config Schema¶
# trailblaze.yaml
target: string              # Which app target (acme_driver, myapp, etc.)
platform: string            # Optional: android, ios, web
mcpServers:
  # stdio server (Trailblaze manages lifecycle)
  - name: string            # Identifier
    command: string         # Command to run
    args: [string]          # Arguments
    workingDir: string      # Optional: working directory
    env:                    # Optional: environment variables
      KEY: value
  # http server (externally managed)
  - name: string
    transport: http         # Explicitly http
    url: string             # URL to connect to
trailsDir: string           # Optional: where to find trails (default: ./trails)
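As a rough illustration of how this schema could be validated after loading, the Python sketch below parses `mcpServers` entries into a dataclass and applies the stdio default. `McpServerConfig` and `parse_server` are hypothetical names, and the input is a plain dict standing in for the output of a YAML loader.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class McpServerConfig:
    name: str
    transport: str                      # "stdio" or "http"
    command: Optional[str] = None       # stdio only
    args: List[str] = field(default_factory=list)
    url: Optional[str] = None           # http only


def parse_server(entry: Dict[str, Any]) -> McpServerConfig:
    """Validate one mcpServers entry; transport falls back to stdio when omitted."""
    name = entry["name"]
    transport = entry.get("transport", "stdio")
    if transport == "stdio":
        if "command" not in entry:
            raise ValueError(f"{name}: stdio servers need 'command'")
        return McpServerConfig(name, "stdio", entry["command"], list(entry.get("args", [])))
    if transport == "http":
        if "url" not in entry:
            raise ValueError(f"{name}: http servers need 'url'")
        return McpServerConfig(name, "http", url=entry["url"])
    raise ValueError(f"{name}: unknown transport {transport!r}")


# The dict mirrors what a YAML loader would return for the example config.
config = {
    "target": "acme_driver",
    "mcpServers": [
        {"name": "acme-tools", "command": "./gradlew", "args": [":acme-tools:runMcp"]},
        {"name": "shared-team-tools", "transport": "http", "url": "http://team-tools.internal:8080"},
    ],
}
servers = [parse_server(entry) for entry in config["mcpServers"]]
```

Failing early on a missing `command` or `url` keeps configuration mistakes out of the test run itself.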
Why stdio Avoids Port Problems¶
Multiple developers on same machine:
# Developer 1
trailblaze run # Uses stdin/stdout pipe
# Developer 2 (same machine)
trailblaze run # Uses different pipe, no conflict!
CI parallel jobs:
# All jobs run simultaneously, no port allocation needed
jobs:
  test-driver: { run: trailblaze run trails/driver/ }
  test-rider: { run: trailblaze run trails/rider/ }
  test-eats: { run: trailblaze run trails/eats/ }
STDIO Concurrency Limitation¶
Important consideration: STDIO MCP servers typically process requests sequentially — read a request, process it, respond, then read the next. A single STDIO connection cannot handle concurrent requests from multiple devices.
Recommended: HTTP Transport for Multi-Device¶
For multi-device support, Trailblaze spawns the MCP server in HTTP mode rather than STDIO:
┌───────────────────────────────────────────────────────────────────────┐
│ Trailblaze spawns HTTP MCP server (single process)                    │
│                                                                       │
│ $ python acme_tools_server.py --transport http --port 52530           │
│                                                                       │
│ Device 1 ──► POST /tools/call ──┐                                     │
│ Device 2 ──► POST /tools/call ──┼──► Concurrent handling              │
│ Device 3 ──► POST /tools/call ──┘                                     │
└───────────────────────────────────────────────────────────────────────┘
Why HTTP is better than STDIO-per-device:
| Approach | Processes | Memory | State Sharing | Lifecycle |
|---|---|---|---|---|
| STDIO per-device | N processes | N × footprint | None (isolated) | Manage N |
| HTTP (single) | 1 process | 1 × footprint | Shared caches | Manage 1 |
Spawning one HTTP server with a controlled port is not more work than spawning N STDIO processes — and it’s cleaner:
- Single process — less overhead than N STDIO processes
- Shared state — tools can share caches, connections, loaded models
- Trailblaze controls the port — no port conflicts
- Same server code — MCP SDKs support both transports, tool author changes nothing
Invocation ID is required for HTTP transport. With concurrent requests from multiple devices hitting the same server, the invocation ID in _meta is how each request routes to the correct device:
@server.tool("analyze_screen")
async def analyze(prompt: str, ctx: Context) -> dict:
    # Multiple devices calling this concurrently!
    # Invocation ID tells us which device context to use
    tb = TrailblazeClient.from_context(ctx)
    screen = await tb.capture_screen()  # Routes to correct device
    return await tb.ask_llm(prompt, screen)
Single Device: STDIO Still Works¶
For the single device case (most common during development), STDIO remains simple and works without invocation ID:
mcpServers:
  - name: acme-tools
    command: python
    args: [./acme_tools_server.py]
    # Single device: STDIO, no port needed
When multiple devices connect, Trailblaze automatically switches to HTTP mode, passing a port:
# Trailblaze internally does this when multi-device:
# python ./acme_tools_server.py --transport http --port 52530
For in-process Kotlin tools, concurrency is handled via coroutines in the same JVM — no transport concerns.
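The transport switch described above can be sketched as a small Python helper that builds the server's command line. `launch_command` is a hypothetical name; the `--transport` and `--port` flags are the ones shown in this section.

```python
from typing import List, Optional


def launch_command(command: str, args: List[str], device_count: int,
                   port: Optional[int] = None) -> List[str]:
    """Build the process command line for a managed MCP server."""
    argv = [command, *args]
    if device_count <= 1:
        return argv  # single device: plain stdio, no extra flags
    if port is None:
        raise ValueError("multi-device runs need a port for HTTP transport")
    return argv + ["--transport", "http", "--port", str(port)]
```

For one device the configured command is used unchanged; for several, the same command gains the HTTP flags, so the tool author's configuration stays identical in both cases.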
For External MCP Clients (Goose, Cursor, Claude Desktop)¶
Users configure their MCP client directly using each client’s config format. This is separate from trailblaze.yaml:
Goose (~/.config/goose/config.yaml):
mcp:
  servers:
    trailblaze:
      command: trailblaze
      args: [mcp]
    acme-tools:
      command: python
      args: [./acme_tools_server.py]
In-Process (Same JVM)¶
Your app-specific tools don’t need separate MCP servers. They’re in-process Kotlin:
// In-process: tools registered directly, no MCP server spawning
class TrailblazeMcpServer {
    val inProcessTools = listOf(
        MyAppLoginTool::class,
        MyAppCheckoutTool::class,
        OtherAppTransferTool::class,
        // All in-process, same JVM
    )
}
External MCP servers (defined in trailblaze.yaml) are for external teams running separate processes.
Multi-Device Execution¶
Trailblaze supports multiple devices connected simultaneously. A single Trailblaze instance can orchestrate tests across an iPhone, Android emulator, and physical Android device at the same time. This is a core capability that enables:
- Cross-platform testing — Run the same test on iOS and Android in parallel
- Device farms — Scale tests across dozens of devices
- Comparative testing — Test the same flow on different device configurations
External tools don’t need to know about multi-device orchestration — device context flows through automatically.
How It Works¶
┌─────────────────────────────────────────────────────────────────────┐
│ Trailblaze Desktop App │
│ │
│ Connected devices: │
│ - Device 1 (Pixel 6, emulator-5554) │
│ - Device 2 (iPhone 14, 00008101-...) │
│ - Device 3 (Galaxy S23, RF8M...) │
│ │
│ User: "Run login test on all devices" │
│ │
│ Creates 3 parallel execution contexts: │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ deviceId: d1 │ │ deviceId: d2 │ │ deviceId: d3 │ │
│ │ platform: AND │ │ platform: IOS │ │ platform: AND │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│ │ │
│ MCP call with │ MCP call with │ MCP call with
│ device context │ device context │ device context
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────┐
│ External MCP Server (Acme Tools) │
│ │
│ @server.tool("acme_driver_login") │
│ async def login(email, password, _trailblaze): │
│ # Tool doesn't know about multiple devices │
│ # It just operates in the context it's given │
│ tb = TrailblazeClient(context=_trailblaze) │
│ await tb.tap(text="Sign in") # Routed to correct device │
│ await tb.input_text(email) # Routed to correct device │
│ ... │
└─────────────────────────────────────────────────────────────────────┘
│ │ │
│ RPC with d1 │ RPC with d2 │ RPC with d3
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────┐
│ Trailblaze RPC Server │
│ │
│ Routes each request to the correct device based on deviceId │
│ │
│ d1 → Pixel 6 d2 → iPhone 14 d3 → Galaxy S23 │
└─────────────────────────────────────────────────────────────────────┘
Device Context Propagation¶
When Trailblaze calls an external MCP tool, it includes device context:
{
  "method": "tools/call",
  "params": {
    "name": "acme_driver_login",
    "arguments": { "email": "test@example.com", "password": "***" },
    "_trailblaze": {
      "deviceId": "emulator-5554",
      "sessionId": "abc123",
      "platform": "android"
    }
  }
}
The external tool uses a client that automatically includes this context in RPC calls:
@server.tool("acme_driver_login")
async def login(email: str, password: str, _trailblaze: dict) -> dict:
    # Client initialized with device context
    tb = TrailblazeClient(context=_trailblaze)
    # All RPC calls automatically include deviceId
    await tb.clear_app_data("com.acme.driver")  # → routed to emulator-5554
    await tb.launch_app("com.acme.driver")      # → routed to emulator-5554
    await tb.tap(text="Sign in")                # → routed to emulator-5554
    return {"success": True}
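One way a client could attach the context automatically is sketched below in Python. `ContextClient` is a hypothetical illustration, not the generated `TrailblazeClient`: it stores requests in a list instead of sending them, and the request layout is assumed.

```python
from typing import Any, Dict, List


class ContextClient:
    """Hypothetical client that copies device context into every outgoing request."""

    def __init__(self, context: Dict[str, str]) -> None:
        self._context = context
        self.sent: List[Dict[str, Any]] = []  # stands in for the network

    def _request(self, method: str, **params: Any) -> Dict[str, Any]:
        request = {
            "method": method,
            "params": params,
            "device_id": self._context["deviceId"],    # routing
            "session_id": self._context["sessionId"],  # correlation
        }
        self.sent.append(request)
        return request

    def tap(self, text: str) -> Dict[str, Any]:
        return self._request("tap", text=text)

    def launch_app(self, package_id: str) -> Dict[str, Any]:
        return self._request("launchApp", package_id=package_id)
```

Tool code calls `tap` or `launch_app` without mentioning a device; the routing fields are added in one place.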
Wire API Includes Device Context¶
The TrailblazeCommands interface includes device context in every request:
message TapRequest {
  string text = 1;        // Element text to match
  string id = 2;          // Optional: element ID
  int32 index = 3;        // Optional: index if multiple matches
  string device_id = 4;   // Routing
  string session_id = 5;  // Correlation
}
Tool Perspective¶
From an external tool’s perspective:
- It receives a request with context
- It does its work using the context
- It returns a result
The tool doesn’t know or care that:
- There are multiple devices
- Tests are running in parallel
- It’s being called multiple times simultaneously
The device context is transparent to the tool. Trailblaze handles parallelism and routing.
Invocation Context for Multi-Device¶
To support multiple devices simultaneously, every tool invocation needs execution context. When an external MCP tool makes RPC calls back to Trailblaze, those calls must route to the correct device — not just any connected device.
The invocation context is how we solve this. When Trailblaze calls an external MCP tool, it includes context in _meta:
- Invocation ID — Correlates RPC callbacks to the originating tool call
- Device info — Which device this tool invocation operates on
- Session info — Logging and analytics correlation
For in-process Kotlin tools, this context is the TrailblazeToolExecutionContext (or TrailblazeContext in TrailblazeToolSet).
For remote MCP tools, this context flows via _meta and is wrapped by the SDK’s TrailblazeClient. The client is essentially a remote execution context — every RPC call it makes is scoped to the correct device.
Metadata Shape¶
{
  "_meta": {
    "trailblazeInvocationId": "inv-abc123",
    "trailblaze": {
      "baseUrl": "http://localhost:52525",
      "sessionId": "trail-xyz",
      "device": {
        "id": "emulator-5554",
        "platform": "ANDROID",
        "width": 1080,
        "height": 2400
      },
      "capabilities": {
        "sampling": true
      }
    }
  }
}
| Field | Type | Purpose |
|---|---|---|
| trailblazeInvocationId | string | Correlates callbacks to originating request |
| trailblaze.baseUrl | string | Where to call back |
| trailblaze.sessionId | string | Trailblaze session for logging |
| trailblaze.device.id | string | Device identifier |
| trailblaze.device.platform | string | ANDROID / IOS |
| trailblaze.device.width/height | int | Screen dimensions |
| trailblaze.capabilities.sampling | bool | Whether LLM sampling is available |
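An SDK on the tool side might unpack this metadata into a typed object along the following lines. This is a sketch under assumptions: `InvocationContext` and `parse_meta` are hypothetical names, and treating a missing `capabilities` block as "no sampling" is a choice made here, not something the decision record specifies.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class InvocationContext:
    invocation_id: str
    base_url: str
    session_id: str
    device_id: str
    platform: str
    width: int
    height: int
    sampling: bool


def parse_meta(params: Dict[str, Any]) -> InvocationContext:
    """Extract the Trailblaze context from the _meta block of a tool call."""
    meta = params["_meta"]
    tb = meta["trailblaze"]
    device = tb["device"]
    return InvocationContext(
        invocation_id=meta["trailblazeInvocationId"],
        base_url=tb["baseUrl"],
        session_id=tb["sessionId"],
        device_id=device["id"],
        platform=device["platform"],
        width=int(device["width"]),
        height=int(device["height"]),
        # Assumption: a missing capabilities block means sampling is unavailable.
        sampling=bool(tb.get("capabilities", {}).get("sampling", False)),
    )
```

Parsing once at the start of a tool call gives the rest of the tool immutable, typed access to the static context.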
Single Device Fallback¶
For single-device scenarios, invocation ID is optional. Trailblaze falls back to the single active context when:
- Only one device is connected
- Only one tool invocation is active
Multi-device scenarios require explicit invocation ID propagation. If multiple devices are active and no invocation ID is provided, Trailblaze returns an error explaining the requirement.
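The fallback rule can be expressed as a short lookup function. The Python sketch below is illustrative only: `resolve_context` and `InvocationError` are hypothetical names, and the fallback is modeled as "exactly one active invocation", which is a simplification of the two conditions listed above.

```python
from typing import Dict, Optional


class InvocationError(Exception):
    """Raised when a callback cannot be tied to exactly one invocation context."""


def resolve_context(active: Dict[str, dict], invocation_id: Optional[str]) -> dict:
    """Pick the context for an incoming RPC call.

    `active` maps invocation IDs to their stored contexts.
    """
    if invocation_id is not None:
        if invocation_id not in active:
            raise InvocationError(f"Unknown invocation ID: {invocation_id}")
        return active[invocation_id]
    if len(active) == 1:
        # Single-device fallback: only one context can be meant.
        return next(iter(active.values()))
    raise InvocationError(
        f"Invocation ID required: {len(active)} invocations are active"
    )
```

An explicit ID always wins; the fallback applies only when there is no ambiguity about which context the call belongs to.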
Static vs Fresh Data¶
Static metadata (included in _meta to avoid round-trips):
- Device info (platform, dimensions, ID)
- Session ID
- Callback URL
- Capabilities
Fresh data (fetched via RPC on-demand):
- View hierarchy: large, changes constantly
- Screenshot: large (~100KB+), stale immediately
- Current screen state: tool decides when it needs fresh data
This separation ensures tools have immediate access to static context while fetching dynamic data only when needed.
Invocation ID Lifecycle¶
The invocation ID ties together a single external tool call with all the RPC requests that tool makes back to Trailblaze.
┌─────────────────────────────────────────────────────────────────────────┐
│ TRAILBLAZE │
│ │
│ 1. Trailblaze calls external MCP tool │
│ → Generate invocationId = "inv-abc123" │
│ → Store context: invocations["inv-abc123"] = {device, session, ...} │
│ → Include in _meta: {"trailblazeInvocationId": "inv-abc123", ...} │
│ │
│ 2. BLOCKING: Wait for tool call to complete │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ External tool executes, makes RPC calls back to Trailblaze │ │
│ │ │ │
│ │ tb.tap(...) → RPC includes invocationId │ │
│ │ tb.captureScreen() → RPC includes invocationId │ │
│ │ tb.inputText(...) → RPC includes invocationId │ │
│ │ │ │
│ │ Trailblaze receives RPC: │ │
│ │ → Extract invocationId from request │ │
│ │ → Lookup context: invocations["inv-abc123"] │ │
│ │ → Route to correct device, log against correct session │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ 3. Tool call completes (success or failure) │
│ → Remove context: invocations.remove("inv-abc123") │
│ → Return result to caller │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Key points:
- Blocking call: Trailblaze blocks while waiting for the external tool to complete
- Context scoped to call: The invocation context exists only while the tool is executing
- Automatic cleanup: Context is removed when the tool call returns (success or failure)
- RPC routing: All incoming RPC requests during execution use the invocation ID to find the right context
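The register, execute, clean-up cycle maps naturally onto a context manager. The Python sketch below is a simplified model, assuming an in-memory `invocations` table and a hypothetical `invocation` helper; the ID format merely imitates the `inv-...` examples in this document.

```python
import uuid
from contextlib import contextmanager
from typing import Dict, Iterator

# In-memory table of active invocations: invocation ID -> stored context.
invocations: Dict[str, dict] = {}


@contextmanager
def invocation(context: dict) -> Iterator[str]:
    """Register a context for the duration of one external tool call."""
    invocation_id = f"inv-{uuid.uuid4().hex[:8]}"
    invocations[invocation_id] = context
    try:
        yield invocation_id  # the tool call runs while the context is registered
    finally:
        # Cleanup happens on success and on failure alike.
        invocations.pop(invocation_id, None)
```

Wrapping the tool call in `with invocation(ctx) as inv_id:` guarantees that late RPC callbacks arriving after the call has ended find no context and are rejected.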
Error Handling¶
If an RPC request includes an invalid or unknown invocation ID, Trailblaze returns a standard tool call failure:
{
  "error": {
    "code": -32602,
    "message": "Unknown invocation ID: inv-xyz. The tool call may have completed or timed out."
  }
}
This propagates back to the external tool as a failed RPC call, which should cause the tool to return a failure to Trailblaze.
Tool Registry¶
Central Registry Per Server¶
Instead of annotating each tool with metadata, each server has a central registry:
// MyAppToolRegistry.kt - Single source of truth
object MyAppToolRegistry {
    val tools: Map<String, ToolMetadata> = mapOf(
        "myapp_launchAppSignedIn" to ToolMetadata(
            platforms = setOf(ANDROID, IOS, DESKTOP),
            groups = setOf("auth", "setup"),
        ),
        "myapp_checkout" to ToolMetadata(
            platforms = setOf(ANDROID, IOS, DESKTOP, WEB),
            groups = setOf("checkout", "payments"),
        ),
        "tapOnElementByNodeId" to ToolMetadata(
            platforms = setOf(ANDROID, IOS, DESKTOP, WEB),
            groups = setOf("core"),
            isRecordable = false,
            isDelegating = true,
        ),
    )
    val groups: Map<String, GroupInfo> = mapOf(
        "auth" to GroupInfo("Authentication tools", defaultEnabled = true),
        "checkout" to GroupInfo("Checkout flow", defaultEnabled = false),
    )
}
Registry Data Model (Proto-Generated)¶
We provide proto-generated data models for the registry:
// trailblaze-registry.proto
message TrailblazeToolRegistry {
  map<string, ToolMetadata> tools = 1;
  map<string, GroupInfo> groups = 2;
  ServerInfo server_info = 3;
}
message ToolMetadata {
  repeated string platforms = 1;
  repeated string groups = 2;
  bool exposed_to_llm = 3;
  bool is_recordable = 4;
  bool is_delegating = 5;
}
message GroupInfo {
  string description = 1;
  bool default_enabled = 2;
}
message ServerInfo {
  string name = 1;
  string version = 2;
}
Teams use the generated data models in their language. No base class to maintain.
Exposed via MCP Resource¶
The registry is exposed as an MCP resource (standard MCP feature):
Resource URI: trailblaze://registry
{
  "tools": {
    "myapp_launchAppSignedIn": {
      "platforms": ["android", "ios", "desktop"],
      "groups": ["auth", "setup"],
      "exposedToLlm": true,
      "isRecordable": true,
      "isDelegating": false
    },
    "tapOnElementByNodeId": {
      "platforms": ["android", "ios", "desktop", "web"],
      "groups": ["core"],
      "exposedToLlm": true,
      "isRecordable": false,
      "isDelegating": true
    }
  },
  "groups": {
    "auth": { "description": "Authentication tools", "defaultEnabled": true },
    "checkout": { "description": "Checkout flow", "defaultEnabled": false }
  },
  "serverInfo": {
    "name": "trailblaze-myapp-tools",
    "version": "1.2.0"
  }
}
Separation of Concerns¶
| MCP Feature | Contains |
|---|---|
| tools/list | Tool names, descriptions, input schemas (standard MCP) |
| resources/read(registry) | Metadata: platforms, groups, flags (our extension) |
The registry references tools by name; it doesn’t duplicate tool definitions.
Using the Registry¶
The Trailblaze central agent reads registries from all connected MCP servers:
class TrailblazeToolRouter {
    private val registries = mutableMapOf<String, TrailblazeToolRegistry>()

    suspend fun loadFromMcpServer(serverName: String, client: McpClient) {
        val resource = client.readResource("trailblaze://registry")
        registries[serverName] = Json.decodeFromString(resource.content)
    }

    fun filterTools(platform: TrailblazeDevicePlatform, groups: Set<String>): List<ToolInfo> {
        return registries.values.flatMap { registry ->
            registry.tools
                .filter { (_, meta) ->
                    meta.exposedToLlm &&
                        meta.platforms.contains(platform) &&
                        (groups.isEmpty() || meta.groups.intersect(groups).isNotEmpty())
                }
                // Convert map entries to the declared return type.
                // The ToolInfo(name, meta) constructor is assumed here.
                .map { (name, meta) -> ToolInfo(name, meta) }
        }
    }
}
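The same filtering rule, written against the JSON form of the registry, could look like this in Python. `filter_tools` is a hypothetical helper operating on dicts shaped like the registry resource shown earlier; it returns tool names only.

```python
from typing import Dict, List, Set


def filter_tools(registries: Dict[str, dict], platform: str, groups: Set[str]) -> List[str]:
    """Return names of LLM-exposed tools for a platform, optionally limited to groups."""
    selected: List[str] = []
    for registry in registries.values():
        for name, meta in registry["tools"].items():
            if not meta["exposedToLlm"]:
                continue  # internal helpers are never offered to the LLM
            if platform not in meta["platforms"]:
                continue
            if groups and not groups.intersection(meta["groups"]):
                continue  # an empty group set means "no group restriction"
            selected.append(name)
    return selected
```

Filtering before the tool list reaches the LLM is what keeps the context window small for a given platform and test scope.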
Tool Types¶
ExecutableTrailblazeTool¶
Tools that execute directly:
class InputTextTrailblazeTool(val text: String) : ExecutableTrailblazeTool {
    override suspend fun execute(ctx: TrailblazeToolExecutionContext): TrailblazeToolResult {
        ctx.trailblazeAgent.runMaestroCommands(listOf(InputTextCommand(text)))
        return TrailblazeToolResult.Success
    }
}
DelegatingTrailblazeTool and Recording¶
Tools that are exposed to the LLM but delegate execution to other (recordable) tools.
Registry Flags Explained¶
| Flag | Meaning |
|---|---|
| exposedToLlm | Tool appears in the tool list for LLM to call |
| isRecordable | Tool call is captured in trail recording |
| isDelegating | Tool converts to other tools before execution |
Why Tools Are Non-Recordable¶
There are two distinct reasons a tool might be marked isRecordable=False:
| Reason | Description | Replay Behavior |
|---|---|---|
| Delegating | Tool transforms to stable, recordable tools | Delegates are recorded and replayed |
| LLM-Dependent | Tool requires LLM reasoning based on current state | LLM must re-evaluate each replay |
Delegating example: tapOnElementByNodeId
- nodeId=42 is ephemeral (changes between screens)
- Delegates to tapOnElementWithText(text="Login") which is stable
- Recording captures the stable delegate
LLM-Dependent example: Visual validation
- “Validate that the button is green”
- Requires LLM to interpret screenshot and reason about color
- Cannot be replayed deterministically: LLM must run each time
Common Combinations¶
| Pattern | exposedToLlm | isRecordable | isDelegating | Example |
|---|---|---|---|---|
| Standard tool | ✅ | ✅ | ❌ | tapOnElementWithText |
| Delegating tool | ✅ | ❌ | ✅ | tapOnElementByNodeId |
| LLM-dependent tool | ✅ | ❌ | ❌ | Visual validation, semantic checks |
| Internal helper | ❌ | ❌ | ❌ | Internal utility functions |
Replay Modes¶
When replaying a recorded trail:
| Tool Type | Replay Behavior |
|---|---|
| isRecordable=True | Execute directly, no LLM needed |
| isDelegating=True | (Not in recording; delegates were recorded instead) |
| isRecordable=False, isDelegating=False | LLM must run to evaluate this step |
This means trails with LLM-dependent tools require “LLM-assisted replay” rather than pure deterministic replay.
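The replay table can be read as a small decision function. The sketch below is illustrative only: `replay_behavior`, `needs_llm_assisted_replay`, and the tool name `acme_verifyButtonColor` are hypothetical, and checking `isRecordable` first is an assumption, since the table does not cover a tool that sets both flags.

```python
def replay_behavior(meta: dict) -> str:
    """Map registry flags to the replay behavior described in the table."""
    if meta.get("isRecordable", False):
        return "execute_directly"  # in the recording; replayed without an LLM
    if meta.get("isDelegating", False):
        return "not_in_recording"  # its delegates were recorded instead
    return "llm_required"          # LLM must re-evaluate this step


def needs_llm_assisted_replay(trail_tools: list[str], registry_tools: dict[str, dict]) -> bool:
    """True if any step in the trail cannot be replayed deterministically."""
    return any(replay_behavior(registry_tools[name]) == "llm_required" for name in trail_tools)


registry_tools = {
    "tapOnElementWithText": {"isRecordable": True, "isDelegating": False},
    "tapOnElementByNodeId": {"isRecordable": False, "isDelegating": True},
    # Hypothetical LLM-dependent tool name, for illustration only.
    "acme_verifyButtonColor": {"isRecordable": False, "isDelegating": False},
}

print(needs_llm_assisted_replay(["tapOnElementWithText"], registry_tools))  # False
print(needs_llm_assisted_replay(
    ["tapOnElementWithText", "acme_verifyButtonColor"], registry_tools))    # True
```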
The Delegating Pattern¶
// LLM calls this with a nodeId (ephemeral, screen-specific)
@TrailblazeToolClass(name = "tapOnElementByNodeId", isRecordable = false)
class TapOnElementByNodeIdTrailblazeTool(
    val nodeId: Long,
    val reason: String,
) : DelegatingTrailblazeTool {
    override fun toExecutableTrailblazeTools(ctx: TrailblazeToolExecutionContext): List<ExecutableTrailblazeTool> {
        // Convert nodeId to stable selector
        val element = findElementByNode(nodeId, ctx.screenState)
        // Delegate to a RECORDABLE tool with stable properties
        return listOf(TapOnElementWithTextTrailblazeTool(text = element.text, id = element.id))
    }
}
Recording Flow¶
┌─────────────────────────────────────────────────────────────────────┐
│ LLM decides: "I need to tap the Login button (nodeId=42)" │
│ │
│ Calls: tapOnElementByNodeId(nodeId=42, reason="Login button") │
│ │ │
│ │ isRecordable=false, isDelegating=true │
│ │ → NOT recorded │
│ ▼ │
│ Delegates to: TapOnElementWithTextTrailblazeTool(text="Login") │
│ │ │
│ │ isRecordable=true │
│ │ → RECORDED in trail │
│ ▼ │
│ Trail file captures: │
│ - tapOnElementWithText: │
│ text: "Login" │
└─────────────────────────────────────────────────────────────────────┘
Why this pattern?
- nodeId is ephemeral — changes between screen captures, can’t be replayed
- The delegated tool uses stable properties (text, ID) that work across runs
- Recording captures the replayable tool, not the ephemeral nodeId-based call
For External MCP Tools¶
External tools can also use this pattern via the registry:
@server.resource("trailblaze://registry")
async def get_registry():
    return {
        "tools": {
            # Standard recordable tool
            "acme_driver_login": {
                "exposedToLlm": True,
                "isRecordable": True,
                "isDelegating": False,
            },
            # Delegating tool (converts to recordable primitives)
            "acme_tap_by_coords": {
                "exposedToLlm": True,
                "isRecordable": False,  # Don't record coords-based tap
                "isDelegating": True,   # Converts to stable tap
            },
        }
    }
Core Recording Principle¶
Recording = what Trailblaze invoked. Replay = Trailblaze invokes those same tools.
This keeps Trailblaze as the controller for both recording and replay. Delegating tools (including external MCP tools) return a list of tools for Trailblaze to execute — they don’t execute actions directly.
How Delegation Works¶
Kotlin (internal):
class TapOnElementByNodeIdTrailblazeTool : DelegatingTrailblazeTool {
    override fun toExecutableTrailblazeTools(ctx: TrailblazeToolExecutionContext): List<ExecutableTrailblazeTool> {
        // Return what Trailblaze should execute
        return listOf(TapOnElementWithTextTrailblazeTool(text = "Login"))
    }
}
External MCP:
@server.tool("acme_tap_by_coords")
async def tap_by_coords(x: int, y: int, _trailblaze: dict) -> dict:
    tb = TrailblazeClient(context=_trailblaze)
    # Read-only queries are allowed (not recorded)
    screen = await tb.capture_screen()
    element = find_element_at(screen, x, y)
    # Return delegate list - TRAILBLAZE will execute and record these
    return {
        "success": True,
        "_trailblaze_delegates": [
            {"tool": "tap", "args": {"text": element.text}}
        ]
    }
The Delegation Flow¶
┌─────────────────────────────────────────────────────────────────────┐
│ 1. Trailblaze → MCP: acme_tap_by_coords(x=100, y=200) │
│ (isRecordable=False, isDelegating=True → NOT recorded) │
│ │
│ 2. External tool computes, uses read-only queries │
│ screen = await tb.capture_screen() ← read-only, not recorded │
│ │
│ 3. External tool returns: │
│ {"_trailblaze_delegates": [{"tool": "tap", "args": {...}}]} │
│ │
│ 4. Trailblaze receives response, sees delegates │
│ │
│ 5. Trailblaze executes: tap(text="Login") ← RECORDED │
│ (Trailblaze is the invoker) │
│ │
│ Recording: tap(text="Login") │
│ Replay: Trailblaze executes tap(text="Login") │
└─────────────────────────────────────────────────────────────────────┘
Read-Only vs Action Operations¶
External delegating tools can use read-only queries to compute what to delegate:
@server.tool("acme_smart_tap")
async def smart_tap(description: str, _trailblaze: dict) -> dict:
    tb = TrailblazeClient(context=_trailblaze)

    # ═══════════════════════════════════════════════════════════════
    # READ-ONLY QUERIES (allowed, not recorded)
    # ═══════════════════════════════════════════════════════════════
    screen = await tb.capture_screen()
    visible = await tb.is_visible(text="Login")
    count = await tb.get_element_count(id="list_item")

    # ═══════════════════════════════════════════════════════════════
    # COMPUTE WHAT TO DELEGATE
    # ═══════════════════════════════════════════════════════════════
    if visible:
        delegates = [{"tool": "tap", "args": {"text": "Login"}}]
    else:
        delegates = [
            {"tool": "scroll", "args": {"direction": "down"}},
            {"tool": "tap", "args": {"text": "Login"}},
        ]

    # ═══════════════════════════════════════════════════════════════
    # RETURN DELEGATES - Trailblaze executes and records these
    # ═══════════════════════════════════════════════════════════════
    return {
        "success": True,
        "_trailblaze_delegates": delegates
    }
Nested Delegation¶
If a delegate is also a delegating tool, Trailblaze recursively processes until it reaches recordable tools:
acme_complex_flow (isDelegating=True, isRecordable=False)
→ returns delegates: [acme_login, acme_checkout]
acme_login (isDelegating=True, isRecordable=False)
→ returns delegates: [tap("Sign In"), inputText(...)]
tap("Sign In") (isRecordable=True) ← RECORDED
inputText(...) (isRecordable=True) ← RECORDED
acme_checkout (isDelegating=True, isRecordable=False)
→ returns delegates: [tap("Pay")]
tap("Pay") (isRecordable=True) ← RECORDED
Final recording: [tap("Sign In"), inputText(...), tap("Pay")]
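The recursive expansion can be sketched as a depth-first walk that keeps only recordable leaves. This is an illustration under stated assumptions, not Trailblaze's implementation: `recorded_calls`, the `resolvers` map (standing in for calling each delegating tool and reading its `_trailblaze_delegates`), the `max_depth` guard against delegation cycles, and the placeholder `inputText` argument are all hypothetical.

```python
from typing import Callable

Call = tuple[str, dict]  # (tool name, args)


def recorded_calls(
    call: Call,
    registry: dict[str, dict],
    resolvers: dict[str, Callable[[dict], list[Call]]],
    max_depth: int = 10,
) -> list[Call]:
    """Expand a call depth-first and return the calls that end up in the recording."""
    name, args = call
    meta = registry[name]
    if not meta.get("isDelegating", False):
        # Leaf tool: recorded only if it is recordable.
        return [call] if meta.get("isRecordable", False) else []
    if max_depth <= 0:
        raise RecursionError(f"Delegation too deep while expanding {name!r}")
    recorded: list[Call] = []
    for delegate in resolvers[name](args):
        recorded.extend(recorded_calls(delegate, registry, resolvers, max_depth - 1))
    return recorded


DELEGATING = {"isDelegating": True, "isRecordable": False}
RECORDABLE = {"isDelegating": False, "isRecordable": True}

registry = {
    "acme_complex_flow": DELEGATING,
    "acme_login": DELEGATING,
    "acme_checkout": DELEGATING,
    "tap": RECORDABLE,
    "inputText": RECORDABLE,
}
resolvers = {
    "acme_complex_flow": lambda args: [("acme_login", {}), ("acme_checkout", {})],
    # The inputText argument is a placeholder; the original leaves it elided.
    "acme_login": lambda args: [("tap", {"text": "Sign In"}), ("inputText", {"text": "placeholder"})],
    "acme_checkout": lambda args: [("tap", {"text": "Pay"})],
}

final = recorded_calls(("acme_complex_flow", {}), registry, resolvers)
print([name for name, _ in final])  # ['tap', 'inputText', 'tap']
```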
Why This Design¶
| Aspect | Benefit |
|---|---|
| Trailblaze is always the invoker | Recording and replay use the same execution path |
| Clear control flow | No hidden action execution inside external tools |
| Deterministic replay | Recorded tools are exactly what Trailblaze will invoke |
| Matches Kotlin pattern | toExecutableTrailblazeTools() returns delegates |
Response Fields¶
| Return Field | Behavior |
|---|---|
| _trailblaze_delegates | List of tools for Trailblaze to execute (and record) |
| (no delegates) | Tool is not delegating, records itself if isRecordable=True |
Tool Composition¶
Tools can call other tools:
class MyAppFullCheckoutFlow(
    private val commands: TrailblazeCommands,
    private val loginTool: MyAppLoginTool, // Same server, direct call
) : TrailblazeTool {
    override suspend fun execute(args: Args): ToolResult {
        // Direct call to another tool (same process)
        loginTool.execute(LoginArgs(args.email, args.password))
        // Call core primitives (via RPC or direct, depending on mode)
        commands.tap(text = "Shop")
        commands.tap(text = args.itemName)
        commands.tap(text = "Checkout")
        return ToolResult.success()
    }
}
Dynamic Tool Reload¶
For host mode, agents may create new tools at runtime. Explicit reload is required:
@Tool
fun reloadTools(): ReloadResult {
    // Scan tool directories
    // Re-read registries from MCP servers
    // Update tool index
    return ReloadResult(
        added = listOf("new_tool_1"),
        removed = emptyList(),
        total = 47,
    )
}
Teams must restart their MCP server (or implement hot-reload) for new tools to be available.
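The "update tool index" step amounts to diffing the tool names known before and after the rescan. The sketch below shows one way to compute the `added`, `removed`, and `total` fields; `diff_tool_index` is a hypothetical helper, and the Python `ReloadResult` only mirrors the fields of the Kotlin example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReloadResult:
    added: list[str]
    removed: list[str]
    total: int


def diff_tool_index(previous: set[str], current: set[str]) -> ReloadResult:
    """Compare tool names before and after a rescan of all registries."""
    return ReloadResult(
        added=sorted(current - previous),
        removed=sorted(previous - current),
        total=len(current),
    )


before = {"myapp_login", "myapp_checkout"}
after = {"myapp_login", "myapp_checkout", "new_tool_1"}
print(diff_tool_index(before, after))
```

Sorting the name lists keeps the result stable across runs, which makes reload output easier to compare in logs.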
Cross-Platform Tools¶
Tools declare supported platforms in the registry:
"acme_login" to ToolMetadata(
platforms = setOf(ANDROID, IOS), // Not web
groups = setOf("auth"),
)
Tools can also check platform at runtime:
override suspend fun execute(args: Args): ToolResult {
    val packageId = when (commands.platform) {
        ANDROID -> "com.acme.driver"
        IOS -> "com.acme.AcmeDriver"
        else -> error("Unsupported platform")
    }
    commands.launchApp(packageId)
    // ...
}
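Because platforms are declared in the registry and also branched on at runtime, the two can drift apart. The sketch below shows the same runtime lookup in Python plus a consistency check; `resolve_package_id` and `platform_mismatch` are hypothetical helpers, and the platform strings follow the lowercase names used in the registry JSON.

```python
# Package IDs taken from the Kotlin example; keys follow the registry's platform names.
PACKAGE_IDS = {
    "android": "com.acme.driver",
    "ios": "com.acme.AcmeDriver",
}


def resolve_package_id(platform: str) -> str:
    """Return the app identifier for a platform, or fail loudly if unsupported."""
    try:
        return PACKAGE_IDS[platform]
    except KeyError:
        raise ValueError(f"Unsupported platform: {platform}") from None


def platform_mismatch(declared: set[str]) -> set[str]:
    """Platforms declared in the registry but not handled at runtime, or vice versa."""
    return declared.symmetric_difference(PACKAGE_IDS)


print(resolve_package_id("ios"))                    # com.acme.AcmeDriver
print(platform_mismatch({"android", "ios", "web"}))  # {'web'}
```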
Type Safety and Refactoring¶
Wire/Proto as Contract¶
Kotlin (Wire) definitions ← SOURCE OF TRUTH
│
├── generates → Proto schema
├── generates → Kotlin interface
├── generates → Python client (typed)
├── generates → TypeScript client (typed)
└── generates → gRPC stubs
Refactoring Support¶
| Scenario | What Happens |
|---|---|
| Rename command in Kotlin | Wire regenerates proto → regenerate clients → compile/lint errors at every stale call site |
| Add parameter | Same flow, clients get new param |
| Remove command | Same flow, compile/lint errors |
| Breaking change | Bump version in proto package |
What We Maintain vs Don’t¶
| We Maintain | We Don’t Maintain |
|---|---|
| Wire/Kotlin API definitions | External teams’ MCP servers |
| Proto generation pipeline | External teams’ tool logic |
| Generated clients (published packages) | External teams’ deployment |
| RPC server | External teams’ CI/CD |
| trailblaze-android-ondevice-mcp module | |
| Registry data models (proto-generated) | |
| Light SDK wrappers (Python, TypeScript) | |
| TrailblazeToolSet Kotlin library | |
| Invocation context propagation infrastructure | |
Consequences¶
Positive:
- Type-safe API via Wire/proto generation
- Refactoring support across all languages
- External teams use any language (MCP + RPC)
- On-device works via library dependency
- Same interface across all deployment modes
- Central registry simplifies tool metadata management
- Exposing the registry via MCP resources is standard protocol usage
- We don’t maintain external teams’ code
Negative:
- Wire/proto adds indirection (but provides type safety)
- External teams must run their own MCP server
- On-device requires building custom test APK
- Generated clients must be published and versioned
- TrailblazeToolSet must be built before full internal validation (but existing tools work unchanged)
Internal Validation¶
Your team can use the same architecture internally:
- One MCP server per app (MyApp, OtherApp, AdminPanel)
- Same registry pattern for tool metadata
- Same interface (TrailblazeCommands) as external teams
- Same SDK patterns: use TrailblazeClient.from_context() even in Kotlin
- You experience friction before external teams do
Validating the SDK Pattern¶
Internal tools should use the same patterns you recommend to external teams:
| Execution Mode | Pattern | Multi-Device? |
|---|---|---|
| In-process Kotlin | TrailblazeToolSet with ExecutionMode.IN_PROCESS | ✅ Automatic |
| Out-of-process testing | TrailblazeToolSet with ExecutionMode.RPC | ✅ Via invocation ID |
| Current (legacy) | TrailblazeToolExecutionContext | ✅ Automatic |
Going forward, your team can use TrailblazeToolSet — the same thin library published for external teams. This ensures:
- We experience the same developer ergonomics as external teams
- The same tool code works both in-process (production) and out-of-process (testing)
- We catch friction before external teams encounter it
Existing @TrailblazeToolClass tools continue to work unchanged.
Same Code, Multiple Execution Modes¶
Kotlin tools can be written to work both in-process and via stdio MCP:
┌─────────────────────────────────────────────────────────────────────┐
│ MyApp Tools Module (Kotlin) │
│ │
│ - Uses Wire/proto interface (TrailblazeCommands) │
│ - Uses Kotlin MCP SDK for tool definitions │
│ - Same code for both execution modes │
└─────────────────────────────────────────────────────────────────────┘
│ │
│ Production │ Development/Testing
▼ ▼
┌───────────────────┐ ┌───────────────────┐
│ In-Process │ │ Isolated Module │
│ │ │ │
│ Direct calls, │ │ ./gradlew :myapp │
│ no RPC overhead │ │ -tools:runMcp │
│ │ │ │
│ │ │ stdio transport │
│ │ │ Verifies MCP path │
└───────────────────┘ └───────────────────┘
Why this matters:
- Ensures the stdio/MCP path works (we test it ourselves)
- Same tool code can be extracted into a separate process if needed
- Validates the external team experience internally
New Tool Definition: Kotlin MCP SDK¶
We will migrate from the current custom annotation system to using a standard Kotlin MCP SDK.
Current (deprecated):
// Custom Trailblaze annotations
@TrailblazeToolClass(name = "myapp_login", isRecordable = true)
class MyAppLoginTool(
    val email: String,
    val password: String,
) : ExecutableTrailblazeTool {
    override suspend fun execute(ctx: TrailblazeToolExecutionContext): TrailblazeToolResult {
        // ...
    }
}
New (Kotlin MCP SDK with ToolSet pattern):
class MyAppToolSet(
    private val commands: TrailblazeCommands,
) : ToolSet, HasRegistry by MyAppToolRegistry {

    // ═══════════════════════════════════════════════════════════════
    // TOOL NAMES (constants for type safety + refactoring)
    // ═══════════════════════════════════════════════════════════════
    object ToolNames {
        const val LOGIN = "myapp_login"
        const val CHECKOUT = "myapp_checkout"
        const val SETUP = "myapp_setup"
    }

    // ═══════════════════════════════════════════════════════════════
    // TOOLS (using constants for stable names)
    // ═══════════════════════════════════════════════════════════════
    @Tool(customName = ToolNames.LOGIN)
    @LLMDescription("Log in to MyApp with credentials")
    suspend fun login(
        @LLMDescription("User email") email: String,
        @LLMDescription("User password") password: String,
    ): ToolResult {
        setup() // Direct call to another tool (type-safe)
        commands.tap(text = "Sign in")
        commands.inputText(email)
        commands.inputText(password)
        commands.tap(text = "Submit")
        return ToolResult.success()
    }

    @Tool(customName = ToolNames.CHECKOUT)
    @LLMDescription("Complete checkout with current cart")
    suspend fun checkout(
        @LLMDescription("Amount in cents") amount: Int,
    ): ToolResult {
        commands.tap(text = "Checkout")
        commands.waitUntilVisible(text = "Payment Complete", timeoutMs = 10000)
        return ToolResult.success()
    }

    @Tool(customName = ToolNames.SETUP)
    @LLMDescription("Clear and launch MyApp")
    suspend fun setup(): ToolResult {
        commands.clearAppData("com.example.myapp")
        commands.launchApp("com.example.myapp")
        return ToolResult.success()
    }
}

// Registry as separate object (delegated to ToolSet)
object MyAppToolRegistry : HasRegistry {
    override val registry = mapOf(
        MyAppToolSet.ToolNames.LOGIN to ToolMetadata(
            platforms = setOf(ANDROID, IOS),
            groups = setOf("auth", "setup"),
            exposedToLlm = true,
            isRecordable = true,
        ),
        MyAppToolSet.ToolNames.CHECKOUT to ToolMetadata(
            platforms = setOf(ANDROID, IOS, WEB),
            groups = setOf("checkout"),
            exposedToLlm = true,
            isRecordable = true,
        ),
        MyAppToolSet.ToolNames.SETUP to ToolMetadata(
            platforms = setOf(ANDROID, IOS),
            groups = setOf("setup"),
            exposedToLlm = false, // Internal helper
            isRecordable = false,
        ),
    )
}

// Interface for registry aggregation
interface HasRegistry {
    val registry: Map<String, ToolMetadata>
}
Key patterns:
- Tool name constants — ToolNames.LOGIN enables refactoring and cross-references
- Explicit tool names — @Tool(customName = ...) ensures stable names even if function is renamed
- Colocated registry — metadata lives with the tools via delegation
- Direct tool composition — setup() calls another tool directly (type-safe, no MCP round-trip)
- HasRegistry interface — enables central aggregation of all registries
Benefits:
- Standard mechanism: uses Kotlin MCP SDK, same as external teams
- Portable: tools work in-process or via stdio without code changes
- Type-safe: tool name constants prevent typos and enable refactoring
- No custom annotation system: less code to maintain
- MCP SDK handles tool discovery, schema generation, invocation
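Central aggregation of per-ToolSet registries can be sketched as a merge over name-to-metadata maps. The example below is a rough illustration in Python; `aggregate_registries` is a hypothetical helper, and rejecting duplicate tool names is an assumption made here, since the document does not say how name collisions should be handled.

```python
def aggregate_registries(registries: dict[str, dict[str, dict]]) -> dict[str, dict]:
    """Merge per-source registries into one index keyed by tool name.

    `registries` maps a source name (e.g. a ToolSet or MCP server) to that
    source's {tool name: metadata} map.
    """
    merged: dict[str, dict] = {}
    owners: dict[str, str] = {}
    for source, registry in registries.items():
        for name, meta in registry.items():
            if name in merged:
                # Assumption: tool names must be unique across all sources.
                raise ValueError(
                    f"Tool {name!r} is registered by both {owners[name]!r} and {source!r}"
                )
            merged[name] = meta
            owners[name] = source
    return merged


merged = aggregate_registries({
    "MyAppToolSet": {
        "myapp_login": {"groups": ["auth", "setup"], "isRecordable": True},
        "myapp_checkout": {"groups": ["checkout"], "isRecordable": True},
    },
    # Hypothetical second source, for illustration only.
    "OtherAppToolSet": {
        "otherapp_login": {"groups": ["auth"], "isRecordable": True},
    },
})
print(sorted(merged))  # ['myapp_checkout', 'myapp_login', 'otherapp_login']
```

Prefixing tool names with the app name, as the examples in this document do, is what makes a uniqueness check like this practical.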
Tool Authoring by Language¶
Kotlin (Recommended for In-Process)¶
See the ToolSet pattern above. Key points:
- Use Kotlin MCP SDK (@Tool, @LLMDescription, ToolSet)
- Tool name constants for type safety
- Colocated registry with HasRegistry interface
- Direct function calls for internal tool composition
Python / TypeScript (External Teams)¶
External teams use:
1. stdio transport — Trailblaze spawns and manages the server
2. Official MCP SDK — Python (FastMCP) or TypeScript (@modelcontextprotocol/sdk)
3. Trailblaze Client SDK — our lightweight wrapper with generated RPC stubs
4. MCP resource — expose trailblaze://registry with tool metadata
Both Python and TypeScript MCP SDKs provide built-in access to request context, which our SDK uses to extract invocation metadata for multi-device support.
Example (Python):
# acme_tools.py
from mcp.server.fastmcp import Context, FastMCP
from trailblaze import TrailblazeClient  # Light SDK wrapper

server = FastMCP("acme-tools")  # Server name is illustrative


@server.tool("acme_driver_login")
async def login(email: str, password: str, ctx: Context) -> dict:
    """Log in to Acme Driver app."""
    # Context-aware client - supports multi-device automatically
    tb = TrailblazeClient.from_context(ctx)
    await tb.clear_app_data("com.acme.driver")
    await tb.launch_app("com.acme.driver")
    await tb.tap(text="Sign in")
    await tb.input_text(email)
    await tb.input_text(password)
    return {"success": True}


@server.resource("trailblaze://registry")
async def get_registry() -> dict:
    return {
        "tools": {
            "acme_driver_login": {
                "platforms": ["android", "ios"],
                "groups": ["auth"],
                "exposedToLlm": True,
                "isRecordable": True,
            }
        }
    }


if __name__ == "__main__":
    server.run()  # Defaults to the stdio transport
Example (TypeScript):
// acme_tools.ts
import { Server, Context } from '@modelcontextprotocol/sdk';
import { TrailblazeClient } from '@trailblaze/client';

const server = new Server();

server.tool('acme_driver_login', async (args: { email: string, password: string }, ctx: Context) => {
  // Context-aware client - supports multi-device automatically
  const tb = TrailblazeClient.fromContext(ctx);
  await tb.clearAppData('com.acme.driver');
  await tb.launchApp('com.acme.driver');
  await tb.tap({ text: 'Sign in' });
  await tb.inputText(args.email);
  await tb.inputText(args.password);
  return { success: true };
});

server.resource('trailblaze://registry', async () => ({
  tools: {
    acme_driver_login: {
      platforms: ['android', 'ios'],
      groups: ['auth'],
      exposedToLlm: true,
      isRecordable: true,
    }
  }
}));

server.run();
Key pattern: Use TrailblazeClient.from_context(ctx) (Python) or TrailblazeClient.fromContext(ctx) (TypeScript) instead of a global client instance. This extracts the invocation ID from _meta and automatically includes it in all RPC calls, enabling multi-device support with zero additional effort.
Teams can build their own patterns (tool name constants, registries, etc.) on top of these primitives.
Trailblaze Client SDK¶
We provide official lightweight SDK wrappers for the two officially supported MCP SDK platforms: Python and TypeScript.
The SDK wrapper is a remote execution context. Just as in-process Kotlin tools receive TrailblazeToolExecutionContext, remote MCP tools receive TrailblazeClient — both provide the same capability: a device-scoped interface to execute Trailblaze commands.
These wrappers:
- Extract invocation context from MCP request metadata (_meta)
- Return a client that auto-includes context in all RPC calls (scoped to the correct device)
- Include generated RPC stubs for TrailblazeCommands
Both Python (FastMCP) and TypeScript (@modelcontextprotocol/sdk) provide built-in access to request metadata in tool handlers, making context extraction trivial.
Python¶
Python’s FastMCP provides built-in Context injection. The Context object gives direct access to request metadata:
from mcp.server.fastmcp import Context, FastMCP
from trailblaze import TrailblazeClient

mcp = FastMCP("my-tools")  # Server name is illustrative


@mcp.tool()
async def my_tool(param: str, ctx: Context) -> str:
    # from_context extracts _meta.trailblaze from the request
    tb = TrailblazeClient.from_context(ctx)
    await tb.tap(100, 200)  # Invocation ID flows automatically
    return "ok"
The Context parameter is automatically injected by FastMCP when present in the function signature.
TypeScript¶
TypeScript’s official MCP SDK (@modelcontextprotocol/sdk) also provides access to request context in tool handlers:
import { TrailblazeClient } from '@trailblaze/client';

server.tool('my_tool', async (args, ctx) => {
  // fromContext extracts _meta.trailblaze from the request
  const tb = TrailblazeClient.fromContext(ctx);
  await tb.tap(100, 200); // Invocation ID flows automatically
});
Like Python, the context is automatically available in the tool handler callback.
Kotlin (Out-of-Process)¶
We do not provide an official SDK wrapper for Kotlin out-of-process tools.
Why Kotlin is different: The ToolSet pattern (used throughout this document) relies on Koog’s annotation-based tool registration. However, Koog’s @Tool functions only receive the deserialized parameters — not the raw CallToolRequest that contains _meta. This means tools defined via ToolSet cannot access invocation context.
For Kotlin MCP servers running out-of-process, use the raw MCP Kotlin SDK directly with CallToolRequest:
mcpServer.addTool("my_tool", ...) { request: CallToolRequest ->
    // Extract metadata manually from request.meta
    val invocationId = request.meta?.get("trailblazeInvocationId")
        ?.let { (it as? JsonPrimitive)?.content }
    val baseUrl = request.meta?.get("trailblaze")
        ?.jsonObject?.get("baseUrl")?.jsonPrimitive?.content
        ?: "http://localhost:52525"
    // Create client with extracted context
    val tb = TrailblazeClient(baseUrl, invocationId)
    tb.tap(100, 200)
}
Kotlin: TrailblazeToolSet (Required for Kotlin MCP SDK)¶
Why this is required: The Kotlin MCP SDK’s ToolSet pattern (used by Koog) hides the raw CallToolRequest from tool implementations. This means @Tool functions only receive deserialized parameters — they cannot access _meta to retrieve the invocation context needed for multi-device routing.
If your team wants to write Trailblaze tools using the Kotlin MCP SDK (as recommended in Phase 5.5), you must build TrailblazeToolSet first. Without it, Kotlin tools cannot:
- Access invocation context for multi-device support
- Route RPC calls to the correct device
- Follow the same pattern as Python/TypeScript tools
TrailblazeToolSet is a thin wrapper that:
- Uses the official Kotlin MCP SDK for tool definitions
- Intercepts CallToolRequest to extract _meta before invoking the tool
- Injects TrailblazeContext into tool functions (like FastMCP’s Context)
- Preserves TrailblazeTool data classes — no need to abandon our existing type-safe pattern
- Works both in-process and via RPC with a simple flag
- Can be published as part of our library for external Kotlin teams
// TrailblazeToolSet - supports both patterns

// OPTION 1: Keep using TrailblazeTool data classes (existing pattern)
// The data class is deserialized from args, context is injected separately
@Serializable
data class LoginTool(
    val email: String,
    val password: String,
) : TrailblazeTool

class AcmeToolSet : TrailblazeToolSet {
    @Tool
    @LLMDescription("Log in to Acme Driver app")
    suspend fun login(
        tool: LoginTool,          // TrailblazeTool data class - type-safe!
        ctx: TrailblazeContext,   // Injected for _meta access
    ): ToolResult {
        val tb = ctx.client
        tb.clearAppData("com.acme.driver")
        tb.launchApp("com.acme.driver")
        tb.tap(text = "Sign in")
        tb.inputText(tool.email)  // Access via data class
        tb.inputText(tool.password)
        return ToolResult.success()
    }
}

// OPTION 2: Flat parameters (simpler for small tools)
class SimpleToolSet : TrailblazeToolSet {
    @Tool
    @LLMDescription("Tap a button by text")
    suspend fun tapButton(
        text: String,
        ctx: TrailblazeContext,
    ): ToolResult {
        ctx.client.tap(text = text)
        return ToolResult.success()
    }
}

enum class ExecutionMode {
    IN_PROCESS, // Direct TrailblazeCommands calls (no RPC overhead)
    RPC,        // Via generated RPC client (for out-of-process/remote)
}
Benefits of this approach:
- Preserves TrailblazeTool: keep existing type-safe data classes
- Consistency: same context injection pattern as Python/TypeScript SDKs
- Internal validation: use the same library you publish
- Flexibility: same tool code works in-process or out-of-process
- Gradual migration: existing tools continue to work, add context when needed
Existing tools: Current TrailblazeToolClass/ExecutableTrailblazeTool patterns remain supported. Tools can incrementally adopt TrailblazeContext injection for multi-device support without rewriting.
Kotlin (In-Process, Current)¶
Today, in-process tools have direct access to TrailblazeToolExecutionContext. This continues to work and is the current path for in-process Kotlin tools.
Migration path: Once TrailblazeToolSet is built, new tools should use it. Existing @TrailblazeToolClass tools continue to work unchanged, but new tools should follow the TrailblazeToolSet pattern for consistency with Python/TypeScript and to enable out-of-process testing.
SDK Surface Area¶
The official SDKs (Python and TypeScript) are intentionally minimal:
| Component | Size | Purpose |
|---|---|---|
| TrailblazeClient.from_context() | ~10 lines | Factory that extracts invocation ID from _meta |
| Generated RPC stubs | (from proto) | tap(), swipe(), captureScreen(), etc. |
| Auto meta injection | ~5 lines | Includes invocation ID in all RPC calls |
Total: ~50-100 lines per language. The “SDK” is really just a convenience wrapper around the generated RPC client.
Officially Supported SDKs¶
| Language | SDK | Context Access | Status |
|---|---|---|---|
| Python | trailblaze package | FastMCP Context injection | ✅ Planned |
| TypeScript | @trailblaze/client | MCP SDK context in handler | ✅ Planned |
| Kotlin | TrailblazeToolSet | TrailblazeContext injection | ✅ Planned |
| Other languages | DIY | Raw _meta extraction | Follow pattern |
All three official SDKs (Python, TypeScript, Kotlin) will provide the same developer experience:
1. Context automatically injected into tool handlers
2. TrailblazeClient / TrailblazeCommands available via context
3. Invocation ID flows automatically through all RPC calls
4. Works for both single-device and multi-device scenarios
For languages without an official SDK, teams can follow the same pattern: extract trailblazeInvocationId and trailblaze object from _meta, then include them in RPC calls.
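The DIY pattern can be sketched independently of any MCP SDK. The example below assumes `_meta` arrives as a plain dict with the `trailblazeInvocationId` and `trailblaze.baseUrl` keys used in the Kotlin out-of-process example; `InvocationContext`, `extract_context`, `build_rpc_payload`, and the payload field names are hypothetical and do not reflect a confirmed wire format.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_BASE_URL = "http://localhost:52525"  # Default from the Kotlin example


@dataclass(frozen=True)
class InvocationContext:
    invocation_id: Optional[str]
    base_url: str


def extract_context(meta: Optional[dict]) -> InvocationContext:
    """Pull the invocation ID and Trailblaze base URL out of request _meta."""
    meta = meta or {}
    trailblaze = meta.get("trailblaze") or {}
    return InvocationContext(
        invocation_id=meta.get("trailblazeInvocationId"),
        base_url=trailblaze.get("baseUrl", DEFAULT_BASE_URL),
    )


def build_rpc_payload(ctx: InvocationContext, command: str, args: dict) -> dict:
    """Attach the invocation ID to an outgoing RPC call (hypothetical payload shape)."""
    payload = {"command": command, "args": args}
    if ctx.invocation_id is not None:
        payload["invocationId"] = ctx.invocation_id
    return payload


ctx = extract_context({"trailblazeInvocationId": "inv-123", "trailblaze": {}})
print(ctx.base_url)  # http://localhost:52525
print(build_rpc_payload(ctx, "tap", {"text": "Login"}))
```

Carrying the invocation ID on every call is what lets the host route each command to the device that triggered the tool.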
Query Commands for Conditionals¶
Custom tools often need to check screen state before deciding what to do. The API provides query commands that return values instead of throwing:
Actions vs Queries vs Assertions¶
| Category | Behavior | Example |
|---|---|---|
| Actions | Perform operation, return result | tap(), inputText() |
| Queries | Check state, return value (never throw) | isVisible(), hasText(), getElementCount() |
| Assertions | Check state, throw if condition not met | assertVisible(), waitUntilVisible() |
Using Queries for Conditional Logic¶
Python example:
@server.tool("acme_handle_onboarding")
async def handle_onboarding(ctx: Context) -> dict:
    tb = TrailblazeClient.from_context(ctx)
    # Check if cookie consent dialog is shown
    if await tb.is_visible(text="Accept Cookies"):
        await tb.tap(text="Accept")
    # Check if we need to dismiss a tutorial
    if await tb.is_visible(text="Skip Tutorial"):
        await tb.tap(text="Skip")
    # Check which screen we're on
    if await tb.has_text("Welcome back"):
        return {"screen": "returning_user"}
    elif await tb.has_text("Create Account"):
        return {"screen": "new_user"}
    else:
        return {"screen": "unknown"}
Kotlin example:
@Tool(customName = ToolNames.HANDLE_DIALOGS)
suspend fun handleDialogs(): ToolResult {
    // Dismiss any blocking dialogs
    if (commands.isVisible(text = "Allow notifications")) {
        commands.tap(text = "Not now")
    }
    if (commands.isVisible(text = "Rate this app")) {
        commands.tap(text = "Maybe later")
    }
    // Check how many items are in a list
    val itemCount = commands.getElementCount(id = "list_item")
    if (itemCount == 0) {
        return ToolResult.failure("No items found")
    }
    return ToolResult.success()
}
Retry Patterns¶
@server.tool("acme_login_with_retry")
async def login_with_retry(email: str, password: str, ctx: Context, max_attempts: int = 3) -> dict:
    tb = TrailblazeClient.from_context(ctx)
    for attempt in range(max_attempts):
        await tb.tap(text="Sign In")
        await tb.input_text(email)
        await tb.input_text(password)
        await tb.tap(text="Submit")
        # Check outcome without throwing
        if await tb.is_visible(text="Dashboard", timeout_ms=5000):
            return {"success": True, "attempts": attempt + 1}
        if await tb.is_visible(text="Invalid credentials"):
            # Wrong password, no point retrying
            return {"success": False, "error": "invalid_credentials"}
        # Network error or slow response, try again
    return {"success": False, "error": "max_attempts_exceeded"}
Key Difference: Queries vs Assertions¶
# QUERY - returns False, doesn't throw
visible = await tb.is_visible(text="Login Button") # → False
# ASSERTION - throws/fails if not visible
await tb.assert_visible(text="Login Button") # → raises AssertionError
Use queries when you need to branch based on screen state. Use assertions when you’re verifying expected state (and want the test to fail if wrong).
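The contract difference can be demonstrated without a device. The sketch below uses a hypothetical in-memory `FakeScreen` (synchronous for brevity, unlike the async client shown in this document) to show a query returning `False` where the matching assertion raises.

```python
class FakeScreen:
    """In-memory stand-in for a device screen (hypothetical, for illustration)."""

    def __init__(self, visible_texts: list[str]) -> None:
        self._visible = set(visible_texts)

    def is_visible(self, text: str) -> bool:
        # Query: reports state and never raises.
        return text in self._visible

    def assert_visible(self, text: str) -> None:
        # Assertion: raises when the expected state is missing.
        if not self.is_visible(text):
            raise AssertionError(f"Expected {text!r} to be visible")


screen = FakeScreen(["Welcome back"])

# Branching on a query keeps control flow inside the tool.
next_step = "dismiss_dialog" if screen.is_visible("Accept Cookies") else "continue"
print(next_step)  # continue

# An assertion on the same missing text stops execution instead.
try:
    screen.assert_visible("Accept Cookies")
except AssertionError as exc:
    print(f"assertion failed: {exc}")
```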
Error Handling¶
Errors propagate through standard mechanisms at each layer:
| Layer | Error Handling |
|---|---|
| Tool execution | Return failure result with error message |
| MCP protocol | JSON-RPC error responses |
| RPC (gRPC/Connect) | Status codes + error details |
Tools should return meaningful error messages. Protocol-level errors (connection failures, timeouts) are handled by the respective transports. No custom error handling infrastructure required.
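Returning a failure result rather than letting an exception escape can be done per tool, or with a small optional helper like the one sketched below. `returns_failure_on_error` and `flaky_tool` are hypothetical names, and the `{"success": ..., "error": ...}` shape simply follows the result dicts used in the Python examples in this document; it is not a required convention.

```python
import asyncio
import functools


def returns_failure_on_error(tool):
    """Wrap an async tool so exceptions become a failure result with a message."""

    @functools.wraps(tool)
    async def wrapper(*args, **kwargs):
        try:
            return await tool(*args, **kwargs)
        except Exception as exc:
            # Include the exception type so the message is actionable.
            return {"success": False, "error": f"{type(exc).__name__}: {exc}"}

    return wrapper


@returns_failure_on_error
async def flaky_tool(should_fail: bool) -> dict:
    if should_fail:
        raise TimeoutError("Dashboard did not appear within 5000 ms")
    return {"success": True}


print(asyncio.run(flaky_tool(False)))
print(asyncio.run(flaky_tool(True)))
```

A wrapper like this only shapes tool-level failures; connection failures and timeouts at the MCP or RPC layer are still reported by the transports.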
FAQ¶
Can external MCP servers make LLM requests through Trailblaze?¶
Future: Yes, via MCP Sampling.
The MCP protocol includes a Sampling feature that allows servers to request LLM completions from clients. This enables external tools to “borrow” Trailblaze’s configured LLM for tasks like:
- “Is the screen showing an error message?”
- “What color is the button?”
- “Does this screenshot match the expected state?”
Why MCP Sampling instead of an RPC endpoint?
| Approach | Who Can Use It | Control |
|---|---|---|
| MCP Sampling | Only servers Trailblaze connects to | Per-connection opt-in |
| RPC askLlm() endpoint | Anyone who connects to Trailblaze | Globally visible |
MCP Sampling is preferred because:
1. Trailblaze initiates the connection: you choose which servers can sample
2. Per-server opt-in: enable sampling only for trusted servers
3. Invocation context flows naturally: same pattern as other RPC calls
4. Protocol-compliant: standard MCP, not a custom API
Current status: Not implemented. The invocation context infrastructure (invocation ID, context propagation) provides the foundation. When implemented:
@mcp.tool()
async def verify_screen_color(expected_color: str, ctx: Context) -> dict:
    tb = TrailblazeClient.from_context(ctx)
    # Request LLM completion via MCP Sampling
    result = await tb.sampling.create_message(
        messages=[{"role": "user", "content": f"Is the button {expected_color}?"}],
        include_screenshot=True,
    )
    return {"matches": "yes" in result.content.lower()}
Note: Most MCP clients (Cursor, Claude Desktop) don’t support sampling. This feature is for external MCP servers calling back to Trailblaze, not for clients calling Trailblaze.
Related Documents¶
- 010: Custom Tool Authoring - Previous decision this supersedes
- 008: Trailblaze MCP - MCP server architecture
- 032: Trail/Blaze Agent Architecture - How tools fit in agent architecture
References¶
- MCP Protocol - Tool discovery and invocation
- Wire - Kotlin-first proto library by Block
- Protocol Buffers - API definition
- gRPC - RPC framework
- Connect - Modern proto-based RPC