Trailblaze Decision 029: Custom Tool Architecture¶

Context¶

Decision 010: Custom Tool Authoring documented the current Kotlin-based approach and its limitations. Today, custom tools for Trailblaze (e.g., myapp_launchAppSignedIn, otherapp_scrollUntilTextIsVisible) must be:

Written in Kotlin with access to TrailblazeToolExecutionContext
Compiled into the Trailblaze distribution or test APK
Forked if you’re an external team wanting custom behavior

This creates a barrier for external adoption:

External teams (Acme, ExampleCorp, etc.) can’t extend Trailblaze without forking
Non-Kotlin teams (Python, TypeScript) have no path to custom tools
The current tool API is tightly coupled to internal implementation details

Requirements¶

Based on discussions with potential adopters and internal teams:

Teams MUST be able to add tools without forking — non-negotiable
Broad accessibility — not just Kotlin developers; Python, TypeScript, Go teams need a path
Type safety and refactoring support — we don’t want to maintain an untyped API surface
On-device support — device farms (Firebase Test Lab, AWS Device Farm) only allow app APK + test APK
Future extensibility — web (Playwright), desktop control are on the roadmap
Cross-platform tools — one tool should work across Android, iOS, Web where possible
Tool set management — ability to filter/group tools to reduce LLM context window

Decision¶

Implement a multi-path custom tool architecture with:

Wire/Proto-defined API — Kotlin-first with Wire, generates proto for external consumption
MCP for tool discovery — external tools run their own MCP servers
RPC for execution — host-driven tools call Trailblaze via generated clients
Library dependency for on-device — Kotlin tools compiled into test APK
Central tool registry — metadata exposed via MCP resources
Layered command interfaces — core + platform-specific backends

Architecture Overview¶

┌─────────────────────────────────────────────────────────────────────┐
│                         Outer Agent                                 │
│               (Claude, Goose, Cursor, Desktop App)                  │
└─────────────────────────────────────────────────────────────────────┘
        │               │               │               │
        │ MCP           │ MCP           │ MCP           │ MCP
        ▼               ▼               ▼               ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│  Trailblaze   │ │  MyApp Tools  │ │OtherApp Tools │ │  Acme Tools   │
│    (core)     │ │  (example)    │ │  (example)    │ │  (External)   │
│               │ │               │ │               │ │               │
│ Primitives:   │ │ App-specific: │ │ App-specific: │ │ App-specific: │
│ - tap         │ │ - login       │ │ - transfer    │ │ - login       │
│ - inputText   │ │ - checkout    │ │ - banking     │ │ - acceptRide  │
│ - launchApp   │ │ - cardReader  │ │               │ │               │
│               │ │               │ │               │ │               │
│ Registry: ✓   │ │ Registry: ✓   │ │ Registry: ✓   │ │ Registry: ✓   │
└───────────────┘ └───────────────┘ └───────────────┘ └───────────────┘
        │               │               │               │
        └───────────────┴───────┬───────┴───────────────┘
                                │
                    TrailblazeCommands (RPC)
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Trailblaze RPC Server                            │
│                      (localhost:52525)                              │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  TrailblazeCommands (core interface)                        │   │
│  │  - tap(), inputText(), launchApp(), clearAppData()          │   │
│  │  - captureScreen(), waitUntilVisible(), assertVisible()     │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                              │                                      │
│        ┌─────────────────────┼─────────────────────┐               │
│        ▼                     ▼                     ▼               │
│  ┌───────────┐         ┌───────────┐         ┌───────────┐        │
│  │  Maestro  │         │ Playwright│         │  Desktop  │        │
│  │ Commands  │         │ Commands  │         │ Commands  │        │
│  │           │         │           │         │ (future)  │        │
│  │ - adb.*   │         │ - navigate│         │           │        │
│  │ - swipe   │         │ - fill    │         │           │        │
│  │ - scroll  │         │ - click   │         │           │        │
│  └───────────┘         └───────────┘         └───────────┘        │
│        │                     │                     │               │
└────────┼─────────────────────┼─────────────────────┼───────────────┘
         │                     │                     │
         ▼                     ▼                     ▼
    Mobile Device         Web Browser            Desktop App

Wire/Proto: Kotlin-First API Definition¶

We use Wire (a Kotlin-first proto library by Block) to define the API in Kotlin and generate proto for external consumption:

// Defined in Kotlin using Wire annotations
// Wire generates proto schema for interop with other languages

interface TrailblazeCommands {
    val platform: TrailblazeDevicePlatform

    // ═══════════════════════════════════════════════════════════════
    // ACTIONS (perform operations)
    // ═══════════════════════════════════════════════════════════════
    suspend fun tap(text: String? = null, id: String? = null, index: Int? = null): ActionResult
    suspend fun inputText(text: String): ActionResult
    suspend fun launchApp(packageId: String): ActionResult
    suspend fun clearAppData(packageId: String): ActionResult

    // ═══════════════════════════════════════════════════════════════
    // QUERIES (return values for conditionals - don't throw on "not found")
    // ═══════════════════════════════════════════════════════════════
    suspend fun isVisible(text: String? = null, id: String? = null, timeoutMs: Long? = null): Boolean
    suspend fun hasText(text: String): Boolean
    suspend fun getElementText(id: String): String?
    suspend fun getElementCount(text: String? = null, id: String? = null): Int
    suspend fun captureScreen(): ScreenState

    // ═══════════════════════════════════════════════════════════════
    // ASSERTIONS (throw/fail if condition not met)
    // ═══════════════════════════════════════════════════════════════
    suspend fun assertVisible(text: String? = null, id: String? = null, timeoutMs: Long? = null): ActionResult
    suspend fun waitUntilVisible(text: String, timeoutMs: Long): ActionResult

    // Platform-specific backends (nullable, check availability)
    val maestro: MaestroCommands?      // Mobile (Android/iOS)
    val playwright: PlaywrightCommands? // Web
    val desktop: DesktopCommands?       // Desktop (future)
}

// Mobile-specific (Maestro backend)
interface MaestroCommands {
    // Android
    suspend fun adbShell(command: String): ShellResult
    suspend fun grantPermission(packageId: String, permission: String): ActionResult
    suspend fun pressBack(): ActionResult
    suspend fun pressHome(): ActionResult

    // Gestures
    suspend fun swipe(direction: SwipeDirection, durationMs: Long? = null): ActionResult
    suspend fun scroll(direction: ScrollDirection, amount: Int? = null): ActionResult
    suspend fun pinch(scale: Float): ActionResult
}

// Web-specific (Playwright backend)
interface PlaywrightCommands {
    suspend fun navigate(url: String): ActionResult
    suspend fun click(selector: String): ActionResult
    suspend fun fill(selector: String, value: String): ActionResult
    suspend fun waitForSelector(selector: String, timeoutMs: Long? = null): ActionResult
    suspend fun evaluateJs(script: String): Any?
}
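
The nullable backends mean a cross-platform tool has to check availability before using a platform-specific command. The sketch below shows that check in Python, assuming a generated client exposes the same optional `maestro` property; `FakeCommands`, `FakeMaestro`, and `dismiss_dialog` are hypothetical stand-ins, not part of the Trailblaze API.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


class FakeMaestro:
    """Hypothetical stand-in for the mobile-only backend; records calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def press_back(self) -> None:
        self.calls.append("press_back")


@dataclass
class FakeCommands:
    """Hypothetical stand-in for TrailblazeCommands with optional backends."""

    platform: str
    maestro: Optional[FakeMaestro] = None
    playwright: Optional[object] = None


async def dismiss_dialog(commands: FakeCommands) -> bool:
    """Press Back when the mobile backend exists; report whether it ran."""
    if commands.maestro is None:
        # Web or desktop session: no Maestro backend, so skip the gesture.
        return False
    await commands.maestro.press_back()
    return True


mobile = FakeCommands(platform="ANDROID", maestro=FakeMaestro())
web = FakeCommands(platform="WEB")
print(asyncio.run(dismiss_dialog(mobile)))  # True
print(asyncio.run(dismiss_dialog(web)))     # False
```

Returning a flag instead of raising keeps the tool usable on every platform; a tool that cannot work without the backend would more likely fail fast.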

Code Generation¶

From the Wire/Kotlin definitions, we generate:

Generated Artifact	Language	Usage
`TrailblazeCommands` interface	Kotlin	In-process (host + on-device)
`trailblaze-api.proto`	Proto	External language interop
`TrailblazeClient`	Python	External MCP servers
`TrailblazeClient`	TypeScript	External MCP servers
gRPC/Connect stubs	Multiple	RPC communication

Deployment Paths¶

Path 1: Host-Driven (MCP + RPC)¶

Audience: Most external users (Python, TypeScript, Go teams)

External teams write their own MCP server that:

1. Exposes custom tools via the MCP protocol
2. Uses the generated RPC client to call Trailblaze commands
3. Runs as a separate process

# acme_tools_server.py
from mcp import Server
from trailblaze_client import TrailblazeClient  # Generated from proto

tb = TrailblazeClient("localhost:52525")
server = Server()

@server.tool("acme_driver_login")
async def login(email: str, password: str) -> dict:
    """Log in to Acme Driver app with credentials."""
    await tb.clear_app_data(package_id="com.acme.driver")
    await tb.launch_app(package_id="com.acme.driver")
    await tb.tap(text="Sign in")
    await tb.input_text(text=email)
    await tb.input_text(text=password)
    await tb.tap(text="Submit")
    await tb.wait_until_visible(text="Go Online", timeout_ms=10000)
    return {"success": True}

if __name__ == "__main__":
    server.run()

Path 2: On-Device (Library + Custom APK)¶

Audience: Teams running tests on device farms (Firebase Test Lab, AWS Device Farm)

Teams depend on trailblaze-android-ondevice-mcp and compile their tools into a test APK:

// AcmeDriverLoginTool.kt
class AcmeDriverLoginTool(
    private val commands: TrailblazeCommands  // Same interface as RPC!
) : TrailblazeTool {

    override suspend fun execute(args: Map<String, Any>): ToolResult {
        commands.clearAppData("com.acme.driver")
        commands.launchApp("com.acme.driver")
        commands.tap(text = "Sign in")
        commands.inputText(args["email"] as String)
        commands.inputText(args["password"] as String)
        commands.tap(text = "Submit")
        commands.waitUntilVisible(text = "Go Online", timeoutMs = 10000)
        return ToolResult.success()
    }
}

Key insight: The TrailblazeCommands interface is identical whether backed by RPC (host) or direct calls (on-device).
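
To illustrate the point, here is a minimal Python sketch of one tool body running against two backings of the same interface. Everything in it is an assumption made for illustration: `Commands`, `DirectCommands`, `RpcCommands`, and the `/launchApp` and `/tap` paths are hypothetical, and the RPC variant only records what it would send.

```python
import asyncio
from typing import Protocol


class Commands(Protocol):
    """The subset of the command interface this sketch needs."""

    async def launch_app(self, package_id: str) -> None: ...
    async def tap(self, text: str) -> None: ...


class DirectCommands:
    """On-device style: calls act on a driver in the same process."""

    def __init__(self) -> None:
        self.log: list[str] = []

    async def launch_app(self, package_id: str) -> None:
        self.log.append(f"direct launch {package_id}")

    async def tap(self, text: str) -> None:
        self.log.append(f"direct tap {text}")


class RpcCommands:
    """Host-driven style: each call becomes a request to a (fake) endpoint."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint
        self.log: list[str] = []

    async def launch_app(self, package_id: str) -> None:
        self.log.append(f"POST {self.endpoint}/launchApp {package_id}")

    async def tap(self, text: str) -> None:
        self.log.append(f"POST {self.endpoint}/tap {text}")


async def open_sign_in(commands: Commands) -> None:
    """Tool body written once against the interface, unaware of the backing."""
    await commands.launch_app("com.acme.driver")
    await commands.tap("Sign in")


direct = DirectCommands()
rpc = RpcCommands("http://localhost:52525")
asyncio.run(open_sign_in(direct))
asyncio.run(open_sign_in(rpc))
print(direct.log)
print(rpc.log)
```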

Path 3: Host-Driven Kotlin (In-Process)¶

Audience: Teams already using Kotlin who want in-process tools

Kotlin tools in the same JVM use the interface directly with no RPC overhead.

MCP Server Organization¶

Composable Tool Modules¶

Tools are organized as composable modules that can be combined into MCP servers flexibly:

Tool Modules (libraries):
├── shared-common-tools   # Shared across all your apps
├── myapp-tools           # MyApp-specific
├── otherapp-tools        # OtherApp-specific
├── admin-tools           # AdminPanel-specific
└── acme-tools            # External: Acme's tools

These modules are code libraries, not servers. How they’re composed into servers is a deployment decision.

Deployment Options¶

Option A: One server per app (independence)

MCP Servers:
├── myapp-server    → [shared-common-tools, myapp-tools]
├── otherapp-server → [shared-common-tools, otherapp-tools]
└── admin-server    → [shared-common-tools, admin-tools]

Option B: One superset server (efficiency)

MCP Servers:
└── combined-tools-server → [shared-common-tools, myapp-tools, otherapp-tools, admin-tools]

Option C: Mix based on needs

MCP Servers:
├── mobile-server → [shared-common-tools, myapp-tools, otherapp-tools]  # Mobile apps together
└── admin-server  → [shared-common-tools, admin-tools]                  # Web separate

Trade-offs¶

Approach	Ports	Processes	Deploy Independence	Failure Isolation
One per app	N	N	High	High
One superset	1	1	Low (coordinated)	Low

The overhead difference is minimal: a few extra processes. The bigger question is organizational:

Independence: Teams deploy their tools without coordinating
Efficiency: Single server, shared caches/state, one port to manage

Recommendation: Multiple Servers (with Build Considerations)¶

Default to one MCP server per app/team. The process overhead is minimal, and the organizational benefits are significant:

Ownership — Each team owns their server and tools
Independence — Deploy, update, and scale independently
Overlap handling — Teams can have similar tools without naming conflicts
Failure isolation — One server’s issues don’t affect others
Easier management — Clear boundaries for what tools live where

Startup time consideration: Multiple servers mean multiple process startups.

Server Type	Build Time	Startup Time	N Servers Impact
Python/TypeScript	None	1-3s (imports, init)	N × startup
Kotlin in-process	Part of Trailblaze	Zero (same JVM)	No impact
Kotlin out-of-process	Gradle compile	JVM startup + init	N × build + startup

For Python/TypeScript: No build, but starting 3 servers = 3× interpreter + import time. Usually acceptable (a few seconds total), but consider combining if startup latency matters.

For Kotlin MCP servers: Combine into one pre-built artifact with all tools, to avoid N builds at startup. The single JAR can still organize tools by namespace (myapp_*, otherapp_*). This gives:

Build once → start fast
Logical separation via namespacing
Single process or spawn multiple from same artifact

Tool Namespacing (Required)¶

Namespacing is critical regardless of deployment model. Even with one combined server, tools need clear namespaces to avoid conflicts and enable filtering.

Per ADR 005: Tool Naming Convention, we use underscores (not dots) because OpenAI function names don’t support dots:

myapp_login
myapp_checkout
otherapp_transfer
otherapp_requestMoney
admin_viewAnalytics

Benefits:

No conflicts: Multiple apps can have a login tool
Filtering: Agent can request only myapp_* tools for a MyApp test
Discovery: Clear ownership in tool listings
Composability: Same naming works whether tools are in one server or many
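
A small Python sketch of how a namespace check and prefix filter might work on plain tool names. The 64-character limit and allowed character set are an assumption modeled on common function-name rules for LLM providers, and `namespace_of` and `tools_for` are hypothetical helpers.

```python
import re
from typing import Optional

# Assumed rule: letters, digits, underscores and hyphens, at most 64 characters.
VALID_NAME = re.compile(r"^[A-Za-z0-9_-]{1,64}$")


def namespace_of(tool_name: str) -> Optional[str]:
    """Return the part before the first underscore, or None if un-namespaced."""
    prefix, sep, rest = tool_name.partition("_")
    return prefix if sep and prefix and rest else None


def tools_for(namespace: str, tool_names: list[str]) -> list[str]:
    """Keep only valid names that belong to the given namespace."""
    return [
        name
        for name in tool_names
        if VALID_NAME.match(name) and namespace_of(name) == namespace
    ]


names = ["myapp_login", "myapp_checkout", "otherapp_login", "admin_viewAnalytics"]
print(tools_for("myapp", names))  # ['myapp_login', 'myapp_checkout']
```

Both `myapp_login` and `otherapp_login` coexist because the namespace, not the action name, makes each tool unique.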

Recommendation: One Combined Server¶

For your organization’s internal tools, where builds and releases are already coordinated, consider a single combined MCP server that hosts every app’s tools:

combined-tools-server
├── myapp_*      (MyApp tools)
├── otherapp_*   (OtherApp tools)  
├── admin_*      (AdminPanel tools)
└── shared_*     (Shared utilities)

Why one server for your organization:

Single build, fast startup
Single process to manage
Shared caches/state when useful
Namespacing provides logical separation
Simpler registry: one resources/read("trailblaze://registry") returns all tool metadata

Registry with One Combined Server¶

One combined server simplifies the registry endpoint:

Deployment	Registry Calls	Aggregation
Multiple servers	N calls	Trailblaze aggregates
One combined server	1 call	None needed

The registry is a Map<String, ToolMetadata> where keys include the namespace:

// One registry, all tools, namespaced
val registry = mapOf(
    "myapp_login" to ToolMetadata(platforms = setOf(ANDROID, IOS), groups = setOf("auth")),
    "myapp_checkout" to ToolMetadata(platforms = setOf(ANDROID, IOS), groups = setOf("payment")),
    "otherapp_transfer" to ToolMetadata(platforms = setOf(ANDROID, IOS), groups = setOf("transfer")),
    "otherapp_requestMoney" to ToolMetadata(platforms = setOf(ANDROID, IOS), groups = setOf("transfer")),
)

Filtering by app is simple: registry.filter { it.key.startsWith("myapp_") }
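
When tools are spread over several servers, the aggregation step from the table above could look roughly like the Python sketch below. It assumes each server's registry has already been fetched into a dictionary; `aggregate_registries` is a hypothetical helper, and rejecting duplicate names is one possible policy rather than documented behavior.

```python
def aggregate_registries(registries: dict[str, dict[str, dict]]) -> dict[str, dict]:
    """Merge per-server registries into one map, failing on duplicate names."""
    merged: dict[str, dict] = {}
    owner: dict[str, str] = {}
    for server, tools in registries.items():
        for name, metadata in tools.items():
            if name in merged:
                raise ValueError(
                    f"tool {name!r} is registered by both "
                    f"{owner[name]!r} and {server!r}"
                )
            merged[name] = metadata
            owner[name] = server
    return merged


per_server = {
    "myapp-server": {"myapp_login": {"groups": ["auth"]}},
    "otherapp-server": {"otherapp_transfer": {"groups": ["transfer"]}},
}
combined = aggregate_registries(per_server)
print(sorted(combined))  # ['myapp_login', 'otherapp_transfer']
```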

External Teams: Flexible Deployment¶

External teams can deploy however they prefer:

Single-app server: One server for their app’s tools
Combined server: Multiple apps in one (following the combined server pattern)
Per-team servers: Organizational boundaries

The architecture supports all models — namespacing makes tools composable across any deployment.

Multiple MCP Servers is Standard¶

Connecting to multiple MCP servers is standard MCP usage:

Claude Desktop, Goose, Cursor all support multiple servers
Each server provides different capabilities
This is the intended design pattern

Whether your team uses 1 server or 5, the architecture supports it.

MCP Server Registration¶

Transport Model: stdio vs HTTP¶

We use a simple two-tier model for MCP transport:

Transport	Lifecycle	When to use
stdio	Trailblaze manages (start/stop)	Default for all managed servers
http	External (you manage)	Shared team servers, external infrastructure

stdio is the default because:

No port management: communication via stdin/stdout pipes, no port conflicts
Parallel-safe: multiple instances don’t conflict (unlike ports)
CI-friendly: parallel jobs just work, no port allocation needed
Simple config: just specify the command to run
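
The sketch below shows the mechanism in plain Python: two child processes, each with private stdin/stdout pipes, started side by side without allocating a port. It is deliberately simplified. The child speaks bare JSON lines rather than the full MCP handshake, and `spawn`, `call`, and `shutdown` are hypothetical helpers.

```python
import json
import subprocess
import sys

# Child process: answer each JSON line on stdin with a JSON line on stdout.
CHILD = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"id": request["id"], "result": "pong"}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


def spawn() -> subprocess.Popen:
    """Start one child wired to its own private stdin/stdout pipes."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def call(proc: subprocess.Popen, request_id: int) -> dict:
    """Send one request and read exactly one reply line."""
    proc.stdin.write(json.dumps({"id": request_id, "method": "ping"}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())


def shutdown(proc: subprocess.Popen) -> None:
    """Closing stdin sends EOF, which ends the child's read loop."""
    proc.stdin.close()
    proc.wait(timeout=5)
    proc.stdout.close()


first, second = spawn(), spawn()  # two instances, no port to allocate
replies = [call(first, 1), call(second, 2)]
shutdown(first)
shutdown(second)
print([reply["id"] for reply in replies])  # [1, 2]
```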

Project Configuration (`trailblaze.yaml`)¶

Teams configure their MCP servers in their test repository:

# trailblaze.yaml - in repo root
target: acme_driver

mcpServers:
  # stdio (managed) - Trailblaze starts and stops this
  - name: acme-tools
    command: ./gradlew
    args: [:acme-tools:runMcp]

  # http (external) - already running, we just connect
  - name: shared-team-tools
    transport: http
    url: http://team-tools.internal:8080

The “magic” experience:

$ cd acme-trailblaze-tests
$ trailblaze run
# 1. Reads trailblaze.yaml
# 2. Spawns acme-tools via Gradle (stdio)
# 3. Connects to shared-team-tools (http, already running)
# 4. Loads registries, runs tests
# 5. Stops acme-tools on exit

Config Schema¶

# trailblaze.yaml
target: string              # Which app target (acme_driver, myapp, etc.)
platform: string               # Optional: android, ios, web

mcpServers:
  # stdio server (Trailblaze manages lifecycle)
  - name: string               # Identifier
    command: string            # Command to run
    args: [string]             # Arguments
    workingDir: string         # Optional: working directory
    env:                       # Optional: environment variables
      KEY: value

  # http server (externally managed)
  - name: string
    transport: http            # Explicitly http
    url: string                # URL to connect to

trailsDir: string              # Optional: where to find trails (default: ./trails)
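
As a rough illustration of how a loader might apply this schema, the Python sketch below classifies already-parsed `mcpServers` entries as stdio or http. It takes plain dictionaries so it needs no YAML library; `McpServerConfig` and `parse_server` are hypothetical names, and the validation rules are assumptions inferred from the schema comments.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class McpServerConfig:
    name: str
    transport: str  # "stdio" or "http"
    command: Optional[str] = None
    args: list[str] = field(default_factory=list)
    url: Optional[str] = None

    @property
    def managed(self) -> bool:
        """stdio servers are started and stopped by Trailblaze."""
        return self.transport == "stdio"


def parse_server(entry: dict) -> McpServerConfig:
    """Validate one mcpServers entry; stdio is assumed when transport is absent."""
    name = entry.get("name")
    if not name:
        raise ValueError("mcpServers entry is missing 'name'")
    transport = entry.get("transport", "stdio")
    if transport == "http":
        if not entry.get("url"):
            raise ValueError(f"{name}: http transport requires 'url'")
        return McpServerConfig(name=name, transport="http", url=entry["url"])
    if transport == "stdio":
        if not entry.get("command"):
            raise ValueError(f"{name}: stdio transport requires 'command'")
        return McpServerConfig(
            name=name,
            transport="stdio",
            command=entry["command"],
            args=list(entry.get("args", [])),
        )
    raise ValueError(f"{name}: unknown transport {transport!r}")


servers = [
    parse_server({"name": "acme-tools", "command": "./gradlew", "args": [":acme-tools:runMcp"]}),
    parse_server({"name": "shared-team-tools", "transport": "http", "url": "http://team-tools.internal:8080"}),
]
print([(s.name, s.managed) for s in servers])
```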

Why stdio Avoids Port Problems¶

Multiple developers on the same machine:

# Developer 1
trailblaze run  # Uses stdin/stdout pipe

# Developer 2 (same machine)
trailblaze run  # Uses different pipe, no conflict!

CI parallel jobs:

# All jobs run simultaneously, no port allocation needed
jobs:
  test-driver: { run: trailblaze run trails/driver/ }
  test-rider:  { run: trailblaze run trails/rider/ }
  test-eats:   { run: trailblaze run trails/eats/ }

STDIO Concurrency Limitation¶

Important consideration: STDIO MCP servers typically process requests sequentially — read a request, process it, respond, then read the next. A single STDIO connection cannot handle concurrent requests from multiple devices.

Recommended: HTTP Transport for Multi-Device¶

For multi-device support, Trailblaze spawns the MCP server in HTTP mode rather than STDIO:

┌───────────────────────────────────────────────────────────────────────┐
│  Trailblaze spawns HTTP MCP server (single process)                   │
│                                                                       │
│  $ python acme_tools_server.py --transport http --port 52530          │
│                                                                       │
│  Device 1 ──► POST /tools/call ──┐                                    │
│  Device 2 ──► POST /tools/call ──┼──► Concurrent handling            │
│  Device 3 ──► POST /tools/call ──┘                                    │
└───────────────────────────────────────────────────────────────────────┘

Why HTTP is better than STDIO-per-device:

Approach	Processes	Memory	State Sharing	Lifecycle
STDIO per-device	N processes	N × footprint	None (isolated)	Manage N
HTTP (single)	1 process	1 × footprint	Shared caches	Manage 1

Spawning one HTTP server with a controlled port is not more work than spawning N STDIO processes — and it’s cleaner:

Single process — less overhead than N STDIO processes
Shared state — tools can share caches, connections, loaded models
Trailblaze controls the port — no port conflicts
Same server code — MCP SDKs support both transports, tool author changes nothing

Invocation ID is required for HTTP transport. With concurrent requests from multiple devices hitting the same server, the invocation ID in _meta is how each request routes to the correct device:

@server.tool("analyze_screen")
async def analyze(prompt: str, ctx: Context) -> dict:
    # Multiple devices calling this concurrently!
    # Invocation ID tells us which device context to use
    tb = TrailblazeClient.from_context(ctx)

    screen = await tb.capture_screen()  # Routes to correct device
    return await tb.ask_llm(prompt, screen)

Single Device: STDIO Still Works¶

For the single device case (most common during development), STDIO remains simple and works without invocation ID:

mcpServers:
  - name: acme-tools
    command: python
    args: [./acme_tools_server.py]
    # Single device: STDIO, no port needed

When multiple devices connect, Trailblaze automatically switches to HTTP mode, passing a port:

# Trailblaze internally does this when multi-device:
# python ./acme_tools_server.py --transport http --port 52530
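
The switch described above can be sketched as a small Python function that builds the launch command from the device count. The `--transport` and `--port` flags are the ones shown in this document; `launch_command` itself is a hypothetical helper, and how Trailblaze picks the port is not specified here.

```python
def launch_command(command: str, args: list[str], device_count: int, port: int) -> list[str]:
    """Return argv for the tool server: stdio for one device, HTTP for several."""
    if device_count < 1:
        raise ValueError("at least one device is required")
    argv = [command, *args]
    if device_count > 1:
        # Several devices call concurrently, so ask the server for HTTP mode.
        argv += ["--transport", "http", "--port", str(port)]
    return argv


print(launch_command("python", ["./acme_tools_server.py"], device_count=1, port=52530))
print(launch_command("python", ["./acme_tools_server.py"], device_count=3, port=52530))
```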

For in-process Kotlin tools, concurrency is handled via coroutines in the same JVM — no transport concerns.

For External MCP Clients (Goose, Cursor, Claude Desktop)¶

Users configure their MCP client directly using each client’s config format. This is separate from trailblaze.yaml:

Goose (~/.config/goose/config.yaml):

mcp:
  servers:
    trailblaze:
      command: trailblaze
      args: [mcp]
    acme-tools:
      command: python
      args: [./acme_tools_server.py]

In-Process (Same JVM)¶

Your app-specific tools don’t need separate MCP servers. They’re in-process Kotlin:

// In-process: tools registered directly, no MCP server spawning
class TrailblazeMcpServer {
    val inProcessTools = listOf(
        MyAppLoginTool::class,
        MyAppCheckoutTool::class,
        OtherAppTransferTool::class,
        // All in-process, same JVM
    )
}

External MCP servers (defined in trailblaze.yaml) are for external teams running separate processes.

Multi-Device Execution¶

Trailblaze supports multiple devices connected simultaneously. A single Trailblaze instance can orchestrate tests across an iPhone, Android emulator, and physical Android device at the same time. This is a core capability that enables:

Cross-platform testing — Run the same test on iOS and Android in parallel
Device farms — Scale tests across dozens of devices
Comparative testing — Test the same flow on different device configurations

External tools don’t need to know about multi-device orchestration — device context flows through automatically.

How It Works¶

┌─────────────────────────────────────────────────────────────────────┐
│                    Trailblaze Desktop App                           │
│                                                                     │
│  Connected devices:                                                 │
│  - Device 1 (Pixel 6, emulator-5554)                               │
│  - Device 2 (iPhone 14, 00008101-...)                              │
│  - Device 3 (Galaxy S23, RF8M...)                                  │
│                                                                     │
│  User: "Run login test on all devices"                             │
│                                                                     │
│  Creates 3 parallel execution contexts:                            │
│  ┌───────────────┐ ┌───────────────┐ ┌───────────────┐             │
│  │ deviceId: d1  │ │ deviceId: d2  │ │ deviceId: d3  │             │
│  │ platform: AND │ │ platform: IOS │ │ platform: AND │             │
│  └───────────────┘ └───────────────┘ └───────────────┘             │
└─────────────────────────────────────────────────────────────────────┘
        │                   │                   │
        │ MCP call with     │ MCP call with     │ MCP call with
        │ device context    │ device context    │ device context
        ▼                   ▼                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    External MCP Server (Acme Tools)                 │
│                                                                     │
│  @server.tool("acme_driver_login")                                 │
│  async def login(email, password, _trailblaze):                    │
│      # Tool doesn't know about multiple devices                     │
│      # It just operates in the context it's given                   │
│      tb = TrailblazeClient(context=_trailblaze)                    │
│      await tb.tap(text="Sign in")   # Routed to correct device     │
│      await tb.input_text(email)     # Routed to correct device     │
│      ...                                                            │
└─────────────────────────────────────────────────────────────────────┘
        │                   │                   │
        │ RPC with d1       │ RPC with d2       │ RPC with d3
        ▼                   ▼                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Trailblaze RPC Server                            │
│                                                                     │
│  Routes each request to the correct device based on deviceId       │
│                                                                     │
│  d1 → Pixel 6          d2 → iPhone 14        d3 → Galaxy S23       │
└─────────────────────────────────────────────────────────────────────┘

Device Context Propagation¶

When Trailblaze calls an external MCP tool, it includes device context:

{
  "method": "tools/call",
  "params": {
    "name": "acme_driver_login",
    "arguments": { "email": "test@example.com", "password": "***" },
    "_trailblaze": {
      "deviceId": "emulator-5554",
      "sessionId": "abc123",
      "platform": "android"
    }
  }
}

The external tool uses a client that automatically includes this context in RPC calls:

@server.tool("acme_driver_login")
async def login(email: str, password: str, _trailblaze: dict) -> dict:
    # Client initialized with device context
    tb = TrailblazeClient(context=_trailblaze)

    # All RPC calls automatically include deviceId
    await tb.clear_app_data("com.acme.driver")  # → routed to emulator-5554
    await tb.launch_app("com.acme.driver")      # → routed to emulator-5554
    await tb.tap(text="Sign in")                   # → routed to emulator-5554

    return {"success": True}
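
One way a client could attach the context automatically is sketched below. This is not the generated `TrailblazeClient`; `ContextClient` and `RecordingTransport` are hypothetical, and the transport only records requests so the routing fields can be inspected.

```python
import asyncio


class RecordingTransport:
    """Hypothetical transport that records each request instead of sending it."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, dict]] = []

    async def send(self, method: str, payload: dict) -> dict:
        self.sent.append((method, payload))
        return {"ok": True}


class ContextClient:
    """Hypothetical client that scopes every call to one device context."""

    def __init__(self, context: dict, transport: RecordingTransport) -> None:
        self._routing = {
            "device_id": context["deviceId"],
            "session_id": context["sessionId"],
        }
        self._transport = transport

    async def _call(self, method: str, **fields) -> dict:
        # Routing fields are merged last so a tool cannot override them by accident.
        return await self._transport.send(method, {**fields, **self._routing})

    async def launch_app(self, package_id: str) -> dict:
        return await self._call("launchApp", package_id=package_id)

    async def tap(self, text: str) -> dict:
        return await self._call("tap", text=text)


async def demo() -> list[tuple[str, dict]]:
    transport = RecordingTransport()
    context = {"deviceId": "emulator-5554", "sessionId": "abc123", "platform": "android"}
    tb = ContextClient(context, transport)
    await tb.launch_app("com.acme.driver")
    await tb.tap("Sign in")
    return transport.sent


sent = asyncio.run(demo())
print(sent[1])
```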

Wire API Includes Device Context¶

On the wire, every request message generated for TrailblazeCommands carries device context, as in this proto sketch of the tap request:

message TapRequest {
    optional string text = 1;  // Optional: element text to match
    optional string id = 2;    // Optional: element ID
    optional int32 index = 3;  // Optional: index if multiple matches
    string device_id = 4;      // Routing
    string session_id = 5;     // Correlation
}

Tool Perspective¶

From an external tool’s perspective:

It receives a request with context
It does its work using the context
It returns a result

The tool doesn’t know or care that:

There are multiple devices
Tests are running in parallel
It’s being called multiple times simultaneously

The device context is transparent to the tool. Trailblaze handles parallelism and routing.
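
From the orchestrator's side, the fan-out might be as simple as the Python sketch below: the same tool coroutine is invoked once per device context and the calls run concurrently. `login_tool` and `run_on_all` are hypothetical, and the short sleep stands in for real device work.

```python
import asyncio


async def login_tool(context: dict) -> dict:
    """Hypothetical tool: sees only its own context, never the other devices."""
    await asyncio.sleep(0.01)  # stand-in for taps and waits on the device
    return {"deviceId": context["deviceId"], "success": True}


async def run_on_all(tool, contexts: list[dict]) -> list[dict]:
    """Invoke one tool per device concurrently; results keep the input order."""
    return await asyncio.gather(*(tool(context) for context in contexts))


contexts = [
    {"deviceId": "d1", "platform": "ANDROID"},
    {"deviceId": "d2", "platform": "IOS"},
    {"deviceId": "d3", "platform": "ANDROID"},
]
results = asyncio.run(run_on_all(login_tool, contexts))
print([result["deviceId"] for result in results])  # ['d1', 'd2', 'd3']
```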

Invocation Context for Multi-Device¶

To support multiple devices simultaneously, every tool invocation needs execution context. When an external MCP tool makes RPC calls back to Trailblaze, those calls must route to the correct device — not just any connected device.

The invocation context is how we solve this. When Trailblaze calls an external MCP tool, it includes context in _meta:

Invocation ID: Correlates RPC callbacks to the originating tool call
Device info: Which device this tool invocation operates on
Session info: Logging and analytics correlation

For in-process Kotlin tools, this context is the TrailblazeToolExecutionContext (or TrailblazeContext in TrailblazeToolSet).

For remote MCP tools, this context flows via _meta and is wrapped by the SDK’s TrailblazeClient. The client is essentially a remote execution context — every RPC call it makes is scoped to the correct device.

Metadata Shape¶

{
  "_meta": {
    "trailblazeInvocationId": "inv-abc123",
    "trailblaze": {
      "baseUrl": "http://localhost:52525",
      "sessionId": "trail-xyz",
      "device": {
        "id": "emulator-5554",
        "platform": "ANDROID",
        "width": 1080,
        "height": 2400
      },
      "capabilities": {
        "sampling": true
      }
    }
  }
}

Field	Type	Purpose
`trailblazeInvocationId`	string	Correlates callbacks to originating request
`trailblaze.baseUrl`	string	Where to call back
`trailblaze.sessionId`	string	Trailblaze session for logging
`trailblaze.device.id`	string	Device identifier
`trailblaze.device.platform`	string	ANDROID / IOS
`trailblaze.device.width/height`	int	Screen dimensions
`trailblaze.capabilities.sampling`	bool	Whether LLM sampling is available
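
A tool-side SDK would probably turn this block into typed fields before use. The Python sketch below does that for the shape shown above; `Device`, `InvocationContext`, and `parse_meta` are hypothetical names, and treating a missing `capabilities` entry as "not available" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    id: str
    platform: str
    width: int
    height: int


@dataclass(frozen=True)
class InvocationContext:
    invocation_id: str
    base_url: str
    session_id: str
    device: Device
    sampling: bool


def parse_meta(meta: dict) -> InvocationContext:
    """Validate the _meta block and convert it to typed fields."""
    try:
        tb = meta["trailblaze"]
        device = tb["device"]
        return InvocationContext(
            invocation_id=meta["trailblazeInvocationId"],
            base_url=tb["baseUrl"],
            session_id=tb["sessionId"],
            device=Device(
                id=device["id"],
                platform=device["platform"],
                width=int(device["width"]),
                height=int(device["height"]),
            ),
            # Assumption: a missing capability means it is unavailable.
            sampling=bool(tb.get("capabilities", {}).get("sampling", False)),
        )
    except KeyError as missing:
        raise ValueError(f"_meta is missing required field {missing}") from None


meta = {
    "trailblazeInvocationId": "inv-abc123",
    "trailblaze": {
        "baseUrl": "http://localhost:52525",
        "sessionId": "trail-xyz",
        "device": {"id": "emulator-5554", "platform": "ANDROID", "width": 1080, "height": 2400},
        "capabilities": {"sampling": True},
    },
}
context = parse_meta(meta)
print(context.device.platform, context.sampling)  # ANDROID True
```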

Single Device Fallback¶

For single-device scenarios, invocation ID is optional. Trailblaze falls back to the single active context when:

Only one device is connected
Only one tool invocation is active

Multi-device scenarios require explicit invocation ID propagation. If multiple devices are active and no invocation ID is provided, Trailblaze returns an error explaining the requirement.
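
The fallback rule can be sketched as a small resolver. This Python version approximates "single device, single invocation" as "exactly one active invocation context", which is an assumption; `resolve_context` and `InvocationError` are hypothetical names.

```python
from typing import Optional


class InvocationError(Exception):
    """Raised when an RPC call cannot be tied to an execution context."""


def resolve_context(active: dict[str, dict], invocation_id: Optional[str]) -> dict:
    """Pick the execution context for an incoming RPC call."""
    if invocation_id is not None:
        if invocation_id not in active:
            raise InvocationError(f"Unknown invocation ID: {invocation_id}")
        return active[invocation_id]
    if len(active) == 1:
        # Single active context: safe to fall back without an explicit ID.
        return next(iter(active.values()))
    raise InvocationError(
        "An invocation ID is required unless exactly one invocation is active"
    )


one = {"inv-1": {"deviceId": "emulator-5554"}}
two = {"inv-1": {"deviceId": "emulator-5554"}, "inv-2": {"deviceId": "RF8M"}}
print(resolve_context(one, None))     # falls back to the only context
print(resolve_context(two, "inv-2"))  # explicit ID selects the device
```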

Static vs Fresh Data¶

Static metadata (included in _meta to avoid round-trips):

Device info (platform, dimensions, ID)
Session ID
Callback URL
Capabilities

Fresh data (fetched via RPC on-demand):

View hierarchy: large, changes constantly
Screenshot: large (~100KB+), stale immediately
Current screen state: tool decides when it needs fresh data

This separation ensures tools have immediate access to static context while fetching dynamic data only when needed.

Invocation ID Lifecycle¶

The invocation ID ties together a single external tool call with all the RPC requests that tool makes back to Trailblaze.

┌─────────────────────────────────────────────────────────────────────────┐
│                           TRAILBLAZE                                    │
│                                                                         │
│  1. Trailblaze calls external MCP tool                                  │
│     → Generate invocationId = "inv-abc123"                              │
│     → Store context: invocations["inv-abc123"] = {device, session, ...} │
│     → Include in _meta: {"trailblazeInvocationId": "inv-abc123", ...}   │
│                                                                         │
│  2. BLOCKING: Wait for tool call to complete                            │
│     ┌─────────────────────────────────────────────────────────────────┐ │
│     │  External tool executes, makes RPC calls back to Trailblaze     │ │
│     │                                                                 │ │
│     │  tb.tap(...)        → RPC includes invocationId                 │ │
│     │  tb.captureScreen() → RPC includes invocationId                 │ │
│     │  tb.inputText(...)  → RPC includes invocationId                 │ │
│     │                                                                 │ │
│     │  Trailblaze receives RPC:                                       │ │
│     │  → Extract invocationId from request                            │ │
│     │  → Lookup context: invocations["inv-abc123"]                    │ │
│     │  → Route to correct device, log against correct session         │ │
│     └─────────────────────────────────────────────────────────────────┘ │
│                                                                         │
│  3. Tool call completes (success or failure)                            │
│     → Remove context: invocations.remove("inv-abc123")                  │
│     → Return result to caller                                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Key points:

Blocking call: Trailblaze blocks while waiting for the external tool to complete
Context scoped to call: The invocation context exists only while the tool is executing
Automatic cleanup: Context is removed when the tool call returns (success or failure)
RPC routing: All incoming RPC requests during execution use the invocation ID to find the right context
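The lifecycle above can be sketched as a small in-memory store. This is an illustrative Python sketch, not Trailblaze's implementation: `InvocationStore`, `run_tool_call`, and the context fields are hypothetical names, and only the generate/store/remove sequence is taken from the diagram.

```python
import uuid
from typing import Any, Callable, Dict


class InvocationStore:
    """Maps invocation IDs to per-call context (hypothetical helper)."""

    def __init__(self) -> None:
        self._invocations: Dict[str, Dict[str, Any]] = {}

    def begin(self, context: Dict[str, Any]) -> str:
        # Step 1: generate an ID and store the context under it.
        invocation_id = f"inv-{uuid.uuid4().hex[:8]}"
        self._invocations[invocation_id] = context
        return invocation_id

    def lookup(self, invocation_id: str) -> Dict[str, Any]:
        # Step 2: incoming RPC requests resolve their context by ID.
        return self._invocations[invocation_id]

    def end(self, invocation_id: str) -> None:
        # Step 3: drop the context once the tool call has returned.
        self._invocations.pop(invocation_id, None)

    def active_count(self) -> int:
        return len(self._invocations)


def run_tool_call(
    store: InvocationStore,
    context: Dict[str, Any],
    call_tool: Callable[[Dict[str, Any]], Any],
) -> Any:
    invocation_id = store.begin(context)
    try:
        # Blocks until the external tool returns; the ID travels in _meta.
        return call_tool({"trailblazeInvocationId": invocation_id})
    finally:
        # Cleanup runs on success and on failure alike.
        store.end(invocation_id)
```

Putting the removal in `finally` is what makes the "automatic cleanup" point hold even when the external tool raises.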

Error Handling¶

If an RPC request includes an invalid or unknown invocation ID, Trailblaze returns a standard tool call failure using JSON-RPC error code -32602 (Invalid params):

{
  "error": {
    "code": -32602,
    "message": "Unknown invocation ID: inv-xyz. The tool call may have completed or timed out."
  }
}

This propagates back to the external tool as a failed RPC call, which should cause the tool to return a failure to Trailblaze.
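A minimal sketch of the server-side check, assuming the RPC request carries the ID in a field named `invocationId` (the field name and the `handle_rpc` function are illustrative assumptions; the error shape mirrors the JSON above):

```python
from typing import Any, Dict

INVALID_PARAMS = -32602  # JSON-RPC 2.0 "Invalid params"


def handle_rpc(
    invocations: Dict[str, Dict[str, Any]],
    request: Dict[str, Any],
) -> Dict[str, Any]:
    """Resolve the invocation context or return a tool call failure."""
    invocation_id = request.get("invocationId")
    context = invocations.get(invocation_id)
    if context is None:
        # Unknown, expired, or missing ID: fail the RPC so the tool can fail too.
        return {
            "error": {
                "code": INVALID_PARAMS,
                "message": (
                    f"Unknown invocation ID: {invocation_id}. "
                    "The tool call may have completed or timed out."
                ),
            }
        }
    # Known ID: route the command to the device stored in the context.
    return {"result": {"device": context["device"], "command": request["command"]}}
```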

Tool Registry¶

Central Registry Per Server¶

Instead of annotating each tool with metadata, each server has a central registry:

// MyAppToolRegistry.kt - Single source of truth
object MyAppToolRegistry {
    val tools: Map<String, ToolMetadata> = mapOf(
        "myapp_launchAppSignedIn" to ToolMetadata(
            platforms = setOf(ANDROID, IOS, DESKTOP),
            groups = setOf("auth", "setup"),
        ),
        "myapp_checkout" to ToolMetadata(
            platforms = setOf(ANDROID, IOS, DESKTOP, WEB),
            groups = setOf("checkout", "payments"),
        ),
        "tapOnElementByNodeId" to ToolMetadata(
            platforms = setOf(ANDROID, IOS, DESKTOP, WEB),
            groups = setOf("core"),
            isRecordable = false,
            isDelegating = true,
        ),
    )

    val groups: Map<String, GroupInfo> = mapOf(
        "auth" to GroupInfo("Authentication tools", defaultEnabled = true),
        "checkout" to GroupInfo("Checkout flow", defaultEnabled = false),
    )
}

Registry Data Model (Proto-Generated)¶

We provide proto-generated data models for the registry:

// trailblaze-registry.proto
message TrailblazeToolRegistry {
    map<string, ToolMetadata> tools = 1;
    map<string, GroupInfo> groups = 2;
    ServerInfo server_info = 3;
}

message ToolMetadata {
    repeated string platforms = 1;
    repeated string groups = 2;
    bool exposed_to_llm = 3;
    bool is_recordable = 4;
    bool is_delegating = 5;
}

message GroupInfo {
    string description = 1;
    bool default_enabled = 2;
}

message ServerInfo {
    string name = 1;
    string version = 2;
}

Teams use the generated data models in their language. No base class to maintain.

Exposed via MCP Resource¶

The registry is exposed as an MCP resource (standard MCP feature):

Resource URI: trailblaze://registry

{
  "tools": {
    "myapp_launchAppSignedIn": {
      "platforms": ["android", "ios", "desktop"],
      "groups": ["auth", "setup"],
      "exposedToLlm": true,
      "isRecordable": true,
      "isDelegating": false
    },
    "tapOnElementByNodeId": {
      "platforms": ["android", "ios", "desktop", "web"],
      "groups": ["core"],
      "exposedToLlm": true,
      "isRecordable": false,
      "isDelegating": true
    }
  },
  "groups": {
    "auth": { "description": "Authentication tools", "defaultEnabled": true },
    "checkout": { "description": "Checkout flow", "defaultEnabled": false }
  },
  "serverInfo": {
    "name": "trailblaze-myapp-tools",
    "version": "1.2.0"
  }
}

Separation of Concerns¶

MCP Feature	Contains
`tools/list`	Tool names, descriptions, input schemas (standard MCP)
`resources/read(registry)`	Metadata: platforms, groups, flags (our extension)

The registry references tools by name; it doesn’t duplicate tool definitions.
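Because the two sources share only the tool name, a consumer joins them by name. The following is a sketch under the assumption that `tools/list` entries are dicts with a `name` key and that tools missing from the registry simply get empty metadata; `merge_tool_views` is a hypothetical helper.

```python
from typing import Any, Dict, List


def merge_tool_views(
    tool_list: List[Dict[str, Any]],
    registry: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]:
    """Join tools/list entries with registry metadata, keyed by tool name."""
    registry_tools = registry.get("tools", {})
    merged: Dict[str, Dict[str, Any]] = {}
    for tool in tool_list:
        name = tool["name"]
        # The definition comes from tools/list; the flags come from the registry.
        merged[name] = {**tool, "metadata": registry_tools.get(name, {})}
    return merged
```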

Using the Registry¶

The Trailblaze central agent reads registries from all connected MCP servers:

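class TrailblazeToolRouter {
    private val registries = mutableMapOf<String, TrailblazeToolRegistry>()

    suspend fun loadFromMcpServer(serverName: String, client: McpClient) {
        val resource = client.readResource("trailblaze://registry")
        registries[serverName] = Json.decodeFromString(resource.content)
    }

    fun filterTools(platform: TrailblazeDevicePlatform, groups: Set<String>): List<ToolInfo> {
        // The registry stores platforms as lowercase strings ("android", "ios", ...),
        // so compare against the enum name (assumes constants such as ANDROID)
        val platformKey = platform.name.lowercase()
        return registries.values.flatMap { registry ->
            registry.tools
                .filter { (_, meta) ->
                    meta.exposedToLlm &&
                        platformKey in meta.platforms &&
                        // An empty group filter means "all groups"
                        (groups.isEmpty() || meta.groups.intersect(groups).isNotEmpty())
                }
                // ToolInfo(name, metadata) is an assumed shape; adapt to the real type
                .map { (name, meta) -> ToolInfo(name, meta) }
        }
    }
}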

Tool Types¶

ExecutableTrailblazeTool¶

Tools that execute directly:

class InputTextTrailblazeTool(val text: String) : ExecutableTrailblazeTool {
    override suspend fun execute(ctx: TrailblazeToolExecutionContext): TrailblazeToolResult {
        ctx.trailblazeAgent.runMaestroCommands(listOf(InputTextCommand(text)))
        return TrailblazeToolResult.Success
    }
}

DelegatingTrailblazeTool and Recording¶

Tools that are exposed to the LLM but delegate execution to other (recordable) tools.

Registry Flags Explained¶

Flag	Meaning
`exposedToLlm`	Tool appears in the tool list for LLM to call
`isRecordable`	Tool call is captured in trail recording
`isDelegating`	Tool converts to other tools before execution

Why Tools Are Non-Recordable¶

There are two distinct reasons a tool might have isRecordable=false:

Reason	Description	Replay Behavior
Delegating	Tool transforms to stable, recordable tools	Delegates are recorded and replayed
LLM-Dependent	Tool requires LLM reasoning based on current state	LLM must re-evaluate each replay

Delegating example: tapOnElementByNodeId

nodeId=42 is ephemeral (changes between screens)
Delegates to tapOnElementWithText(text="Login"), which is stable
Recording captures the stable delegate

LLM-Dependent example: Visual validation

“Validate that the button is green”
Requires the LLM to interpret a screenshot and reason about color
Cannot be replayed deterministically; the LLM must run each time

Common Combinations¶

Pattern	`exposedToLlm`	`isRecordable`	`isDelegating`	Example
Standard tool	✅	✅	❌	`tapOnElementWithText`
Delegating tool	✅	❌	✅	`tapOnElementByNodeId`
LLM-dependent tool	✅	❌	❌	Visual validation, semantic checks
Internal helper	❌	❌	❌	Internal utility functions
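The table can be read as a lookup from the three flags to a pattern name. A small sketch follows; `classify` is a hypothetical helper, and flag combinations that the table does not list are reported as such rather than guessed.

```python
from typing import Dict, Tuple

# (exposedToLlm, isRecordable, isDelegating) -> pattern name, as in the table
PATTERNS: Dict[Tuple[bool, bool, bool], str] = {
    (True, True, False): "standard tool",
    (True, False, True): "delegating tool",
    (True, False, False): "LLM-dependent tool",
    (False, False, False): "internal helper",
}


def classify(exposed_to_llm: bool, is_recordable: bool, is_delegating: bool) -> str:
    """Return the pattern name for a flag combination."""
    key = (exposed_to_llm, is_recordable, is_delegating)
    return PATTERNS.get(key, "unlisted combination")
```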

Replay Modes¶

When replaying a recorded trail:

Tool Type	Replay Behavior
`isRecordable=True`	Execute directly, no LLM needed
`isDelegating=True`	(Not in recording — delegates were recorded instead)
`isRecordable=False, isDelegating=False`	LLM must run to evaluate this step

This means trails with LLM-dependent tools require “LLM-assisted replay” rather than pure deterministic replay.
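One way to decide the replay mode up front is to scan the trail against the registry flags. This sketch assumes the trail is a list of steps with a `tool` name and that every step's tool has a registry entry; `replay_mode` and the step shape are illustrative, not part of the Trailblaze API.

```python
from typing import Any, Dict, List


def replay_mode(
    steps: List[Dict[str, Any]],
    registry_tools: Dict[str, Dict[str, bool]],
) -> str:
    """Return "llm-assisted" if any step needs LLM evaluation, else "deterministic"."""
    for step in steps:
        meta = registry_tools[step["tool"]]  # KeyError if the tool is unregistered
        llm_dependent = not meta["isRecordable"] and not meta["isDelegating"]
        if llm_dependent:
            return "llm-assisted"
    return "deterministic"
```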

The Delegating Pattern¶

// LLM calls this with a nodeId (ephemeral, screen-specific)
@TrailblazeToolClass(name = "tapOnElementByNodeId", isRecordable = false)
class TapOnElementByNodeIdTrailblazeTool(
    val nodeId: Long,
    val reason: String,
) : DelegatingTrailblazeTool {

    override fun toExecutableTrailblazeTools(ctx: TrailblazeToolExecutionContext): List<ExecutableTrailblazeTool> {
        // Convert nodeId to stable selector
        val element = findElementByNode(nodeId, ctx.screenState)

        // Delegate to a RECORDABLE tool with stable properties
        return listOf(TapOnElementWithTextTrailblazeTool(text = element.text, id = element.id))
    }
}

Recording Flow¶

┌─────────────────────────────────────────────────────────────────────┐
│  LLM decides: "I need to tap the Login button (nodeId=42)"          │
│                                                                     │
│  Calls: tapOnElementByNodeId(nodeId=42, reason="Login button")      │
│                         │                                           │
│                         │ isRecordable=false, isDelegating=true     │
│                         │ → NOT recorded                            │
│                         ▼                                           │
│  Delegates to: TapOnElementWithTextTrailblazeTool(text="Login")     │
│                         │                                           │
│                         │ isRecordable=true                         │
│                         │ → RECORDED in trail                       │
│                         ▼                                           │
│  Trail file captures:                                               │
│  - tapOnElementWithText:                                            │
│      text: "Login"                                                  │
└─────────────────────────────────────────────────────────────────────┘

Why this pattern?

nodeId is ephemeral: it changes between screen captures, so it can’t be replayed
The delegated tool uses stable properties (text, ID) that work across runs
Recording captures the replayable tool, not the ephemeral nodeId-based call

For External MCP Tools¶

External tools can also use this pattern via the registry:

@server.resource("trailblaze://registry")
async def get_registry():
    return {
        "tools": {
            # Standard recordable tool
            "acme_driver_login": {
                "exposedToLlm": True,
                "isRecordable": True,
                "isDelegating": False,
            },
            # Delegating tool (converts to recordable primitives)
            "acme_tap_by_screen_coords": {
                "exposedToLlm": True,
                "isRecordable": False,  # Don't record coords-based tap
                "isDelegating": True,   # Converts to stable tap
            },
        }
    }

Core Recording Principle¶

Recording = what Trailblaze invoked. Replay = Trailblaze invokes those same tools.

This keeps Trailblaze as the controller for both recording and replay. Delegating tools (including external MCP tools) return a list of tools for Trailblaze to execute — they don’t execute actions directly.

How Delegation Works¶

Kotlin (internal):

class TapOnElementByNodeIdTrailblazeTool : DelegatingTrailblazeTool {
    override fun toExecutableTrailblazeTools(ctx: TrailblazeToolExecutionContext): List<ExecutableTrailblazeTool> {
        // Return what Trailblaze should execute
        return listOf(TapOnElementWithTextTrailblazeTool(text = "Login"))
    }
}

External MCP:

@server.tool("acme_tap_by_coords")
async def tap_by_coords(x: int, y: int, _trailblaze: dict) -> dict:
    tb = TrailblazeClient(context=_trailblaze)

    # Read-only queries are allowed (not recorded)
    screen = await tb.capture_screen()
    element = find_element_at(screen, x, y)

    # Return delegate list - TRAILBLAZE will execute and record these
    return {
        "success": True,
        "_trailblaze_delegates": [
            {"tool": "tap", "args": {"text": element.text}}
        ]
    }

The Delegation Flow¶

┌─────────────────────────────────────────────────────────────────────┐
│ 1. Trailblaze → MCP: acme_tap_by_coords(x=100, y=200)               │
│    (isRecordable=False, isDelegating=True → NOT recorded)           │
│                                                                     │
│ 2. External tool computes, uses read-only queries                   │
│    screen = await tb.capture_screen()  ← read-only, not recorded    │
│                                                                     │
│ 3. External tool returns:                                           │
│    {"_trailblaze_delegates": [{"tool": "tap", "args": {...}}]}     │
│                                                                     │
│ 4. Trailblaze receives response, sees delegates                     │
│                                                                     │
│ 5. Trailblaze executes: tap(text="Login")  ← RECORDED               │
│    (Trailblaze is the invoker)                                      │
│                                                                     │
│ Recording: tap(text="Login")                                        │
│ Replay: Trailblaze executes tap(text="Login")                       │
└─────────────────────────────────────────────────────────────────────┘

Read-Only vs Action Operations¶

External delegating tools can use read-only queries to compute what to delegate:

@server.tool("acme_smart_tap")
async def smart_tap(description: str, _trailblaze: dict) -> dict:
    tb = TrailblazeClient(context=_trailblaze)

    # ═══════════════════════════════════════════════════════════════
    # READ-ONLY QUERIES (allowed, not recorded)
    # ═══════════════════════════════════════════════════════════════
    screen = await tb.capture_screen()
    visible = await tb.is_visible(text="Login")
    count = await tb.get_element_count(id="list_item")

    # ═══════════════════════════════════════════════════════════════
    # COMPUTE WHAT TO DELEGATE
    # ═══════════════════════════════════════════════════════════════
    if visible:
        delegates = [{"tool": "tap", "args": {"text": "Login"}}]
    else:
        delegates = [
            {"tool": "scroll", "args": {"direction": "down"}},
            {"tool": "tap", "args": {"text": "Login"}},
        ]

    # ═══════════════════════════════════════════════════════════════
    # RETURN DELEGATES - Trailblaze executes and records these
    # ═══════════════════════════════════════════════════════════════
    return {
        "success": True,
        "_trailblaze_delegates": delegates
    }

Nested Delegation¶

If a delegate is also a delegating tool, Trailblaze recursively processes until it reaches recordable tools:

acme_complex_flow (isDelegating=True, isRecordable=False)
  → returns delegates: [acme_login, acme_checkout]

  acme_login (isDelegating=True, isRecordable=False)
    → returns delegates: [tap("Sign In"), inputText(...)]

    tap("Sign In") (isRecordable=True) ← RECORDED
    inputText(...) (isRecordable=True) ← RECORDED

  acme_checkout (isDelegating=True, isRecordable=False)
    → returns delegates: [tap("Pay")]

    tap("Pay") (isRecordable=True) ← RECORDED

Final recording: [tap("Sign In"), inputText(...), tap("Pay")]
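The recursion can be sketched as a depth-first expansion. This is an illustration only: `expand` and `resolve_delegates` are hypothetical names, `resolve_delegates` stands in for invoking a delegating tool and reading its `_trailblaze_delegates`, and the depth limit is an added safeguard that the design above does not specify.

```python
from typing import Any, Callable, Dict, List

ToolCall = Dict[str, Any]


def expand(
    call: ToolCall,
    registry: Dict[str, Dict[str, bool]],
    resolve_delegates: Callable[[ToolCall], List[ToolCall]],
    max_depth: int = 10,
) -> List[ToolCall]:
    """Flatten a tool call into the recordable calls it ultimately delegates to."""
    meta = registry[call["tool"]]
    if not meta["isDelegating"]:
        # Leaf tool: recorded only if it is marked recordable.
        return [call] if meta["isRecordable"] else []
    if max_depth == 0:
        raise RecursionError(f"Delegation too deep at {call['tool']}")
    recorded: List[ToolCall] = []
    for delegate in resolve_delegates(call):
        # Delegates may themselves delegate, so recurse in order.
        recorded.extend(expand(delegate, registry, resolve_delegates, max_depth - 1))
    return recorded
```

Delegates are expanded in the order they are returned, so the flattened recording preserves execution order.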

Why This Design¶

Aspect	Benefit
Trailblaze is always the invoker	Recording and replay use the same execution path
Clear control flow	No hidden action execution inside external tools
Deterministic replay	Recorded tools are exactly what Trailblaze will invoke
Matches Kotlin pattern	`toExecutableTrailblazeTools()` returns delegates

Response Fields¶

Return Field	Behavior
`_trailblaze_delegates`	List of tools for Trailblaze to execute (and record)
(no delegates)	Tool is not delegating, records itself if `isRecordable=True`

Tool Composition¶

Tools can call other tools:

class MyAppFullCheckoutFlow(
    private val commands: TrailblazeCommands,
    private val loginTool: MyAppLoginTool,  // Same server, direct call
) : TrailblazeTool {

    override suspend fun execute(args: Args): ToolResult {
        // Direct call to another tool (same process)
        loginTool.execute(LoginArgs(args.email, args.password))

        // Call core primitives (via RPC or direct, depending on mode)
        commands.tap(text = "Shop")
        commands.tap(text = args.itemName)
        commands.tap(text = "Checkout")

        return ToolResult.success()
    }
}

Dynamic Tool Reload¶

For host mode, agents may create new tools at runtime. Explicit reload is required:

@Tool
fun reloadTools(): ReloadResult {
    // Scan tool directories
    // Re-read registries from MCP servers
    // Update tool index
    return ReloadResult(
        added = listOf("new_tool_1"),
        removed = emptyList(),
        total = 47,
    )
}

Teams must restart their MCP server (or implement hot-reload) for new tools to be available.
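The `added`/`removed`/`total` fields of the reload result amount to a set difference between the old and new tool index. A sketch, assuming the index is just a collection of tool names (`diff_tool_index` is a hypothetical helper; `ReloadResult` mirrors the Kotlin shape above):

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ReloadResult:
    added: List[str]
    removed: List[str]
    total: int


def diff_tool_index(previous: Iterable[str], current: Iterable[str]) -> ReloadResult:
    """Compare tool names before and after a reload."""
    prev, curr = set(previous), set(current)
    return ReloadResult(
        added=sorted(curr - prev),    # present now, absent before
        removed=sorted(prev - curr),  # present before, absent now
        total=len(curr),
    )
```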

Cross-Platform Tools¶

Tools declare supported platforms in the registry:

"acme_login" to ToolMetadata(
    platforms = setOf(ANDROID, IOS),  // Not web
    groups = setOf("auth"),
)

Tools can also check platform at runtime:

override suspend fun execute(args: Args): ToolResult {
    val packageId = when (commands.platform) {
        ANDROID -> "com.acme.driver"
        IOS -> "com.acme.AcmeDriver"
        else -> error("Unsupported platform")
    }
    commands.launchApp(packageId)
    // ...
}

Type Safety and Refactoring¶

Wire/Proto as Contract¶

Kotlin (Wire) definitions  ←  SOURCE OF TRUTH
       │
       ├── generates → Proto schema
       ├── generates → Kotlin interface
       ├── generates → Python client (typed)
       ├── generates → TypeScript client (typed)
       └── generates → gRPC stubs

Refactoring Support¶

Scenario	What Happens
Rename command in Kotlin	Wire regenerates proto → regenerate clients → compile/lint errors at every stale call site
Add parameter	Same flow, clients get new param
Remove command	Same flow, compile/lint errors
Breaking change	Bump version in proto package

What We Maintain vs Don’t¶

We Maintain	We Don’t Maintain
Wire/Kotlin API definitions	External teams’ MCP servers
Proto generation pipeline	External teams’ tool logic
Generated clients (published packages)	External teams’ deployment
RPC server	External teams’ CI/CD
`trailblaze-android-ondevice-mcp` module
Registry data models (proto-generated)
Light SDK wrappers (Python, TypeScript)
`TrailblazeToolSet` Kotlin library
Invocation context propagation infrastructure

Consequences¶

Positive:

Type-safe API via Wire/proto generation
Refactoring support across all languages
External teams use any language (MCP + RPC)
On-device works via library dependency
Same interface across all deployment modes
Central registry simplifies tool metadata management
Exposing the registry via MCP resources is standard protocol usage
We don’t maintain external teams’ code

Negative:

Wire/proto adds indirection (but provides type safety)
External teams must run their own MCP server
On-device requires building custom test APK
Generated clients must be published and versioned
TrailblazeToolSet must be built before full internal validation (but existing tools work unchanged)

Internal Validation¶

We can use the same architecture internally:

One MCP server per app (MyApp, OtherApp, AdminPanel)
Same registry pattern for tool metadata
Same interface (TrailblazeCommands) as external teams
Same SDK patterns: use the context-scoped client (TrailblazeClient.from_context() or its Kotlin equivalent) even in Kotlin
We experience friction before external teams do

Validating the SDK Pattern¶

Internal tools should use the same patterns we recommend to external teams:

Execution Mode	Pattern	Multi-Device?
In-process Kotlin	`TrailblazeToolSet` with `ExecutionMode.IN_PROCESS`	✅ Automatic
Out-of-process testing	`TrailblazeToolSet` with `ExecutionMode.RPC`	✅ Via invocation ID
Current (legacy)	`TrailblazeToolExecutionContext`	✅ Automatic

Going forward, we can use TrailblazeToolSet, the same thin library published for external teams. This ensures:

We experience the same developer ergonomics as external teams
The same tool code works both in-process (production) and out-of-process (testing)
We catch friction before external teams encounter it

Existing @TrailblazeToolClass tools continue to work unchanged.

Same Code, Multiple Execution Modes¶

Kotlin tools can be written to work both in-process and via stdio MCP:

┌─────────────────────────────────────────────────────────────────────┐
│                    MyApp Tools Module (Kotlin)                      │
│                                                                     │
│  - Uses Wire/proto interface (TrailblazeCommands)                  │
│  - Uses Kotlin MCP SDK for tool definitions                        │
│  - Same code for both execution modes                              │
└─────────────────────────────────────────────────────────────────────┘
        │                               │
        │ Production                    │ Development/Testing
        ▼                               ▼
┌───────────────────┐           ┌───────────────────┐
│   In-Process      │           │   Isolated Module │
│                   │           │                   │
│ Direct calls,     │           │ ./gradlew :myapp  │
│ no RPC overhead   │           │   -tools:runMcp   │
│                   │           │                   │
│                   │           │ stdio transport   │
│                   │           │ Verifies MCP path │
└───────────────────┘           └───────────────────┘

Why this matters:

Ensures the stdio/MCP path works (we test it ourselves)
Same tool code can be extracted into a separate process if needed
Validates the external team experience internally

New Tool Definition: Kotlin MCP SDK¶

We will migrate from the current custom annotation system to using a standard Kotlin MCP SDK.

Current (deprecated):

// Custom Trailblaze annotations
@TrailblazeToolClass(name = "myapp_login", isRecordable = true)
class MyAppLoginTool(
    val email: String,
    val password: String,
) : ExecutableTrailblazeTool {
    override suspend fun execute(ctx: TrailblazeToolExecutionContext): TrailblazeToolResult {
        // ...
    }
}

New (Kotlin MCP SDK with ToolSet pattern):

class MyAppToolSet(
    private val commands: TrailblazeCommands,
) : ToolSet, HasRegistry by MyAppToolRegistry {

    // ═══════════════════════════════════════════════════════════════
    // TOOL NAMES (constants for type safety + refactoring)
    // ═══════════════════════════════════════════════════════════════

    object ToolNames {
        const val LOGIN = "myapp_login"
        const val CHECKOUT = "myapp_checkout"
        const val SETUP = "myapp_setup"
    }

    // ═══════════════════════════════════════════════════════════════
    // TOOLS (using constants for stable names)
    // ═══════════════════════════════════════════════════════════════

    @Tool(customName = ToolNames.LOGIN)
    @LLMDescription("Log in to MyApp with credentials")
    suspend fun login(
        @LLMDescription("User email") email: String,
        @LLMDescription("User password") password: String,
    ): ToolResult {
        setup()  // Direct call to another tool (type-safe)
        commands.tap(text = "Sign in")
        commands.inputText(email)
        commands.inputText(password)
        commands.tap(text = "Submit")
        return ToolResult.success()
    }

    @Tool(customName = ToolNames.CHECKOUT)
    @LLMDescription("Complete checkout with current cart")
    suspend fun checkout(
        @LLMDescription("Amount in cents") amount: Int,
    ): ToolResult {
        commands.tap(text = "Checkout")
        commands.waitUntilVisible(text = "Payment Complete", timeoutMs = 10000)
        return ToolResult.success()
    }

    @Tool(customName = ToolNames.SETUP)
    @LLMDescription("Clear and launch MyApp")
    suspend fun setup(): ToolResult {
        commands.clearAppData("com.example.myapp")
        commands.launchApp("com.example.myapp")
        return ToolResult.success()
    }
}

// Registry as separate object (delegated to ToolSet)
object MyAppToolRegistry : HasRegistry {
    override val registry = mapOf(
        MyAppToolSet.ToolNames.LOGIN to ToolMetadata(
            platforms = setOf(ANDROID, IOS),
            groups = setOf("auth", "setup"),
            exposedToLlm = true,
            isRecordable = true,
        ),
        MyAppToolSet.ToolNames.CHECKOUT to ToolMetadata(
            platforms = setOf(ANDROID, IOS, WEB),
            groups = setOf("checkout"),
            exposedToLlm = true,
            isRecordable = true,
        ),
        MyAppToolSet.ToolNames.SETUP to ToolMetadata(
            platforms = setOf(ANDROID, IOS),
            groups = setOf("setup"),
            exposedToLlm = false,  // Internal helper
            isRecordable = false,
        ),
    )
}

// Interface for registry aggregation
interface HasRegistry {
    val registry: Map<String, ToolMetadata>
}

Key patterns:

Tool name constants: ToolNames.LOGIN enables refactoring and cross-references
Explicit tool names: @Tool(customName = ...) ensures stable names even if the function is renamed
Colocated registry: metadata lives with the tools via delegation
Direct tool composition: login() calls setup() directly (type-safe, no MCP round-trip)
HasRegistry interface: enables central aggregation of all registries

Benefits:

Standard mechanism: uses Kotlin MCP SDK, same as external teams
Portable: tools work in-process or via stdio without code changes
Type-safe: tool name constants prevent typos and enable refactoring
No custom annotation system: less code to maintain
MCP SDK handles tool discovery, schema generation, and invocation
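Central aggregation over several registries could look like the sketch below. The function name and the choice to reject duplicate tool names are assumptions made for illustration; the design above only states that registries can be aggregated.

```python
from typing import Any, Dict


def aggregate_registries(
    registries: Dict[str, Dict[str, Dict[str, Any]]],
) -> Dict[str, Dict[str, Any]]:
    """Merge per-server tool metadata into one index keyed by tool name."""
    combined: Dict[str, Dict[str, Any]] = {}
    for server_name, tools in registries.items():
        for tool_name, meta in tools.items():
            if tool_name in combined:
                # Assumed policy: a name collision is an error, not a silent override.
                owner = combined[tool_name]["server"]
                raise ValueError(
                    f"Tool {tool_name!r} is registered by both {owner} and {server_name}"
                )
            # Remember which server owns the tool so calls can be routed later.
            combined[tool_name] = {**meta, "server": server_name}
    return combined
```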

Tool Authoring by Language¶

Kotlin (Recommended for In-Process)¶

See the ToolSet pattern above. Key points:

Use Kotlin MCP SDK (@Tool, @LLMDescription, ToolSet)
Tool name constants for type safety
Colocated registry with HasRegistry interface
Direct function calls for internal tool composition

Python / TypeScript (External Teams)¶

External teams use:

1. stdio transport: Trailblaze spawns and manages the server
2. Official MCP SDK: Python (FastMCP) or TypeScript (@modelcontextprotocol/sdk)
3. Trailblaze Client SDK: our lightweight wrapper with generated RPC stubs
4. MCP resource: expose trailblaze://registry with tool metadata

Both Python and TypeScript MCP SDKs provide built-in access to request context, which our SDK uses to extract invocation metadata for multi-device support.

Example (Python):

# acme_tools.py
from mcp.server.fastmcp import Context, FastMCP  # Official MCP Python SDK
from trailblaze import TrailblazeClient  # Light SDK wrapper

server = FastMCP("acme-tools")

@server.tool("acme_driver_login")
async def login(email: str, password: str, ctx: Context) -> dict:
    """Log in to Acme Driver app."""
    # Context-aware client - supports multi-device automatically
    tb = TrailblazeClient.from_context(ctx)

    await tb.clear_app_data("com.acme.driver")
    await tb.launch_app("com.acme.driver")
    await tb.tap(text="Sign in")
    await tb.input_text(email)
    await tb.input_text(password)
    return {"success": True}

@server.resource("trailblaze://registry")
async def get_registry():
    return {
        "tools": {
            "acme_driver_login": {
                "platforms": ["android", "ios"],
                "groups": ["auth"],
                "exposedToLlm": True,
                "isRecordable": True,
            }
        }
    }

if __name__ == "__main__":
    server.run()

Example (TypeScript):

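// acme_tools.ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';
import { TrailblazeClient } from '@trailblaze/client';

const server = new McpServer({ name: 'acme-tools', version: '1.0.0' });

server.tool(
    'acme_driver_login',
    { email: z.string(), password: z.string() },
    async ({ email, password }, ctx) => {
        // Context-aware client - supports multi-device automatically
        // (ctx is the per-request object the SDK passes to tool handlers)
        const tb = TrailblazeClient.fromContext(ctx);

        await tb.clearAppData('com.acme.driver');
        await tb.launchApp('com.acme.driver');
        await tb.tap({ text: 'Sign in' });
        await tb.inputText(email);
        await tb.inputText(password);
        return { content: [{ type: 'text', text: JSON.stringify({ success: true }) }] };
    },
);

// Expose tool metadata as a JSON resource
server.resource('registry', 'trailblaze://registry', async (uri) => ({
    contents: [{
        uri: uri.href,
        mimeType: 'application/json',
        text: JSON.stringify({
            tools: {
                acme_driver_login: {
                    platforms: ['android', 'ios'],
                    groups: ['auth'],
                    exposedToLlm: true,
                    isRecordable: true,
                },
            },
        }),
    }],
}));

// stdio transport: Trailblaze spawns and manages this process
await server.connect(new StdioServerTransport());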

Key pattern: Use TrailblazeClient.from_context(ctx) (Python) or TrailblazeClient.fromContext(ctx) (TypeScript) instead of a global client instance. This extracts the invocation ID from _meta and automatically includes it in all RPC calls, enabling multi-device support with zero additional effort.

Teams can build their own patterns (tool name constants, registries, etc.) on top of these primitives.

Trailblaze Client SDK¶

We provide official lightweight SDK wrappers for the two officially supported MCP SDK platforms: Python and TypeScript.

The SDK wrapper is a remote execution context. Just as in-process Kotlin tools receive TrailblazeToolExecutionContext, remote MCP tools receive TrailblazeClient — both provide the same capability: a device-scoped interface to execute Trailblaze commands.

These wrappers:

Extract invocation context from MCP request metadata (_meta)
Return a client that auto-includes context in all RPC calls (scoped to the correct device)
Include generated RPC stubs for TrailblazeCommands

Both Python (FastMCP) and TypeScript (@modelcontextprotocol/sdk) provide built-in access to request metadata in tool handlers, making context extraction trivial.
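To make the "auto-includes context" behaviour concrete, here is a sketch of what such a wrapper might do internally. It is not the published SDK: `SketchClient`, `from_meta`, the payload field names, and the injected `send` transport are all assumptions, and the calls are synchronous for brevity where the real client is shown as async.

```python
from typing import Any, Callable, Dict, Optional


class SketchClient:
    """Hypothetical stand-in for TrailblazeClient."""

    def __init__(
        self,
        invocation_id: Optional[str],
        send: Callable[[Dict[str, Any]], Any],
    ) -> None:
        self._invocation_id = invocation_id
        self._send = send  # transport that performs the actual RPC

    @classmethod
    def from_meta(
        cls,
        meta: Dict[str, Any],
        send: Callable[[Dict[str, Any]], Any],
    ) -> "SketchClient":
        # Pull the invocation ID out of the MCP request metadata (_meta).
        return cls(meta.get("trailblazeInvocationId"), send)

    def _call(self, command: str, **args: Any) -> Any:
        # Every RPC carries the invocation ID, so the server can route it.
        payload = {"command": command, "args": args, "invocationId": self._invocation_id}
        return self._send(payload)

    def tap(self, text: str) -> Any:
        return self._call("tap", text=text)

    def input_text(self, text: str) -> Any:
        return self._call("inputText", text=text)
```

Because the ID is bound at construction time, two tool calls running against different devices each get their own client and cannot mix up their RPCs.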

Python¶

Python’s FastMCP provides built-in Context injection. The Context object gives direct access to request metadata:

from mcp.server.fastmcp import Context, FastMCP
from trailblaze import TrailblazeClient

mcp = FastMCP("my-tools")

@mcp.tool()
async def my_tool(param: str, ctx: Context) -> str:
    # from_context extracts _meta.trailblaze from the request
    tb = TrailblazeClient.from_context(ctx)
    await tb.tap(100, 200)  # Invocation ID flows automatically
    return "ok"

The Context parameter is automatically injected by FastMCP when present in the function signature.

TypeScript¶

TypeScript’s official MCP SDK (@modelcontextprotocol/sdk) also provides access to request context in tool handlers:

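import { z } from 'zod';
import { TrailblazeClient } from '@trailblaze/client';

// With an input schema, the handler receives (args, ctx)
server.tool('my_tool', { param: z.string() }, async ({ param }, ctx) => {
    // fromContext extracts _meta.trailblaze from the request
    const tb = TrailblazeClient.fromContext(ctx);
    await tb.tap(100, 200);  // Invocation ID flows automatically
    return { content: [{ type: 'text', text: 'ok' }] };
});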

Like Python, the context is automatically available in the tool handler callback.

Kotlin (Out-of-Process)¶

We do not provide an official SDK wrapper for Kotlin out-of-process tools.

Why Kotlin is different: The ToolSet pattern (used throughout this document) relies on Koog’s annotation-based tool registration. However, Koog’s @Tool functions only receive the deserialized parameters — not the raw CallToolRequest that contains _meta. This means tools defined via ToolSet cannot access invocation context.

For Kotlin MCP servers running out-of-process, use the raw MCP Kotlin SDK directly with CallToolRequest:

mcpServer.addTool("my_tool", ...) { request: CallToolRequest ->
    // Extract metadata manually from request.meta
    val invocationId = request.meta?.get("trailblazeInvocationId")
        ?.let { (it as? JsonPrimitive)?.content }
    val baseUrl = request.meta?.get("trailblaze")
        ?.jsonObject?.get("baseUrl")?.jsonPrimitive?.content
        ?: "http://localhost:52525"

    // Create client with extracted context
    val tb = TrailblazeClient(baseUrl, invocationId)
    tb.tap(100, 200)
}

Kotlin: TrailblazeToolSet (Required for Kotlin MCP SDK)¶

Why this is required: The Kotlin MCP SDK’s ToolSet pattern (used by Koog) hides the raw CallToolRequest from tool implementations. This means @Tool functions only receive deserialized parameters — they cannot access _meta to retrieve the invocation context needed for multi-device routing.

If we want to write Trailblaze tools using the Kotlin MCP SDK (as recommended in Phase 5.5), we must build TrailblazeToolSet first. Without it, Kotlin tools cannot:

Access invocation context for multi-device support
Route RPC calls to the correct device
Follow the same pattern as Python/TypeScript tools

TrailblazeToolSet is a thin wrapper that:

Uses the official Kotlin MCP SDK for tool definitions
Intercepts CallToolRequest to extract _meta before invoking the tool
Injects TrailblazeContext into tool functions (like FastMCP’s Context)
Preserves TrailblazeTool data classes, so there is no need to abandon our existing type-safe pattern
Works both in-process and via RPC with a simple flag
Can be published as part of our library for external Kotlin teams

// TrailblazeToolSet - supports both patterns

// OPTION 1: Keep using TrailblazeTool data classes (existing pattern)
// The data class is deserialized from args, context is injected separately
@Serializable
data class LoginTool(
    val email: String,
    val password: String,
) : TrailblazeTool

class AcmeToolSet : TrailblazeToolSet {
    @Tool
    @LLMDescription("Log in to Acme Driver app")
    suspend fun login(
        tool: LoginTool,              // TrailblazeTool data class - type-safe!
        ctx: TrailblazeContext,       // Injected for _meta access
    ): ToolResult {
        val tb = ctx.client
        tb.clearAppData("com.acme.driver")
        tb.launchApp("com.acme.driver")
        tb.tap(text = "Sign in")
        tb.inputText(tool.email)      // Access via data class
        tb.inputText(tool.password)
        return ToolResult.success()
    }
}

// OPTION 2: Flat parameters (simpler for small tools)
class SimpleToolSet : TrailblazeToolSet {
    @Tool
    @LLMDescription("Tap a button by text")
    suspend fun tapButton(
        text: String,
        ctx: TrailblazeContext,
    ): ToolResult {
        ctx.client.tap(text = text)
        return ToolResult.success()
    }
}

enum class ExecutionMode {
    IN_PROCESS,  // Direct TrailblazeCommands calls (no RPC overhead)
    RPC,         // Via generated RPC client (for out-of-process/remote)
}

Benefits of this approach:
- Preserves TrailblazeTool: keep existing type-safe data classes
- Consistency: same context injection pattern as Python/TypeScript SDKs
- Internal validation: use the same library you publish
- Flexibility: same tool code works in-process or out-of-process
- Gradual migration: existing tools continue to work, add context when needed
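The "same tool code works in-process or out-of-process" point can be illustrated with a small Python sketch. This is not the planned Kotlin implementation. The names Commands, InProcessCommands, RpcCommands, make_commands and the send callable are hypothetical stand-ins, and only ExecutionMode mirrors the enum above.

```python
from enum import Enum
from typing import Callable, Optional, Protocol


class ExecutionMode(Enum):
    IN_PROCESS = "in_process"  # direct calls, no RPC overhead
    RPC = "rpc"                # calls forwarded through an RPC client


class Commands(Protocol):
    """Hypothetical command surface that tool code is written against."""

    def tap(self, text: str) -> str: ...


class InProcessCommands:
    def tap(self, text: str) -> str:
        # Stand-in for a direct, in-process device call.
        return f"in-process tap: {text}"


class RpcCommands:
    def __init__(self, send: Callable[[str, dict], str]) -> None:
        # `send` is an assumed transport hook, not a real Trailblaze API.
        self._send = send

    def tap(self, text: str) -> str:
        return self._send("tap", {"text": text})


def make_commands(
    mode: ExecutionMode, send: Optional[Callable[[str, dict], str]] = None
) -> Commands:
    """Pick a backend; tool code never needs to know which one it got."""
    if mode is ExecutionMode.IN_PROCESS:
        return InProcessCommands()
    if send is None:
        raise ValueError("RPC mode needs a transport")
    return RpcCommands(send)


def tap_button(commands: Commands, text: str) -> str:
    # The tool body is identical in both execution modes.
    return commands.tap(text)
```

Because tap_button depends only on the Commands interface, switching modes is a construction-time decision rather than a change to tool code.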

Existing tools: Current TrailblazeToolClass/ExecutableTrailblazeTool patterns remain supported. Tools can incrementally adopt TrailblazeContext injection for multi-device support without rewriting.

Kotlin (In-Process, Current)¶

Today, in-process tools have direct access to TrailblazeToolExecutionContext. This continues to work and is the current path for in-process Kotlin tools.

Migration path: Once TrailblazeToolSet is built, new tools should use it, both for consistency with Python/TypeScript and to enable out-of-process testing. Existing @TrailblazeToolClass tools continue to work unchanged.

SDK Surface Area¶

The official SDKs (Python and TypeScript) are intentionally minimal:

Component	Size	Purpose
`TrailblazeClient.from_context()`	~10 lines	Factory that extracts invocation ID from `_meta`
Generated RPC stubs	(from proto)	`tap()`, `swipe()`, `captureScreen()`, etc.
Auto meta injection	~5 lines	Includes invocation ID in all RPC calls

Total: ~50-100 lines per language. The “SDK” is really just a convenience wrapper around the generated RPC client.
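To give a feel for how small that wrapper can be, here is a minimal Python sketch of the "auto meta injection" row. It is an illustration under assumptions, not the published SDK. The Transport callable, the payload layout, and the URL scheme are hypothetical. In a real SDK, methods like tap() would come from the generated RPC stubs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical transport hook: receives (url, payload), returns a decoded response.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class TrailblazeClient:
    base_url: str
    invocation_id: Optional[str]
    transport: Transport

    def _call(self, method: str, **params: Any) -> Dict[str, Any]:
        # Auto meta injection: attach the invocation ID to every RPC call
        # so the server can route it to the right device.
        payload: Dict[str, Any] = {"params": params}
        if self.invocation_id is not None:
            payload["trailblazeInvocationId"] = self.invocation_id
        return self.transport(f"{self.base_url}/{method}", payload)

    # Hand-written stand-in for a generated stub.
    def tap(self, x: int, y: int) -> Dict[str, Any]:
        return self._call("tap", x=x, y=y)
```

Tool authors only ever call tap() and friends. The invocation ID travels along without appearing in tool code.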

Officially Supported SDKs¶

Language	SDK	Context Access	Status
Python	`trailblaze` package	FastMCP `Context` injection	✅ Planned
TypeScript	`@trailblaze/client`	MCP SDK context in handler	✅ Planned
Kotlin	`TrailblazeToolSet`	`TrailblazeContext` injection	✅ Planned
Other languages	DIY	Raw `_meta` extraction	Follow pattern

All three official SDKs (Python, TypeScript, Kotlin) will provide the same developer experience:
1. Context automatically injected into tool handlers
2. TrailblazeClient / TrailblazeCommands available via context
3. Invocation ID flows automatically through all RPC calls
4. Works for both single-device and multi-device scenarios

For languages without an official SDK, teams can follow the same pattern: extract trailblazeInvocationId and trailblaze object from _meta, then include them in RPC calls.
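The extraction step is small enough to sketch in a few lines. The Python version below mirrors the raw Kotlin example earlier in this section: trailblazeInvocationId is read as a string, baseUrl is read from the nested trailblaze object, and http://localhost:52525 is the fallback. The function name extract_invocation_context is hypothetical. Ignoring wrongly typed values instead of raising is an assumption, not documented behavior.

```python
from typing import Any, Mapping, Optional, Tuple

DEFAULT_BASE_URL = "http://localhost:52525"


def extract_invocation_context(
    meta: Optional[Mapping[str, Any]],
) -> Tuple[Optional[str], str]:
    """Return (invocation_id, base_url) from a tool call's _meta mapping."""
    meta = meta or {}

    invocation_id = meta.get("trailblazeInvocationId")
    if not isinstance(invocation_id, str):
        invocation_id = None  # missing or wrong type: treat as absent

    trailblaze = meta.get("trailblaze")
    base_url = trailblaze.get("baseUrl") if isinstance(trailblaze, Mapping) else None
    if not isinstance(base_url, str):
        base_url = DEFAULT_BASE_URL  # fall back to the local default

    return invocation_id, base_url
```

Both values are then passed along with each RPC call so Trailblaze can associate the call with the originating invocation.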

Query Commands for Conditionals¶

Custom tools often need to check screen state before deciding what to do. The API provides query commands that return values instead of throwing.

Actions vs Queries vs Assertions¶

Category	Behavior	Example
Actions	Perform operation, return result	`tap()`, `inputText()`
Queries	Check state, return value (never throw)	`isVisible()`, `hasText()`, `getElementCount()`
Assertions	Check state, throw if condition not met	`assertVisible()`, `waitUntilVisible()`

Using Queries for Conditional Logic¶

Python example:

@server.tool("acme_handle_onboarding")
async def handle_onboarding(ctx: Context) -> dict:
    tb = TrailblazeClient.from_context(ctx)  # client bound to this invocation

    # Check if cookie consent dialog is shown
    if await tb.is_visible(text="Accept Cookies"):
        await tb.tap(text="Accept")

    # Check if we need to dismiss a tutorial
    if await tb.is_visible(text="Skip Tutorial"):
        await tb.tap(text="Skip")

    # Check which screen we're on
    if await tb.has_text("Welcome back"):
        return {"screen": "returning_user"}
    elif await tb.has_text("Create Account"):
        return {"screen": "new_user"}
    else:
        return {"screen": "unknown"}

Kotlin example:

@Tool(customName = ToolNames.HANDLE_DIALOGS)
suspend fun handleDialogs(): ToolResult {
    // Dismiss any blocking dialogs
    if (commands.isVisible(text = "Allow notifications")) {
        commands.tap(text = "Not now")
    }

    if (commands.isVisible(text = "Rate this app")) {
        commands.tap(text = "Maybe later")
    }

    // Check how many items are in a list
    val itemCount = commands.getElementCount(id = "list_item")
    if (itemCount == 0) {
        return ToolResult.failure("No items found")
    }

    return ToolResult.success()
}

Retry Patterns¶

@server.tool("acme_login_with_retry")
async def login_with_retry(email: str, password: str, ctx: Context, max_attempts: int = 3) -> dict:
    tb = TrailblazeClient.from_context(ctx)  # client bound to this invocation

    for attempt in range(max_attempts):
        await tb.tap(text="Sign In")
        await tb.input_text(email)
        await tb.input_text(password)
        await tb.tap(text="Submit")

        # Check outcome without throwing
        if await tb.is_visible(text="Dashboard", timeout_ms=5000):
            return {"success": True, "attempts": attempt + 1}

        if await tb.is_visible(text="Invalid credentials"):
            # Wrong password, no point retrying
            return {"success": False, "error": "invalid_credentials"}

        # Network error or slow response, try again

    return {"success": False, "error": "max_attempts_exceeded"}
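The retry example relies on a query that accepts timeout_ms and still returns a boolean instead of throwing. How that timeout is implemented is not specified here. One plausible shape is a polling loop, sketched below. The helper name poll_until and the interval_ms parameter are hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def poll_until(
    check: Callable[[], Awaitable[bool]],
    timeout_ms: int,
    interval_ms: int = 100,
) -> bool:
    """Re-run `check` until it returns True or the timeout elapses.

    Returns False on timeout instead of raising, which keeps query
    semantics: the caller branches on the result.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if await check():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval_ms / 1000)
```

The condition is always checked at least once, even with a zero timeout, so an immediately visible element is never reported as missing.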

Key Difference: Queries vs Assertions¶

# QUERY - returns False, doesn't throw
visible = await tb.is_visible(text="Login Button")  # → False

# ASSERTION - throws/fails if not visible
await tb.assert_visible(text="Login Button")  # → raises AssertionError

Use queries when you need to branch based on screen state. Use assertions when you’re verifying expected state (and want the test to fail if wrong).
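One way to think about the relationship is that an assertion is a query plus a raise. The sketch below uses a fake in-memory screen to show this. FakeScreen is purely illustrative and not part of any SDK. Only the method names is_visible and assert_visible follow the examples above.

```python
import asyncio
from typing import Iterable


class FakeScreen:
    """In-memory stand-in for a device screen, for illustration only."""

    def __init__(self, visible_texts: Iterable[str]) -> None:
        self._visible = set(visible_texts)

    async def is_visible(self, text: str) -> bool:
        # Query: report state, never raise for "not found".
        return text in self._visible

    async def assert_visible(self, text: str) -> None:
        # Assertion: same check, but failure is an error.
        if not await self.is_visible(text):
            raise AssertionError(f"Expected {text!r} to be visible")


async def main() -> None:
    screen = FakeScreen({"Dashboard"})
    print(await screen.is_visible("Login Button"))
    try:
        await screen.assert_visible("Login Button")
    except AssertionError as exc:
        print(f"assertion failed: {exc}")


if __name__ == "__main__":
    asyncio.run(main())
```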

Error Handling¶

Errors propagate through standard mechanisms at each layer:

Layer	Error Handling
Tool execution	Return failure result with error message
MCP protocol	JSON-RPC error responses
RPC (gRPC/Connect)	Status codes + error details

Tools should return meaningful error messages. Protocol-level errors (connection failures, timeouts) are handled by the respective transports. No custom error handling infrastructure required.
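As a sketch of what "return a failure result" can look like in a Python tool, the decorator below converts an expected tool-level failure into a result dict and lets everything else propagate to the transport. ToolError and tool_result are hypothetical names. The result shape follows the success/error dicts used in the retry example.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Dict, Optional


class ToolError(Exception):
    """Hypothetical exception for expected, tool-level failures."""


def tool_result(
    func: Callable[..., Awaitable[Optional[Dict[str, Any]]]],
) -> Callable[..., Awaitable[Dict[str, Any]]]:
    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            result = await func(*args, **kwargs)
        except ToolError as exc:
            # Tool-level failure: report it with a meaningful message.
            return {"success": False, "error": str(exc)}
        # Anything else (connection errors, timeouts) is not caught here
        # and is left to the MCP / RPC layers.
        return {"success": True, **(result or {})}

    return wrapper
```

Catching only ToolError keeps the layers separate: a tool describes what went wrong on screen, while protocol failures surface through JSON-RPC errors or RPC status codes.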

FAQ¶

Can external MCP servers make LLM requests through Trailblaze?¶

Future: Yes, via MCP Sampling.

The MCP protocol includes a Sampling feature that allows servers to request LLM completions from clients. This enables external tools to “borrow” Trailblaze’s configured LLM for tasks like:

“Is the screen showing an error message?”
“What color is the button?”
“Does this screenshot match the expected state?”

Why MCP Sampling instead of an RPC endpoint?

Approach	Who Can Use It	Control
MCP Sampling	Only servers Trailblaze connects to	Per-connection opt-in
RPC `askLlm()` endpoint	Anyone who connects to Trailblaze	Globally visible

MCP Sampling is preferred because:
1. Trailblaze initiates the connection: you choose which servers can sample
2. Per-server opt-in: enable sampling only for trusted servers
3. Invocation context flows naturally: same pattern as other RPC calls
4. Protocol-compliant: standard MCP, not a custom API
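Since sampling is not implemented yet, the per-server opt-in can only be sketched. The Python below assumes a hypothetical allow_sampling flag on each registered server and a gate that runs before any sampling request is honored. Neither the flag name nor the function exists in Trailblaze today.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ServerConfig:
    name: str
    allow_sampling: bool = False  # hypothetical flag; off unless opted in


class SamplingNotAllowed(Exception):
    """Raised when a server requests sampling without opt-in."""


def authorize_sampling(servers: Dict[str, ServerConfig], server_name: str) -> None:
    """Reject sampling requests from unknown or non-opted-in servers."""
    config = servers.get(server_name)
    if config is None or not config.allow_sampling:
        raise SamplingNotAllowed(f"Sampling is not enabled for server {server_name!r}")
```

Defaulting the flag to off means a newly registered server cannot spend LLM budget until someone explicitly trusts it.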

Current status: Not implemented. The invocation context infrastructure (invocation ID, context propagation) provides the foundation. When implemented:

@mcp.tool()
async def verify_screen_color(expected_color: str, ctx: Context) -> dict:
    tb = TrailblazeClient.from_context(ctx)

    # Request LLM completion via MCP Sampling
    result = await tb.sampling.create_message(
        messages=[{"role": "user", "content": f"Is the button {expected_color}?"}],
        include_screenshot=True,
    )

    return {"matches": "yes" in result.content.lower()}

Note: Most MCP clients (Cursor, Claude Desktop) don’t support sampling. This feature is for external MCP servers calling back to Trailblaze, not for clients calling Trailblaze.

010: Custom Tool Authoring - Previous decision this supersedes
008: Trailblaze MCP - MCP server architecture
032: Trail/Blaze Agent Architecture - How tools fit in agent architecture

References¶

MCP Protocol - Tool discovery and invocation
Wire - Kotlin-first proto library by Block
Protocol Buffers - API definition
gRPC - RPC framework
Connect - Modern proto-based RPC
