@trailblaze/scripting — Authoring Vision & Roadmap¶
Companion devlog aimed at TypeScript authors who’ll be writing custom Trailblaze tools. Where the companion devlogs focus on decisions (envelope shape, proto-vs-JSON, runtime mechanics), this one captures the author experience — what you write today, what you’ll write tomorrow, and where we’re headed. Review target: give TS-literate reviewers enough to push back on the API shape before we commit harder.
The one-paragraph pitch¶
Trailblaze is an AI-driven UI-testing framework where the LLM calls
“tools” to drive the app (tap, type, assert, remember-this). Authors
have always been able to add their own tools. Today that means writing
Kotlin. We’re adding a TypeScript authoring surface so anyone who can
spin up an MCP server can ship custom tools — no Kotlin, no Gradle, no
JVM round-trip. @trailblaze/scripting is a thin wrapper around
@modelcontextprotocol/sdk that hides the protocol ceremony, exposes
Trailblaze’s device/session context as a typed object, and (landing in
the next PR) lets your tool call back into Trailblaze’s own primitives
to compose higher-level behaviour.
What authors write today (raw MCP SDK)¶
Committed reference: examples/android-sample-app/trailblaze-config/mcp/tools.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
const server = new McpServer(
{ name: "sample-app-mcp-tools", version: "0.1.0" },
{ capabilities: { tools: {} } },
);
server.registerTool(
"generateTestUser",
{ description: "...", inputSchema: {} },
async () => ({
content: [{ type: "text", text: JSON.stringify({ name: "Sam", email: "sam@example.com" }) }],
isError: false,
}),
);
// ...repeat for every tool...
const transport = new StdioServerTransport();
await server.connect(transport);
Four MCP imports, three pieces of server-construction ceremony, and
you’re on the hook for stdio transport wiring per file. More importantly:
the handler has no access to Trailblaze state. Device platform,
screen dimensions, agent memory, the session id — none of it is
reachable from inside the tool. If you need them, you can dig them out
of process.env (where we set a handful of TRAILBLAZE_* vars at
spawn), but it’s off-contract and inconvenient.
What authors will write with @trailblaze/scripting¶
Committed reference: examples/android-sample-app/trailblaze-config/mcp-sdk/tools.ts
import { trailblaze } from "@trailblaze/scripting";
trailblaze.tool(
"generateTestUser",
{ description: "..." },
async (_args, ctx) => {
// ctx is TrailblazeContext | undefined
// ctx.device.platform "ios" | "android" | "web"
// ctx.device.driverType, .widthPixels, .heightPixels
// ctx.memory agent memory as Record<string, unknown>
// ctx.sessionId opaque session id for log correlation
// ctx.invocationId per-call id — forward on callbacks
// ctx.baseUrl daemon URL for the callback channel
return {
content: [{ type: "text", text: JSON.stringify({ name: "Sam", email: "sam@example.com" }) }],
};
},
);
await trailblaze.run();
One import. One call per tool. ctx is typed. await trailblaze.run()
replaces the server + transport boilerplate.
The two reference files sit side-by-side in the sample app and expose
the same two tools under suffixed names (generateTestUser /
generateTestUserSdk) so CI exercises both authoring surfaces in
parallel and we notice the moment either one drifts.
The TrailblazeContext envelope¶
Injected by the host on every tools/call under
_meta.trailblaze. Shape is locked in the
envelope-migration devlog:
type TrailblazeContext = {
baseUrl: string; // daemon HTTP base URL
sessionId: string; // opaque; log correlation only
invocationId: string; // per-tool-call; forward on callbacks
device: {
platform: "ios" | "android" | "web";
widthPixels: number;
heightPixels: number;
driverType: string; // e.g. "android-ondevice-accessibility"
};
memory: Record<string, unknown>; // string-valued today; typed `unknown` for forward-compat
};
undefined when the tool was invoked outside a Trailblaze session
(ad-hoc MCP client, unit test) — your handler decides whether to
degrade gracefully or refuse.
There’s also a legacy arg envelope (_trailblazeContext inside
arguments) that pre-dates the SDK and still works; the raw-SDK example
reads it. New tools should only read via ctx. Both envelopes are
injected in parallel during the migration window.
What the next landing unlocks¶
This landing ships the authoring surface — tools declare themselves, register, and execute. They have typed access to context but can’t call back into Trailblaze. A follow-up lights up that second half.
The callback architecture inherits directly from two prior design threads:
- Decision 029 — Custom Tool Architecture
specified the RPC path: proto-typed commands over HTTP, with authors
reaching a Trailblaze daemon endpoint discovered via
baseUrl/trailblazeInvocationIdon the envelope. Our callback endpoint (/scripting/callback) is that endpoint, JSON-first instead of proto (see D2 in the envelope-migration devlog). - Synchronous Tool Execution from JS
specified the semantics — synchronous
trailblaze.execute(toolName, params)returning aTrailblazeToolResult, reentrance, recording behaviour, error variants as JS objects. That work was originally aimed at QuickJS (in-process); we’re taking the same shape and putting it on HTTP for the subprocess path. Same author-facing contract, different transport.
The next PR’s surface:
import { trailblaze } from "@trailblaze/scripting";
trailblaze.tool(
"signUpNewUser",
{ description: "Creates a fresh account and signs in." },
async (_args, ctx, client) => {
// client.callTool(name, args) hits the daemon's /scripting/callback endpoint,
// which deserializes via toolRepo.toolCallToTrailblazeTool(name, argsJson) and
// executes against the live session's agent.
const user = await client.callTool("generateTestUser", {});
await client.callTool("tapOnElementWithText", { text: "Name field" });
await client.callTool("inputText", { text: user.name });
await client.callTool("tapOnElementWithText", { text: "Sign up" });
return { content: [{ type: "text", text: `Signed up as ${user.email}` }] };
},
);
await trailblaze.run();
The Kotlin side is already in place: /scripting/callback validates the
invocationId, resolves the live TrailblazeToolRepo + execution
context, dispatches, returns the result as JSON. The callback landing is
a pure TS change — no daemon changes, no proto.
What the callback unlocks in practice¶
Once a subprocess can call back into Trailblaze, it can do anything a Kotlin-authored tool can do — because it’s dispatching the same tools through the same repo. The patterns this enables are described end-to-end in the execution-model devlog, which was originally written for the in-process QuickJS path but applies identically here:
- Query view hierarchy / visibility.
await client.callTool("assertVisibleWithText", { text: "Login" })and branch onsuccess: true/false. Same informationtrailblaze.isVisible(...)would have surfaced in the QuickJS vision — no need for a separate query API because every Trailblaze tool’s result is already the answer. - Read agent memory (already live via
ctx.memory). Write via callback intorememberTextor any Trailblaze tool that writes memory. - Try-then-fallback composition. Call a primitive, inspect
success, branch. The original Decision 025 vision was emit-only and couldn’t do this; callbacks restore the observability that vision was missing. - Polling / stabilization loops. The execution-model devlog uses the API-29 “wait for main screen + stay stable for N checks” pattern as the motivating example — 80 lines of Kotlin today, ~15 lines of TS with callbacks.
- Custom assertions. Compose whatever branching logic you want on top of the primitive tools, still recorded for deterministic replay.
The sync-execute devlog’s design concerns (reentrance caps, timeout discipline, recording semantics, thread safety) apply here too — flagged in the envelope-migration devlog as things the callback landing has to pin down.
Design question we’d love feedback on¶
Should the TS surface ever get typed command wrappers?
A future option is to generate client.tap({ text: "..." }),
client.inputText(...), etc. from the Kotlin tool catalog so authors
get IDE completion instead of “call callTool('tapOnElementWithText',
...) and hope the name hasn’t changed.” We’re leaning no: name +
JSON args is the lowest-common-denominator surface, it evolves without
forcing SDK re-publishes every time a tool changes, and it works
identically for Python / on-device QuickJS consumers later.
The original
Decision 038 execution-model devlog
flagged PR B (typed query ergonomics) as “optional polish” for exactly
this reason — everything expressible via
client.callTool("assertVisibleWithText", { text }).success === true
is already there; typed wrappers are about readability, not capability.
The
earlier commands.proto prototype
tried typed wrappers and what we’d be reviving is essentially that
idea’s codegen pipeline.
If your team has a strong opinion here — ergonomics win vs. catalog drift risk — this is the load-bearing place to push back before that option lands.
Where we’re heading¶
| Milestone | What |
|---|---|
| This landing | Authoring surface + callback endpoint. Tools declare, register, execute. No composition yet. |
| Callback landing | client.callTool(name, args) on the TS side. Tools compose other Trailblaze tools. |
| Typed-commands decision | Decision: typed commands or stay at callTool(name, args). (Current lean: stay untyped.) |
| Later | Python SDK with the same envelope shape. Same HTTP callback endpoint. |
| Later still | On-device QuickJS bundle of the same .ts source — same authoring code, different transport. |
The authoring surface and the envelope contract are deliberately
runtime-agnostic: the same tools.ts should compile for both the
host-subprocess mode (today) and the on-device QuickJS mode (later)
without code changes. That constraint is why callbacks go over HTTP +
JSON instead of MCP-as-transport — an MCP-over-stdio callback would be
a different protocol than the HTTP-to-daemon callback QuickJS-on-device
will need to issue.
Authoring workflow today¶
- Install —
cd <your-target>/trailblaze-config/mcp-sdk && bun install(ornpm install). The SDK is consumed today via a localfile:link in the sample example; npm-registry publish is a follow-up. - Author —
tools.tsin that directory, using@trailblaze/scripting. - Wire — add an entry under
mcp_servers:in the target YAML:mcp_servers: - script: ./mcp-sdk/tools.ts - Run — Trailblaze spawns the file as a bun/node subprocess at session start, registers every tool it advertises, and includes them in the LLM’s tool catalog for that session.
No Kotlin, no Gradle, no JVM in your author loop. If bun isn’t on your
PATH, Trailblaze falls back to node + tsx.
What we want from web-team review¶
Specific questions:
trailblaze.tool()API shape. Is(name, spec, handler)the right shape, or do you prefertrailblaze.tool({ name, description, handler })as an options object? We picked positional to matchserver.registerTool.ctxas second handler arg, or options-bag? Second arg was chosen for destructure-friendliness (async (_args, { device, memory })).- Handler return type. Today we mirror the MCP SDK’s shape
(
{ content: [{ type: "text", text }], isError? }). Worth wrapping? Concern: every wrapper is another thing to maintain on upgrade. - Typed commands. Worth the codegen cost or stay at
callTool(name, args)? - Error handling. Thrown errors →
isError: trueMCP result, or let the subprocess crash? Today the SDK lets the MCP SDK’s default behavior stand. - Memory writes. Today memory is read-only from
ctx.memory. Authors who want to remember something call back into a Trailblaze tool that writes memory. Ergonomically right, or shouldctx.memoryexposeset? - Publishing. Private
file:link today. What’s the right publishing path — GitHub Package Registry? npm public with a@trailblaze/scope? A private registry? - TypeScript tooling. Sample tsconfig uses
Node16module resolution + strict mode. Any standards we should align with up-front?
References¶
MCP-based tools (what lands how)¶
- Direction doc: Scripted Tools PR A3 — MCP SDK Subprocess Toolsets — the what (Option-2 amendment: subprocess MCP, not QuickJS-first)
- Scope for PR A3 (what shipped before this SDK landing): Host-Side Subprocess MCP Toolsets (Scope) —
mcp_servers:YAML, spawn, handshake, tool registration, env-var contract - Subprocess lifecycle + registration details: Scripted Tools PR A3 Phase 1
- MCP conventions (
_meta["trailblaze/*"]keys, result shape, naming): Scripted Tools — MCP Extension Conventions - Forward-looking integration patterns (Tier 1 first-party vs Tier 2 third-party servers, toolset-level
mcp_servers:, metadata overlays): MCP Server Integration Patterns - Toolset consolidation (how host-subprocess + on-device bundle split): Scripted Tools — Toolset Consolidation & Revised Sequencing
SDK callback / JSON-RPC / proto (the “call back into Trailblaze” thread)¶
- Foundational RPC architecture (HTTP proto-JSON,
baseUrl+trailblazeInvocationId, the pattern this SDK inherits): Decision 029: Custom Tool Architecture - Envelope + callback contract (this landing’s decisions — D1 dual-write envelope, D2 JSON-not-proto): Scripting SDK — Envelope Migration & Callback Transport
- Synchronous execute semantics (originally for QuickJS; same contract we’re delivering via HTTP): Scripted Tools PR A2 — Synchronous Tool Execution from JS
Query view hierarchy / memory / execute-tools — “observe and react” patterns¶
- Execution-model master plan (scripted tools,
trailblaze.execute(), typed queries PR B, reentrance + timeout design): Scripted Tools Execution Model (QuickJS + Synchronous Host Bridge) - Original scripted-tools vision (Decision 025, where memory + emit started): Scripted Tools Vision
- On-device QuickJS bundle (how the same
.tssource will run on-device later): Scripted Tools PR A5 — MCP Toolsets Bundled for On-Device
Complementary authoring paths¶
- YAML-defined tools (static composition; scripts complement, don’t replace): Decision 037: YAML-Defined Tools
Appendix: sample-app side-by-side¶
Both files implement generateTestUser and currentEpochMillis. The SDK
file suffixes them (...Sdk) so both paths register side-by-side in the
session without colliding. Read them together to see the diff the SDK
makes:
- Raw MCP SDK:
examples/android-sample-app/trailblaze-config/mcp/tools.ts - SDK:
examples/android-sample-app/trailblaze-config/mcp-sdk/tools.ts
CI exercises both end-to-end via SampleAppMcpToolsTest and
SampleAppMcpSdkToolsTest so drift between the two authoring surfaces
surfaces immediately.