Scripting SDK — client.callTool Round-Trip¶
This lights up the callback round-trip from TypeScript back into Trailblaze. The initial SDK landing already shipped the Kotlin-side endpoint, the envelope injection, and the authoring surface; this landing adds the matching TS client and the daemon-side guardrails (envelope-migration devlog flagged four concerns for the callback landing to pin down).
Scope¶
- `client: TrailblazeClient` becomes the third handler argument on `trailblaze.tool(name, spec, handler)`. It exposes a single method, `callTool(name, args): Promise<CallToolResult>`.
- Hand-written TS types mirror the Kotlin `@Serializable` shapes in `JsScriptingCallbackContract.kt`. No codegen (D2 stays locked — proto deferred until action-surface growth or a second language consumer justifies it).
- Daemon-side timeout + reentrance-depth cap enforced on `/scripting/callback`. Both have tests.
- Cached-vs-fresh `screenState` behaviour documented.
- One SDK-authored sample tool (`signUpNewUserSdk`) composes `generateTestUserSdk` via a callback end-to-end; CI exercises it.

Out of scope:

- Typed wrappers (`client.tap(...)`, `client.inputText(...)`). A future landing is locked as “deferred indefinitely” — the untyped `callTool(name, args)` surface is the lowest-common-denominator shape that works identically for Python / QuickJS consumers later.
- Proto / Wire / buf codegen — stays deferred.
- Cross-machine callbacks — the endpoint’s loopback gate stays as-is.
- SDK registry publishing — `"private": true` stays for now.
Design concerns pinned in this PR¶
1. Callback-dispatch timeout (daemon-side)¶
Decision: 30 s default, overridable via the
-Dtrailblaze.callback.timeoutMs=<ms> JVM system property.
Read at endpoint-register time. The endpoint wraps
tool.execute(entry.executionContext) in withTimeout(...); a
TimeoutCancellationException converts to CallToolResult(success =
false, errorMessage = "Callback timed out after ${N}ms") instead of
letting the coroutine tree hang.
Why 30 s and not something tighter: a legitimate callback can dispatch
a tapOnElementWithText that waits for the UI to settle, which on
flaky emulators already eats 10–15 s. 30 s is generous enough that a
real tool won’t hit it while still bounding a pathological hang into
something the outer agent timeout absorbs. Tests override via a lower
constant to keep the test runtime sane.
Why a system property and not a register parameter: every register
call site would otherwise have to plumb the value through. The
property keeps the contract easy to override in CI (-D...) and in
tests (System.setProperty) without touching the public
register(routing: Routing) shape. An env var was considered and
rejected — tests can’t set env vars in-JVM on most platforms.
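The daemon-side implementation is Kotlin (`withTimeout` around `tool.execute`); as an illustration only, the same convert-a-hang-into-a-failed-result shape can be sketched in TypeScript. The helper name and the `CallToolResult` type spelling here are assumptions, not the real daemon code:

```typescript
// Illustrative sketch of the timeout conversion — the real code is Kotlin
// (withTimeout + TimeoutCancellationException). Names here are hypothetical.
type CallToolResult = { success: boolean; textContent?: string; errorMessage?: string };

const DEFAULT_TIMEOUT_MS = 30_000; // mirrors -Dtrailblaze.callback.timeoutMs default

async function dispatchWithTimeout(
  execute: () => Promise<CallToolResult>,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): Promise<CallToolResult> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<CallToolResult>((resolve) => {
    timer = setTimeout(
      () => resolve({ success: false, errorMessage: `Callback timed out after ${timeoutMs}ms` }),
      timeoutMs,
    );
  });
  try {
    // Whichever settles first wins: a hung tool resolves to a failed
    // CallToolResult instead of leaving the caller waiting forever.
    return await Promise.race([execute(), timeout]);
  } finally {
    clearTimeout(timer!);
  }
}
```

The key property — a pathological hang becomes a bounded, machine-readable failure rather than a stuck coroutine tree — is the same in both languages.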
2. Reentrance depth cap¶
Decision: cap at 16 per chain, matching Decision 038’s script execution limit.
Tracked on JsScriptingInvocationRegistry.Entry.depth. The outer
invocation (registered from SubprocessTrailblazeTool.execute with no
parent) gets depth 0. When the callback endpoint dispatches a tool,
the depth-1 environment is propagated via a
JsScriptingCallbackDispatchDepth CoroutineContext element; if the dispatched
tool is itself a SubprocessTrailblazeTool that registers its own
invocation, it reads the element and stores depth = parent + 1.
Before dispatching, the endpoint rejects any callback whose resolved
entry.depth >= MAX_CALLBACK_DEPTH with
CallToolResult(success=false, errorMessage = "Callback reentrance
depth ${d} exceeds max ${MAX}"). That caps a buggy
signUpNewUserSdk → generateTestUserSdk → signUpNewUserSdk loop at 16
steps before the daemon refuses, long before the outer agent timeout
or stack exhaustion would fire.
Why 16 and not a smaller number: Decision 038’s execution-model devlog uses 16 for in-process QuickJS callbacks and the rationale (a realistic composable tool rarely exceeds 3–4 levels; 16 is 4× over-provisioned against that) is identical here. Keeping the two caps the same means authors who move a tool between runtimes don’t hit a surprise.
Why a depth cap and not a timeout-only approach: a fast-looping reentrance cycle can burn thousands of iterations per second inside the 30 s timeout before it fires. The depth cap surfaces the runaway as an actionable error immediately, not after 30 s of log spam.
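The rejection check itself is small; a sketch in TypeScript (function name hypothetical — the real check lives in the Kotlin endpoint against `entry.depth`):

```typescript
// Illustrative depth-cap check, mirroring what the endpoint does before
// dispatching. Error-message shape follows the text above.
const MAX_CALLBACK_DEPTH = 16; // matches Decision 038's in-process limit

type CallToolResult = { success: boolean; textContent?: string; errorMessage?: string };

function checkReentranceDepth(depth: number): CallToolResult | null {
  // depth 0 is the outer invocation; each callback-dispatched
  // SubprocessTrailblazeTool stores depth = parent + 1.
  if (depth >= MAX_CALLBACK_DEPTH) {
    return {
      success: false,
      errorMessage: `Callback reentrance depth ${depth} exceeds max ${MAX_CALLBACK_DEPTH}`,
    };
  }
  return null; // within bounds — dispatch proceeds
}
```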
3. Cached-vs-fresh screenState on callbacks¶
Decision: cached. The callback dispatches against the same
TrailblazeToolExecutionContext (and therefore the same cached
screenState) the outer handler is working with.
Matches the outer handler’s observation of the screen at the moment
of the subprocess tools/call — predictable, and composition chains
like “tap this → input that → tap again” already interleave their own
tool-side re-captures. No implementation change is needed for this
decision (the endpoint already passes entry.executionContext
unchanged); documenting it here so a future author doesn’t assume a
fresh capture and get bit.
Fresh captures can be added later via an explicit
trailblaze/refreshScreenState tool authors opt into. Not in this PR.
4. Thread safety¶
No change required. The endpoint already dispatches against the
entry.executionContext, which owns its own memory + screen-state
mutability semantics from the outer dispatch. Reentrance depth is
propagated via CoroutineContext (structured, scoped to the dispatch
coroutine) rather than thread-local state.
TS TrailblazeClient surface¶
```typescript
trailblaze.tool(
  "signUpNewUserSdk",
  { description: "..." },
  async (_args, ctx, client) => {
    const raw = await client.callTool("generateTestUserSdk", {});
    const user = JSON.parse(raw.textContent) as { name: string; email: string };
    // ... compose more callTool invocations ...
    return { content: [{ type: "text", text: `Signed up ${user.email}` }] };
  },
);
```
client is always passed (never undefined even when ctx is — an
undefined ctx means “no envelope”, so a callTool will fail the
precondition check and throw a clear error). Authors that never call
back simply don’t reference the arg; the shape stays backward
compatible with handlers from the initial landing because TypeScript
lets handlers ignore trailing parameters.
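That backward-compatibility claim rests on TypeScript's function-type compatibility rule: a function with fewer parameters is assignable where one with more is expected. A minimal sketch, with a hypothetical `Handler` type standing in for the SDK's real signature:

```typescript
// Hypothetical three-argument handler type mirroring the SDK shape.
type CallToolResult = { success: boolean; textContent?: string };
type Handler = (
  args: Record<string, unknown>,
  ctx: unknown | undefined,
  client: { callTool(name: string, args: object): Promise<CallToolResult> },
) => Promise<{ content: { type: string; text: string }[] }>;

// A pre-callback handler that ignores ctx and client still satisfies Handler,
// so handlers from the initial landing keep compiling unchanged.
const legacy: Handler = async (args) => ({
  content: [{ type: "text", text: `got ${Object.keys(args).length} args` }],
});
```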
The client.callTool(...) method:
- Reads `baseUrl`, `sessionId`, `invocationId` from the envelope-capturing closure in the SDK (captured at handler-entry via the same `extractMeta` path the `ctx` argument uses).
- POSTs JSON to `${baseUrl}/scripting/callback` with the `JsScriptingCallbackRequest` shape. A 30 s fetch `AbortSignal` on the client side matches the daemon’s default so a hung endpoint doesn’t leak.
- Parses the response. On `JsScriptingCallbackResult.Error` throws a plain `Error(message)`. On `JsScriptingCallbackResult.CallToolResult(success: false, ...)` also throws (tool-level failure — subprocess authors almost always want this to bubble). On `success: true` returns the `{ success, textContent, errorMessage }` record so authors can read the tool’s text output directly.
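The POST itself is plain `fetch`. A sketch of that step — the request-body field names and the injectable `fetchImpl` parameter are illustration assumptions, not the SDK's real internals:

```typescript
// Illustrative sketch of the callback POST. Body field names are assumed;
// fetchImpl is an illustration hook so the sketch is testable offline.
type CallToolResult = { success: boolean; textContent?: string; errorMessage?: string };

async function postCallback(
  baseUrl: string,
  sessionId: string,
  invocationId: string,
  toolName: string,
  args: object,
  fetchImpl: typeof fetch = fetch,
): Promise<unknown> {
  const response = await fetchImpl(`${baseUrl}/scripting/callback`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ sessionId, invocationId, toolName, args }),
    // Client-side bound matches the daemon's 30 s default so a hung
    // endpoint aborts the fetch instead of leaking it.
    signal: AbortSignal.timeout(30_000),
  });
  return response.json();
}
```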
Error semantics¶
Two failure modes map onto thrown JS errors:
- Protocol errors (unknown invocation id, session mismatch, malformed body, unsupported version) → `JsScriptingCallbackResult.Error` from the daemon. Client throws with the daemon’s message.
- Tool execution failures (deserialization failure, tool threw, `TrailblazeToolResult.Error.*`) → `JsScriptingCallbackResult.CallToolResult(success = false, errorMessage = "...")`. Client throws with `errorMessage` so authors don’t have to branch on a `.success` flag they’ll forget about.
Success returns the CallToolResult record; the author reads
result.textContent and parses it however the tool it called
produces output. No typed wrapper on purpose (see scope).
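The two throw paths plus the success path can be sketched as one mapping function. The `kind`-tagged union here is an illustration of the contract shapes, not the actual wire layout:

```typescript
// Illustrative mapping of the daemon response onto return-or-throw.
// The tagged-union encoding is an assumption for the sketch.
type CallToolResult = { success: boolean; textContent?: string; errorMessage?: string };
type CallbackResponse =
  | { kind: "error"; message: string }          // protocol error from the daemon
  | { kind: "result"; result: CallToolResult }; // tool-level result

function unwrapCallbackResponse(resp: CallbackResponse): CallToolResult {
  if (resp.kind === "error") throw new Error(resp.message);                   // protocol error
  if (!resp.result.success) throw new Error(resp.result.errorMessage ?? "tool failed"); // tool failure
  return resp.result; // success: author reads textContent directly
}

// Success path: parse textContent however the called tool encodes it.
const user = JSON.parse(
  unwrapCallbackResponse({
    kind: "result",
    result: { success: true, textContent: '{"name":"Ada","email":"ada@example.com"}' },
  }).textContent!,
) as { name: string; email: string };
```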
References¶
- Envelope migration & callback transport — devlog
- Authoring pitch — authoring vision & roadmap
- Execution model (reentrance cap lineage) — scripted tools execution model
- Synchronous tool execution from JS — devlog