Unit-testing scripted tools without a device
Update (2026-05-22, PR #3283): the standalone trailblaze test subcommand described below was deleted in favor of folding the bun-test phase into trailblaze check (now a three-phase command: materialize → tsc → bun-test). The @trailblaze/scripting/testing SDK subpath and the per-trailmap *.test.ts discovery shape are unchanged. Invoke as ./trailblaze check <trailmap> (or --all) instead of ./trailblaze test <trailmap>. The “Test runner” section below describes the original CLI surface; everything else still applies.
What landed
*.test.ts files next to a .ts scripted tool now run via ./trailblaze test <trailmap> —
no daemon, no device, no MCP roundtrip. The deliverable is three coupled pieces:
- @trailblaze/scripting/testing SDK subpath. New sdks/typescript/src/testing.ts exports createMockClient() and createMockContext({ platform, target, memory }). createMockClient() records every client.callTool(name, args) and lets a test register a canned response via client.stub(toolName, { textContent, errorMessage }); a non-empty errorMessage makes the call throw the same shape the real client throws on a tool-side failure. createMockContext() returns a TrailblazeContext with a no-op logger and an explicit target / memory shape, with no on-device runtime needed.
- Test runner. trailblaze test [<trailmap-id>|--all] walks the trailmap’s tools/ tree for *.test.ts files and shells out to bun test. Trailmap discovery mirrors typecheck (walk-up from caller cwd, or --all, or explicit trailmap id); the subprocess timeout is governed by TRAILBLAZE_TEST_TIMEOUT_MS with a 5-minute default and a 1-minute lower clamp.
- Runtime resolution path. The SDK ships both dist/testing.d.ts (for tsc) and dist/testing.js (for bun’s runtime resolution via the per-trailmap tsconfig paths mapping). The .js is a plain esbuild transpile of src/testing.ts, not a bundle: testing.ts has zero runtime imports from the rest of the SDK (only type-only imports), so the transpiled output is self-contained. TrailblazeSdkDtsBundlePlugin now generates and byte-verifies all three artifacts (index.d.ts, testing.d.ts, testing.js) in lockstep.
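The timeout rule from the test-runner item can be sketched as below. This is a minimal illustration, not the CLI's actual code: the helper name resolveTestTimeoutMs and the fallback to the default for a missing or non-numeric value are assumptions. Only the variable name, the 5-minute default, and the 1-minute lower clamp come from the text above.

```typescript
const DEFAULT_TIMEOUT_MS = 5 * 60 * 1000; // 5-minute default (from the text)
const MIN_TIMEOUT_MS = 60 * 1000; // 1-minute lower clamp (from the text)

// Hypothetical helper: takes the raw TRAILBLAZE_TEST_TIMEOUT_MS value.
function resolveTestTimeoutMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_TIMEOUT_MS;
  const parsed = Number(raw);
  // Assumption: an unparseable value falls back to the default.
  if (!Number.isFinite(parsed)) return DEFAULT_TIMEOUT_MS;
  // Values below the clamp are raised to one minute.
  return Math.max(MIN_TIMEOUT_MS, Math.floor(parsed));
}
```

Under these assumptions, a caller would pass the environment value straight in, and a setting such as "1000" would be raised to 60000 rather than rejected.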
The canonical sample test sits next to a real tool:
examples/playwright-native/.../playwrightSample_web_openFixtureAndVerifyText.test.ts
(rooted at the OSS tree) asserts the tool dispatches web_navigate then
web_verifyTextVisible with the expected args, and that module defaults apply when
args are omitted.
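The shape of such a test might look like the sketch below. To stay self-contained it defines a small stand-in for the mock client instead of importing createMockClient from @trailblaze/scripting/testing. The stand-in only mimics the behaviour described above: it records every callTool, accepts stub(toolName, { textContent, errorMessage }), and throws when errorMessage is non-empty. The tool body, its url / text arguments, the default values, the callTool return shape, and the thrown error type are all hypothetical.

```typescript
// Stand-in types: hypothetical shapes, not the SDK's actual declarations.
type ToolArgs = Record<string, unknown>;
interface StubResponse { textContent?: string; errorMessage?: string }
interface RecordedCall { name: string; args: ToolArgs }

function createStandInMockClient() {
  const calls: RecordedCall[] = [];
  const stubs = new Map<string, StubResponse>();
  return {
    calls,
    stub(toolName: string, response: StubResponse): void {
      stubs.set(toolName, response);
    },
    async callTool(name: string, args: ToolArgs): Promise<{ textContent: string }> {
      calls.push({ name, args }); // record the call before resolving the stub
      const stubbed = stubs.get(name);
      if (stubbed?.errorMessage) throw new Error(stubbed.errorMessage);
      return { textContent: stubbed?.textContent ?? "" };
    },
  };
}

type StandInClient = ReturnType<typeof createStandInMockClient>;

// Hypothetical module defaults, applied when args are omitted.
const DEFAULT_URL = "http://localhost:8080/fixture.html";
const DEFAULT_TEXT = "Hello";

// Hypothetical tool under test: navigate first, then verify text.
async function openFixtureAndVerifyText(
  client: StandInClient,
  args: { url?: string; text?: string } = {},
): Promise<void> {
  await client.callTool("web_navigate", { url: args.url ?? DEFAULT_URL });
  await client.callTool("web_verifyTextVisible", { text: args.text ?? DEFAULT_TEXT });
}
```

In a real *.test.ts file the assertions would sit inside a bun:test test(...) block, for example checking that client.calls lists web_navigate before web_verifyTextVisible and that the recorded args equal the defaults when the tool is called with no arguments.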
Why a separate test subcommand instead of folding into typecheck
The earlier sketch had trailblaze typecheck also run bun test after tsc passed —
“one command, one answer.” That conflates two failure modes the author cares about
separately. A tool body can be type-clean and logically broken (test catches it,
typecheck doesn’t); an author iterating on test assertions doesn’t want to pay tsc’s
setup cost on every bun test run. Keeping them split also matches how bun test
itself works — no implicit tsc step.
The cost is two commands instead of one. Mitigated by parallel CLI shapes (same
walk-up + --all discovery, same exit-code conventions) so muscle memory carries
over.
Why ship a transpiled .js instead of the .ts source
The first attempt was to ship dist/testing.ts (the raw source) directly. Bun
would resolve @trailblaze/scripting/testing via tsconfig paths and execute the
.ts. That works at runtime, but tsc’s module resolution under
moduleResolution: "Bundler" tries extensions in order .ts → .tsx → .d.ts → .js,
so dist/testing.ts would shadow dist/testing.d.ts for type-checking and tsc would
try (and fail) to resolve the source’s ./client.js / ./context.js imports against
a dist directory that doesn’t ship those files.
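A toy model of the shadowing, assuming only the extension order stated above. pickByExtensionOrder is a hypothetical helper written for illustration; it is not how tsc is implemented, and real module resolution involves more steps than a single ordered lookup.

```typescript
// Extension order the text attributes to tsc under moduleResolution: "Bundler".
const TSC_EXTENSION_ORDER = [".ts", ".tsx", ".d.ts", ".js"] as const;

// Hypothetical helper: returns the first existing candidate for a module base path.
function pickByExtensionOrder(
  base: string,
  existingFiles: ReadonlySet<string>,
  order: readonly string[] = TSC_EXTENSION_ORDER,
): string | undefined {
  for (const ext of order) {
    const candidate = base + ext;
    if (existingFiles.has(candidate)) return candidate;
  }
  return undefined;
}

// First attempt: raw source shipped next to the declarations.
const firstAttempt = new Set(["dist/testing.ts", "dist/testing.d.ts"]);
// Final layout: declarations plus the transpiled runtime file.
const finalLayout = new Set(["dist/testing.d.ts", "dist/testing.js"]);

// The .ts wins in the first layout, so the .d.ts is never consulted.
const shadowed = pickByExtensionOrder("dist/testing", firstAttempt);
// With no .ts present, the lookup lands on the .d.ts.
const typesOnly = pickByExtensionOrder("dist/testing", finalLayout);
```

In this model, shadowed is "dist/testing.ts" and typesOnly is "dist/testing.d.ts", which mirrors why the raw source could not sit next to the declaration file.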
Shipping .js instead means tsc sees .d.ts (correct types) and bun sees .js
(executable runtime), both reachable from the same paths entry with no ambiguity.
The byte-diff gate in verifyTrailblazeSdkDtsBundle extends to the .js too — a
hand-edit to testing.js triggers the same CI failure path as a stale .d.ts.
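The idea behind a byte-diff gate can be sketched as follows: regenerate each artifact, compare it byte for byte with the committed copy, and report any mismatch. The helper names are hypothetical, and the real verifyTrailblazeSdkDtsBundle task is not shown in this document, so it may work differently.

```typescript
// Returns true only when both buffers have identical length and contents.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Hypothetical gate: names of artifacts whose committed bytes are missing
// or differ from the freshly generated ones.
function findStaleArtifacts(
  generated: ReadonlyMap<string, Uint8Array>,
  committed: ReadonlyMap<string, Uint8Array>,
): string[] {
  const stale: string[] = [];
  generated.forEach((bytes, name) => {
    const onDisk = committed.get(name);
    if (onDisk === undefined || !bytesEqual(bytes, onDisk)) stale.push(name);
  });
  return stale;
}
```

A CI step built on this sketch would fail whenever the returned list is non-empty, which treats a hand-edited testing.js and a stale testing.d.ts the same way.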
Why testing.ts has no runtime SDK imports
Originally testing.ts imported noopLogger from ./logger.js. That made
testing.ts non-self-contained: any consumer of the runtime .js would need the
sibling SDK files reachable too. Inlining a 4-line mockNoopLogger (debug / info
/ warn / error no-ops typed as TrailblazeLogger) removes the entanglement
entirely.
Beyond the runtime resolution win, this is good test-isolation hygiene — a regression
in the production logger or client can’t reach into a mock by construction. The cost
is that adding a method to TrailblazeLogger requires updating mockNoopLogger in
lockstep; the bundled .d.ts would still type-check OK against a stale mock, so this
is the kind of drift a *.test.ts file would catch first (a test that calls
ctx.logger.trace(...) would TypeError at runtime).
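The inlined logger and the drift risk can be sketched together. The real TrailblazeLogger declaration is not shown in this document, so the interface below is a stand-in that assumes (message: string) => void signatures. The trace method is the hypothetical addition from the example above, and the cast models a consumer whose types claim trace exists while the stale mock object lacks it.

```typescript
// Stand-in for the SDK's TrailblazeLogger; the method signatures are assumed.
interface TrailblazeLogger {
  debug(message: string): void;
  info(message: string): void;
  warn(message: string): void;
  error(message: string): void;
}

// Inlined no-op logger: no runtime import from ./logger.js is needed.
const mockNoopLogger: TrailblazeLogger = {
  debug: () => {},
  info: () => {},
  warn: () => {},
  error: () => {},
};

// Drift scenario: the production interface gains a method the mock lacks.
interface TrailblazeLoggerWithTrace extends TrailblazeLogger {
  trace(message: string): void;
}

// The types say trace exists; the stale object does not have it.
const staleLogger = mockNoopLogger as unknown as TrailblazeLoggerWithTrace;

// Calls trace and reports what happened instead of letting the error escape.
function callTrace(logger: TrailblazeLoggerWithTrace): string {
  try {
    logger.trace("hello");
    return "ok";
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other error";
  }
}
```

Here callTrace(staleLogger) returns "TypeError", because the property is undefined at runtime and calling it throws. That is the failure a *.test.ts file would surface before anything else does.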
What needed baseUrl and what didn’t
The per-trailmap tsconfig the framework already emits maps @trailblaze/scripting/* to a
../../../../.trailblaze/sdk/dist/* glob path. Testing against bun’s actual resolution
confirmed that ../-prefixed paths resolve fine without baseUrl, but unprefixed relative
paths (such as sdk2/*) silently fail to resolve. The production emitter always uses
../-prefixed paths (the trailmap lives inside the workspace; the SDK lives at the
workspace root), so no change to PerTrailmapTsconfigEmitter was needed — but if a
future change makes that emitter resolve the SDK to a sibling-relative path, expect
to add baseUrl: "." to keep bun’s resolution happy.
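One way to encode that rule is sketched below. buildSdkPathOptions is a hypothetical helper, not PerTrailmapTsconfigEmitter itself. It treats only ../-prefixed targets as safe without baseUrl, since that is the only case the text reports as confirmed, and conservatively adds baseUrl: "." for anything else.

```typescript
// Subset of tsconfig compilerOptions relevant to the SDK mapping.
interface PathsCompilerOptions {
  baseUrl?: string;
  paths: Record<string, string[]>;
}

// Hypothetical helper: builds the paths mapping for the SDK subpaths.
function buildSdkPathOptions(sdkDistGlob: string): PathsCompilerOptions {
  const options: PathsCompilerOptions = {
    paths: { "@trailblaze/scripting/*": [sdkDistGlob] },
  };
  // Only "../"-prefixed targets were confirmed to resolve in bun without
  // baseUrl, so every other shape gets baseUrl added (an assumption).
  if (!sdkDistGlob.startsWith("../")) {
    options.baseUrl = ".";
  }
  return options;
}
```

With the production path ../../../../.trailblaze/sdk/dist/* the result carries no baseUrl. With an unprefixed target such as sdk2/* the result includes baseUrl: ".".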
Out of scope (intentionally)
- .js-authored test files. The runner discovery glob is *.test.ts only. A trailmap that authors tools in .js can still write tests in .ts; the import surface is type-checked either way. Adding *.test.js would require deciding what allowJs behavior we want for tests, separate from what we want for tool source.
- Watch mode. bun test --watch is a single flag away; we deliberately don’t expose it through trailblaze test yet because the CLI loop for the daemon is what runs in CI, and watch mode is local-dev-only ergonomics. Add when an author actually asks.
- Coverage. Same reasoning: bun test --coverage works fine if invoked directly from inside a trailmap’s tools/ dir, but isn’t wired through the trailblaze test flag surface yet.
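The *.test.ts-only discovery rule amounts to a suffix filter. The sketch below applies it to an already-listed set of paths; selectTestFiles and the sample file names are hypothetical, and the real runner walks the tools/ tree itself rather than receiving a list.

```typescript
// Hypothetical filter: keeps only files matching the *.test.ts shape.
function selectTestFiles(paths: readonly string[]): string[] {
  return paths.filter((p) => p.endsWith(".test.ts"));
}

// Sample listing of a trailmap's tools/ tree (made-up names).
const listed = [
  "tools/openFixture.ts",
  "tools/openFixture.test.ts",
  "tools/legacy.test.js", // skipped: .js-authored tests are out of scope
];

const discovered = selectTestFiles(listed);
```

Under this rule discovered contains only tools/openFixture.test.ts, so a .js-authored test is skipped silently rather than reported as an error.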