Trailblaze Plan 037: CLI-Based SSO/Auth and Dynamic On-Device Instrumentation Args
Status
- Part 1 (Dynamic Instrumentation Args): Complete as of 2026-04-07.
- Part 2 (CLI-Based SSO/Auth): Future work — direction and design documented below but not yet implemented.
Summary
Two related problems: (1) SSO/OAuth for LLM providers is hardcoded to Databricks, and (2) on-device Android LLM calls use hardcoded env var names that don’t scale to arbitrary YAML-configured providers. Part 1 introduced a convention-based instrumentation arg scheme for on-device token passing. Part 2 proposes shell-out token commands for auth.
Context
Problem 1: SSO diversity
Organization-specific OAuth implementations (e.g., a custom OAuth client and JVM token provider for a corp LLM gateway) are hardcoded and don’t generalize. Different organizations use different SSO/OAuth/SAML flows. We can’t model every auth flow variant in the framework.
Problem 2: Hardcoded on-device instrumentation args (resolved by Part 1)
Previously, the host side wrote API keys using hardcoded env var names (DATABRICKS_TOKEN, OPENAI_API_KEY) and the Android side had a hardcoded when(provider) block. This is now resolved — LlmAuthResolver reads auth.env_var from the YAML config and uses the dynamic trailblaze.llm.auth.token.<provider_id> convention. The Android side reads tokens generically via AndroidLlmClientResolver.
Decision
Part 1: Dynamic Instrumentation Args for On-Device LLM
Replace hardcoded env var names with a convention keyed by provider ID:
trailblaze.llm.auth.token.<provider_id> = <token>
Host side writes:
// In LlmAuthResolver.toInstrumentationArgs()
for ((providerId, auth) in auths) {
    val token = auth.token ?: continue
    put("trailblaze.llm.auth.token.$providerId", token)
}

// For the selected provider, also pass connection info
selectedAuth?.providerConfig?.let { config ->
    config.type?.let { put("trailblaze.llm.provider.type", it.name.lowercase()) }
    config.baseUrl?.let { put("trailblaze.llm.provider.base_url", it) }
    config.chatCompletionsPath?.let { put("trailblaze.llm.provider.chat_completions_path", it) }
}
Android side reads generically:
fun getTokenForProvider(providerId: String): String? =
    instrumentationArgs.getString("trailblaze.llm.auth.token.$providerId")

fun getProviderType(): String? =
    instrumentationArgs.getString("trailblaze.llm.provider.type")
Android client construction becomes generic: it uses the provider type, base URL, and path from the instrumentation args instead of a hardcoded when(provider) block.
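As a minimal sketch of the device-side lookup, the snippet below resolves everything a generic client would need from the documented arg keys. The `ProviderConnection` type and `resolveConnection` function are hypothetical names for illustration, and a plain `Map` stands in for the Android instrumentation `Bundle` so the example runs off-device.

```kotlin
// Hypothetical holder for what the device needs to build a generic client.
data class ProviderConnection(
    val providerId: String,
    val token: String,
    val type: String?,
    val baseUrl: String?,
    val chatCompletionsPath: String?,
)

// `args` stands in for the instrumentation args; a real test would read a Bundle.
// Returns null when no token was passed for the requested provider.
fun resolveConnection(args: Map<String, String>, providerId: String): ProviderConnection? {
    val token = args["trailblaze.llm.auth.token.$providerId"] ?: return null
    return ProviderConnection(
        providerId = providerId,
        token = token,
        // Connection info is only written for the selected provider, so it may be absent.
        type = args["trailblaze.llm.provider.type"],
        baseUrl = args["trailblaze.llm.provider.base_url"],
        chatCompletionsPath = args["trailblaze.llm.provider.chat_completions_path"],
    )
}
```

Because the provider ID is the only lookup key, adding a provider to the YAML config would not require any change to this code.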
Benefits
- No hardcoded env var names on the Android side
- Any YAML-configured provider works on-device automatically
- Provider ID is the single key — fully dynamic
- Host-side env var mapping stays (for CI/desktop) and doesn’t leak onto the device
Implementation steps (all complete)
- ~~Update LlmAuthResolver.toInstrumentationArgs() to use trailblaze.llm.auth.token.<id> convention~~: Done
- ~~Keep writing legacy provider-specific args temporarily for backward compat~~: Skipped; migrated all consumers directly
- ~~Update Android token provider to read dynamic args~~: Done (uses LlmAuthResolver.resolve())
- ~~Replace hardcoded when(provider) in Android test rule with generic client construction~~: Done. Uses AndroidLlmClientResolver.resolveModel() for model resolution and generic OpenAILLMClient construction from instrumentation args (base_url, chat_completions_path).
- ~~Update scripts/atf.sh~~: Done (uses trailblaze.llm.auth.token.<id> convention)
- ~~Audit other CI scripts for hardcoded arg references~~: Done (scripts set env vars; atf.sh converts to new arg format)
- ~~Remove legacy arg writes~~: Complete; no legacy arg writes remain in active code paths
Part 2: CLI-Based SSO/Auth for LLM Providers (Future Work)
Instead of implementing OAuth flows in the framework, let users specify CLI commands that produce tokens. The framework handles caching and lifecycle; the user’s tooling handles auth complexity. This part is not yet implemented — the design below captures the intended direction.
YAML schema extension
providers:
  my-corp-llm:
    type: openai_compatible
    base_url: https://llm.corp.example.com
    auth:
      # Priority: env_var > cached token > token_command
      env_var: CORP_LLM_TOKEN                # CI/manual override (highest priority)
      token_command: "corp-auth get-token"   # shell out, stdout = token
      refresh_command: "corp-auth refresh"   # optional, proactive refresh
      token_cache: ~/.trailblaze/tokens/corp-llm.json
      token_ttl: 3600                        # optional hint if token doesn't self-describe expiry
Token resolution priority (per provider)
- Environment variable (env_var): always wins, critical for CI
- Cached token: check token_cache file, use if not expired
- Refresh command (refresh_command): if cached token is close to expiry
- Token command (token_command): full auth flow (may open browser, prompt user)
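The priority order above can be sketched as a single function. This is an illustration under stated assumptions, not the planned implementation: `CachedToken`, `resolveToken`, and the `refreshWindowSeconds` threshold are hypothetical, and the lambdas stand in for the real shell-outs. Reusing a still-valid cached token when no refresh command is configured is also an assumption; the design does not specify that case.

```kotlin
// Hypothetical cached-token shape; expiresAt is epoch seconds, as in the cache file.
data class CachedToken(val token: String, val expiresAt: Long)

// Resolves a token in the documented order. Every parameter is a stand-in
// for the real lookup (environment, cache file, shell command).
fun resolveToken(
    envValue: String?,
    cached: CachedToken?,
    nowEpochSeconds: Long,
    refreshWindowSeconds: Long,
    runRefresh: (() -> String?)?,
    runTokenCommand: () -> String?,
): String? {
    // 1. The environment variable always wins.
    if (!envValue.isNullOrBlank()) return envValue

    if (cached != null) {
        val remaining = cached.expiresAt - nowEpochSeconds
        // 2. Cached token that is comfortably within its lifetime.
        if (remaining > refreshWindowSeconds) return cached.token
        if (remaining > 0) {
            // Assumption: with no refresh command, keep using the still-valid token.
            if (runRefresh == null) return cached.token
            // 3. Close to expiry: try a silent refresh first.
            runRefresh()?.let { return it }
        }
    }
    // 4. Full auth flow; also the fallback when the refresh fails.
    return runTokenCommand()
}
```

Passing the clock and the commands in as parameters keeps the ordering logic testable without touching the environment or spawning processes.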
Token cache format
{
  "token": "eyJ...",
  "expires_at": 1720000000,
  "refresh_token": "optional...",
  "metadata": {}
}
The framework reads/writes this file. The token_command can also write it directly.
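A rough sketch of the write side of the cache, assuming a POSIX file system. `TokenCacheEntry` and `writeCache` are hypothetical names, the JSON is assembled by hand with only minimal escaping to keep the example dependency-free, and the owner-only permissions follow the 600 suggestion under Open Questions rather than a settled decision.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.attribute.PosixFilePermissions

// Hypothetical in-memory form of one cache file; expiresAt is epoch seconds.
data class TokenCacheEntry(
    val token: String,
    val expiresAt: Long,
    val refreshToken: String? = null,
) {
    fun isExpired(nowEpochSeconds: Long): Boolean = nowEpochSeconds >= expiresAt

    // Hand-rolled JSON with minimal escaping (backslash and quote only);
    // a real implementation would likely use a JSON library.
    fun toJson(): String {
        fun quote(s: String) = "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
        val fields = mutableListOf("\"token\": ${quote(token)}", "\"expires_at\": $expiresAt")
        refreshToken?.let { fields += "\"refresh_token\": ${quote(it)}" }
        fields += "\"metadata\": {}"
        return fields.joinToString(prefix = "{", separator = ", ", postfix = "}")
    }
}

fun writeCache(path: Path, entry: TokenCacheEntry) {
    path.parent?.let { Files.createDirectories(it) }
    Files.write(path, entry.toJson().toByteArray())
    // Owner read/write only (0600). Throws UnsupportedOperationException
    // on file systems without POSIX permissions.
    Files.setPosixFilePermissions(path, PosixFilePermissions.fromString("rw-------"))
}
```

Note that the file briefly exists with default permissions before they are tightened; a hardened version would create it with the restricted attributes from the start.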
Command contract
token_command:
- Runs when no valid cached token exists
- Does whatever it needs (browser OAuth, CLI login, keychain lookup)
- Stdout = token (bare string) or JSON {"token": "...", "expires_at": ...}
- Non-zero exit = auth failure
- May be interactive (opens browser, prompts user)
refresh_command (optional):
- Runs proactively before token expiry
- Same output contract as token_command
- Should be non-interactive (silent refresh)
- Falls back to token_command on failure
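The contract above could be exercised with a small `ProcessBuilder` wrapper along these lines. This is a sketch only: `runTokenCommand` is a hypothetical helper, running through `sh -c` assumes a POSIX shell, and the 300-second default merely mirrors the five-minute figure mentioned under Open Questions.

```kotlin
import java.nio.file.Files
import java.util.concurrent.TimeUnit

// Hypothetical helper: runs a token command and returns its trimmed stdout,
// or null on timeout, non-zero exit, or empty output.
fun runTokenCommand(command: String, timeoutSeconds: Long = 300): String? {
    // Stdout goes to a temp file so the timeout below still applies
    // even if the command never closes its output stream.
    val out = Files.createTempFile("token-cmd", ".out")
    try {
        val process = ProcessBuilder("sh", "-c", command) // assumes a POSIX shell
            .redirectInput(ProcessBuilder.Redirect.INHERIT) // allow interactive prompts
            .redirectError(ProcessBuilder.Redirect.INHERIT) // pass stderr through, do not capture
            .redirectOutput(out.toFile())
            .start()
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly()
            return null
        }
        if (process.exitValue() != 0) return null // non-zero exit = auth failure
        return out.toFile().readText().trim().takeIf { it.isNotEmpty() }
    } finally {
        Files.deleteIfExists(out)
    }
}
```

A caller would treat a null from refresh_command as the signal to fall back to token_command, as the contract describes.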
Example: corp LLM gateway migration
Existing organization-specific OAuth code becomes a CLI command behind the same interface:
corp-llm:
  type: openai_compatible
  base_url: https://llm-gateway.corp.example.com
  auth:
    env_var: CORP_LLM_TOKEN
    token_command: "trailblaze auth corp-llm"
    token_cache: ~/.trailblaze/tokens/corp-llm.json
The custom OAuth client moves into a trailblaze auth <provider> command. The JVM token provider simplifies to just reading YAML config.
Implementation steps (TODO)
- Extend LlmAuthConfig with token_command, refresh_command, token_cache, token_ttl fields
- Generalize token cache into a generic implementation (read/write JSON token files, expiry checking)
- Create CommandTokenProvider (execute commands via ProcessBuilder, parse stdout, write to cache)
- Wire into LlmAuthResolver (env_var > cache > command priority chain)
- Migrate existing OAuth flows behind trailblaze auth <provider> CLI commands
- Desktop app picks this up automatically via existing JVM path
What This Enables
- Any SSO provider (Okta, Azure AD, Google Workspace, custom SAML)
- Corporate proxy auth
- Hardware token / MFA flows
- Keychain-based token retrieval (macOS Keychain, Linux secret-service)
- Cloud provider CLI auth (gcloud auth print-access-token, aws sso get-role-credentials)
- On-device LLM calls with any YAML-configured provider
- No hardcoded provider knowledge on the Android side
Relationship to Existing Code
| Component | Part 1 Status (Done) | Part 2 Status (Future) |
|---|---|---|
| LlmAuthResolver | Uses trailblaze.llm.auth.token.<id> convention | Will gain CommandTokenProvider path |
| Android token provider | Reads dynamic args via LlmAuthResolver.resolve() | No change needed |
| Android test rule | Generic client construction via AndroidLlmClientResolver + instrumentation args | No change needed |
| Custom OAuth client | Unchanged | Will move behind trailblaze auth <provider> CLI command |
| Token cache | Unchanged | Will generalize into GenericTokenCache |
| Custom JVM token provider | Unchanged | Will simplify to reading YAML config |
Open Questions (Part 2)
- Command output format: Support both bare string and JSON; try JSON parse, fall back to bare string + token_ttl.
- Interactive commands: token_command may open a browser. Framework should allow TTY passthrough, not capture stderr.
- Timeout: Some OAuth flows wait up to 5 minutes. Should be configurable per provider.
- Security: Token cache files should have restricted permissions (600).
- Auto-refresh scheduling: Some flows check every 2 minutes. Generic version should support configurable interval.
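For the output-format question, one possible shape of the "try JSON, fall back to bare string + token_ttl" rule is sketched below. `ParsedToken` and `parseCommandOutput` are hypothetical names, and the regular expressions are a dependency-free stand-in for a real JSON parser; they would not handle escaped quotes or nested objects.

```kotlin
// Hypothetical result type; expiresAt is epoch seconds, or null when unknown.
data class ParsedToken(val token: String, val expiresAt: Long?)

// Regex stand-ins for a JSON parser: match "token": "..." and "expires_at": 123.
val tokenField = Regex("\"token\"\\s*:\\s*\"([^\"]+)\"")
val expiresField = Regex("\"expires_at\"\\s*:\\s*(\\d+)")

fun parseCommandOutput(stdout: String, nowEpochSeconds: Long, tokenTtlSeconds: Long?): ParsedToken? {
    val text = stdout.trim()
    if (text.isEmpty()) return null
    // Expiry derived from the optional token_ttl hint when the output has none.
    val ttlExpiry = tokenTtlSeconds?.let { nowEpochSeconds + it }
    if (text.startsWith("{")) {
        // JSON form: a token field is required, expires_at is optional.
        val token = tokenField.find(text)?.groupValues?.get(1) ?: return null
        val expiresAt = expiresField.find(text)?.groupValues?.get(1)?.toLongOrNull()
        return ParsedToken(token, expiresAt ?: ttlExpiry)
    }
    // Bare string form: the whole trimmed output is the token.
    return ParsedToken(text, ttlExpiry)
}
```

With this shape, a command that prints a bare token and a provider with no token_ttl yields an unknown expiry, which the caller would have to treat conservatively.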
Related Documents
- 030: LLM Provider Configuration — YAML schema for providers
- 036: Workspace Config Resolution — Where config files live
- Current implementation: LlmAuthResolver.kt