Architecture¶
This document describes the software architecture and key design decisions of the Trailblaze framework.
Technology Stack¶
| Technology | Purpose |
|---|---|
| Kotlin | Primary language for all modules (Kotlin Multiplatform where applicable) |
| Gradle (Kotlin DSL) | Build system with version catalogs (libs.versions.toml) |
| Kotlin Serialization | JSON serialization for tools, logs, and API communication |
| Kotlin Coroutines | Async execution for tool calls and LLM interactions |
| Compose Multiplatform | Desktop application UI |
| Compose Material 3 | UI component library and theming |
| Ktor | HTTP client for LLM API calls, server for log aggregation |
Why Kotlin?¶
- Multiplatform support - Share code between Android on-device and JVM host modes
- Coroutines - Clean async APIs for LLM calls and tool execution
- Serialization - First-class support for JSON serialization with annotations
- Type safety - Catch errors at compile time, especially important for tool definitions
- Android ecosystem - Native integration with Android instrumentation testing
Kotlin Multiplatform Strategy¶
We aim to write as much code as possible in commonMain source sets for maximum reusability:
src/
├── commonMain/ # Preferred - platform-agnostic code
├── jvmMain/ # JVM-specific (desktop, host mode)
└── androidMain/ # Android-specific (on-device mode)
Why commonMain? Code in commonMain can be shared across all targets, reducing duplication and ensuring
consistent behavior.
JVM Constraints: Some code must be JVM-specific due to dependencies:
- Maestro - The device interaction library is JVM-only, so driver implementations live in
jvmMainorandroidMain - File I/O - Some file operations use JVM-specific APIs
- Desktop UI - Compose Desktop targets JVM
When adding new code, prefer commonMain unless there’s a specific platform requirement.
UI with Compose Material 3¶
The Desktop application uses Compose Multiplatform with Material 3:
- Material 3 Components - Buttons, cards, dialogs, navigation, etc.
- Dynamic Theming - Support for light/dark modes
- Material Icons Extended - Comprehensive icon library
- Multiplatform Markdown Renderer - For displaying test documentation
UI code lives primarily in trailblaze-ui (shared components) and trailblaze-desktop (application shell).
Build Structure¶
The project uses Gradle with Kotlin DSL and a multi-module structure:
opensource/
├── build.gradle.kts # Root build configuration
├── gradle/libs.versions.toml # Centralized dependency versions
├── trailblaze-agent/ # Core agent logic
├── trailblaze-android/ # Android on-device driver
├── trailblaze-common/ # Shared utilities and tools
├── trailblaze-desktop/ # Desktop application
├── trailblaze-host/ # Host-mode driver
├── trailblaze-models/ # Data models (Kotlin Multiplatform)
├── trailblaze-report/ # Reporting utilities
├── trailblaze-server/ # Log server
└── trailblaze-ui/ # Shared UI components
Overview¶
Trailblaze is a layered architecture that separates AI reasoning from platform-specific execution:
flowchart TB
subgraph agent["Agent Layer"]
LLM["LLM Provider"]
SP["System Prompt"]
TC["Tool Calling"]
end
subgraph syntax["Test Syntax Layer"]
YAML[".trail.yaml Parser"]
PROMPT["Prompt Steps"]
ACTION["Action Steps"]
end
subgraph tools["Tool Layer"]
ET["Executable Tools"]
DT["Delegating Tools"]
CT["Custom Tools"]
end
subgraph drivers["Platform Driver Layer"]
AD["Android Driver"]
ID["iOS Driver"]
end
agent --> syntax
syntax --> tools
tools --> drivers
Design Principles¶
1. Platform Agnostic Agent¶
The Trailblaze agent is designed to be platform-agnostic. It reasons about UI interactions using natural language and a standardized tool interface, without knowledge of the underlying platform implementation.
Why? This allows the same test logic to work across different platforms and enables future extensibility.
2. Tool-Based Execution¶
All device interactions are expressed as tools that the agent can invoke. Tools provide a clean abstraction between the agent’s intent and the platform-specific implementation.
Why? Tools are:
- Composable and reusable
- Easy to extend with custom implementations
- Testable in isolation
- Self-documenting via annotations
3. Deterministic + AI Hybrid¶
Tests can combine deterministic actions (always execute the same way) with AI-driven steps (adapt to screen state). This provides both reliability and flexibility.
Why? Pure AI tests can be unpredictable; pure scripted tests are brittle. The hybrid approach gives test authors control over the trade-off.
Key Components¶
MaestroTrailblazeAgent¶
The abstract base class for all Trailblaze agents. It provides:
- Tool execution framework
- Memory for storing data between steps
- Logging integration
- Support for both blocking and suspending execution
abstract class MaestroTrailblazeAgent : TrailblazeAgent {
abstract suspend fun executeMaestroCommands(commands: List<Command>, traceId: TraceId?)
val memory = AgentMemory()
fun runTrailblazeTools(tools: List<TrailblazeTool>): RunTrailblazeToolsResult
}
Implementations:
AndroidMaestroTrailblazeAgent- On-device Android executionHostMaestroTrailblazeAgent- Host-mode execution for Android/iOS
Tool System¶
Tools are the primary abstraction for device interactions. There are two main types:
ExecutableTrailblazeTool¶
Tools that directly perform actions:
interface ExecutableTrailblazeTool : TrailblazeTool {
suspend fun execute(toolExecutionContext: TrailblazeToolExecutionContext): TrailblazeToolResult
}
Examples: tapOnElementWithText, inputText, assertVisible
DelegatingTrailblazeTool¶
Tools that expand into multiple executable tools:
interface DelegatingTrailblazeTool : TrailblazeTool {
fun toExecutableTrailblazeTools(context: TrailblazeToolExecutionContext): List<ExecutableTrailblazeTool>
}
Why delegating tools? They allow complex, multi-step operations to be exposed as single tool calls to the agent, reducing token usage and improving reliability.
Custom Tools¶
Applications can define custom tools for app-specific interactions:
@Serializable
@TrailblazeToolClass("signInWithEmailAndPassword")
@LLMDescription("Sign in using email and password")
data class SignInTool(val email: String, val password: String) : MapsToMaestroCommands {
override fun toMaestroCommands(): List<Command> = listOf(/* commands */)
}
Why? Custom tools let teams encode domain knowledge and reduce test verbosity.
Extensibility¶
Trailblaze is designed to be extended without modifying the core framework. There are several extension points:
Custom Tools¶
The primary extension mechanism. Define tools as Kotlin data classes with annotations:
@Serializable
@TrailblazeToolClass("myCustomAction")
@LLMDescription("Description shown to the LLM")
data class MyCustomTool(
val param1: String,
val param2: Int
) : ExecutableTrailblazeTool {
override suspend fun execute(context: TrailblazeToolExecutionContext): TrailblazeToolResult {
// Custom implementation
return TrailblazeToolResult.Success
}
}
Tools are automatically:
- Serialized to JSON for LLM function calling
- Documented via
@LLMDescriptionannotations - Validated at compile time via Kotlin’s type system
Test Rules¶
Create reusable test base classes that configure the agent:
class MyAppTrailblazeTest(
testTimeoutMinutes: Int = 10
) : TrailblazeTestRule(
customTools = listOf(MyCustomTool::class),
// ... other configuration
)
Platform Drivers¶
Implement MaestroTrailblazeAgent to add support for new platforms:
class WebTrailblazeAgent : MaestroTrailblazeAgent() {
override suspend fun executeMaestroCommands(
commands: List<Command>,
traceId: TraceId?
): TrailblazeToolResult {
// Translate commands to web automation
}
}
LLM Providers¶
The agent communicates with LLMs through an abstraction that can be swapped:
- Default: OpenAI-compatible API
- Extensible to other providers (Anthropic, local models, etc.)
Test Syntax (.trail.yaml)¶
Tests are authored in .trail.yaml format, which supports:
Prompt Steps (AI-driven)¶
- prompts:
- step: Navigate to the Settings screen and enable notifications
Action Steps (Deterministic)¶
- maestro:
- tapOn: "Settings"
- assertVisible: "Notifications"
Recording Hints¶
- prompts:
- step: Tap the login button
recording:
tools:
- tapOnElementWithText:
text: "Login"
Recording hints guide the agent toward specific tool calls while still allowing adaptation.
Execution Modes¶
On-Device Mode (Android)¶
flowchart LR
subgraph device["Android Device"]
TEST["Test Process"]
AGENT["Trailblaze Agent"]
DRIVER["Android Driver"]
APP["App Under Test"]
end
LLM["LLM API"]
TEST --> AGENT
AGENT <--> LLM
AGENT --> DRIVER
DRIVER --> APP
The agent runs inside the test process on the device. This enables:
- Integration with cloud device farms (Firebase Test Lab, AWS Device Farm)
- No host machine required
- Parallel execution at scale
Host Mode¶
flowchart LR
subgraph host["Host Machine"]
DESKTOP["Desktop App"]
AGENT["Trailblaze Agent"]
end
subgraph device["Device"]
DRIVER["Platform Driver"]
APP["App Under Test"]
end
LLM["LLM API"]
DESKTOP --> AGENT
AGENT <--> LLM
AGENT --> DRIVER
DRIVER --> APP
The agent runs on a host machine and controls devices remotely. This enables:
- Cross-platform testing (Android + iOS)
- Interactive test authoring via Desktop app
- Local development workflow
Platform Drivers¶
Platform drivers translate tool calls into device-specific actions.
Current Implementation¶
For mobile platforms, Trailblaze uses Maestro internally as the device interaction layer. However:
- This is an implementation detail - users interact with Trailblaze tools, not Maestro directly
- Not all Maestro features are exposed - only the subset needed for Trailblaze tools
- The driver is replaceable - the architecture supports alternative implementations
Driver Interface¶
Drivers must implement command execution:
abstract suspend fun executeMaestroCommands(
commands: List<Command>,
traceId: TraceId?
): TrailblazeToolResult
Logging & Reporting¶
All tool executions are logged with:
- Tool name and parameters
- Execution duration
- Success/failure status
- Trace IDs for correlation
- Session information
Logs are sent to the Trailblaze server for aggregation and reporting.
Memory System¶
The AgentMemory class allows tests to store and retrieve data:
// Store a value
memory.set("username", "test@example.com")
// Retrieve in a later step
val username = memory.get("username")
Why? Memory enables multi-step tests where later steps depend on data from earlier steps (e.g., storing a generated ID).
Code Conventions¶
Kotlin Style¶
- Follow Kotlin coding conventions
- Use coroutines for async operations (avoid callbacks)
- Prefer
valovervar(immutability) - Use data classes for DTOs and tool definitions
- Use sealed classes for restricted hierarchies (e.g.,
TrailblazeToolResult)
Tool Definitions¶
- Always annotate with
@Serializable,@TrailblazeToolClass, and@LLMDescription - Keep tool parameters simple (primitives, strings, enums)
- Write clear
@LLMDescriptiontext - the LLM uses this to decide when to call the tool - Return
TrailblazeToolResult.SuccessorTrailblazeToolResult.Errorwith a helpful message
Testing¶
- Use JUnit 4 for tests (
org.junit.Test) - Create fake/mock agents for unit testing tools
- Integration tests should use real devices or emulators
Logging¶
- Use
TrailblazeLoggerfor structured logging - Include
traceIdfor correlation across tool calls - Log tool execution start, completion, and errors
Future Considerations¶
The architecture is designed to support:
- Additional platform drivers - Web, desktop, or custom UI frameworks
- Multiple LLM providers - Currently supports OpenAI, extensible to others
- Enhanced recording - Capture and replay interactions with higher fidelity
- Distributed execution - Coordinate agents across multiple devices