Handwritten Agent Loop¶
A core architectural choice — why we hand-wrote the agent loop instead of using a framework.
Background¶
AI agents require an execution loop that orchestrates:
- Gathering context (screen state, test instructions, history)
- Calling the LLM for reasoning and tool selection
- Executing selected tools
- Processing results and deciding next steps
- Handling errors and retries
- Recording successful runs
Many agent frameworks exist (LangChain, AutoGen, CrewAI, etc.) that provide abstractions for this loop. We needed to decide whether to adopt an existing framework or implement our own.
What we decided¶
Trailblaze uses a handwritten while loop for its core agent execution.
Implementation Overview¶
The agent loop is a straightforward while loop that continues until the test completes (success or failure) or a termination condition is met:
// Simplified conceptual representation
suspend fun runAgent(objective: Objective): TestResult {
val completedTools = mutableListOf<CompletedTool>()
while (completedTools.size < MAX_ITERATIONS) {
// 1. Capture current screen state
val screenshot = captureScreenshot()
val viewHierarchy = captureViewHierarchy()
// 2. Build fresh LLM request with current context
val request = buildRequest(
systemPrompt = SYSTEM_PROMPT,
objective = objective,
completedTools = completedTools,
screenshot = screenshot,
viewHierarchy = viewHierarchy
)
// 3. Call LLM for next action
val response = llmClient.chat(request)
// 4. Execute tool calls sequentially
for (toolCall in response.toolCalls) {
val result = driver.executeTool(toolCall)
completedTools.add(CompletedTool(toolCall, result))
// Check for terminal conditions
if (result.isObjectiveComplete || result.isFailure) {
return result.toTestResult()
}
}
}
return TestResult.Timeout("Exceeded $MAX_ITERATIONS iterations")
}
Why Handwritten¶
1. Simplicity and Transparency¶
A while loop is easy to understand, debug, and modify. New team members can read the code and understand exactly what the agent does. There’s no framework abstraction layer to learn or work around.
2. Control Over Execution¶
We have precise control over:
- When and how the LLM is called
- How context is constructed for each request
- Tool execution ordering and sequencing
- What gets recorded and when
- Termination conditions and limits
3. Mobile-Specific Requirements¶
Trailblaze has unique requirements that existing agent frameworks don’t address:
- On-device execution with resource constraints
- Integration with platform drivers for device interactions
- Trail recording format (see Trail Recording Format)
- Trail mode that replays recorded tool sequences without LLM calls
4. Avoiding Dependency Risk¶
Agent frameworks are evolving rapidly. Depending on an external framework means:
- Tracking breaking changes in a fast-moving ecosystem
- Working around framework limitations
- Framework bugs becoming our bugs
- Potential abandonment or direction changes
Loop Termination¶
The loop terminates under the following conditions:
- Objective completion: The agent calls the
objectiveStatustool with aCOMPLETEDorFAILEDstatus, indicating the test objective has been achieved or cannot be completed - Assertion failure: An assertion tool (e.g.,
assertVisibleWithText) fails, indicating an unexpected state - Element not found: A required UI element cannot be located after the agent’s attempts
- Iteration limit: A maximum of 50 LLM calls per step prevents runaway execution
Future improvements may include more sophisticated loop detection to identify when the agent is stuck repeating ineffective actions.
Tool Execution¶
Tools execute sequentially, one at a time. Parallel tool execution is not supported because Trailblaze interacts with a UI—only one interaction can happen at a time on a device.
Tools execute once without automatic retries at the loop level. If a tool needs retry logic, it must implement that internally. When a tool completes (successfully or not), the agent proceeds based on the result:
- For terminal results (assertions, objective status), the loop may end
- For non-terminal results, the agent continues and relies on subsequent steps to detect any issues
Tool calls delegate to platform drivers (Android or iOS) to perform actual device interactions. The Trailblaze tools provide a high-level abstraction, while drivers handle the device-specific implementation details.
Context Window Management¶
Rather than maintaining a growing conversation history, Trailblaze constructs each LLM request fresh. On every iteration, the agent sends:
- System prompt with instructions
- Current objective
- List of previously completed tools (providing execution history)
- Latest screenshot
- Current view hierarchy
This “subagent” pattern keeps the context window manageable—typically under 10,000 input tokens—well within LLM limits. By always including the latest screen state and omitting stale information, we reduce LLM confusion and improve decision quality.
Running Trails (Replay Mode)¶
When a test has a recorded trail, it can run in trail mode which bypasses the LLM entirely. The recorded tool sequence from the .trail.yaml file executes deterministically. See Trail Recording Format for details on how trails are structured and when they’re used.
What the Loop Handles¶
- Context construction: Building fresh LLM requests with current screen state, objectives, and execution history
- LLM communication: Calling the LLM, parsing responses, extracting tool calls
- Tool execution: Invoking tools sequentially, delegating to platform drivers
- Recording: Capturing successful tool sequences for trail replay
- Termination: Recognizing completion, failure, and limit conditions
What changed¶
Positive:
- Complete control over agent behavior
- Easy to understand, debug, and modify
- No external framework dependencies to manage
- Can optimize for mobile and on-device constraints
- Straightforward to add new capabilities
Negative:
- Must implement features that frameworks provide out-of-the-box
- No automatic benefit from framework improvements
- Requires more upfront implementation work
- Team must maintain all agent logic internally