Logging and Reporting Architecture

Designing structured logging that works across agent runs, CI, and desktop.

Background

AI agents are notoriously difficult to debug. When a test fails, understanding why requires visibility into:

What the agent “saw” (screen state, view hierarchy)
What the LLM was asked and what it responded
Which tools were executed and their results
The sequence of events leading to failure

Without detailed logging, debugging becomes guesswork. The same information also has to be accessible in two places: during development (desktop app) and in CI results (web reports).

What we decided

Trailblaze implements a structured logging system (TrailblazeLog) that captures detailed agent activity. The same log data powers both the desktop app’s real-time view and the generated web reports.

Structured Log Events

All agent activity is captured as typed log events that inherit from TrailblazeLog. Each log includes:

Session ID: Groups logs for a single test execution
Timestamp: Precise timing for event ordering

Key log types capture different aspects of agent behavior:

Log Type	Purpose
`TrailblazeSessionStatusChangeLog`	Test lifecycle (started, completed, failed)
`TrailblazeLlmRequestLog`	LLM prompts, responses, tool calls, and cost
`TrailblazeToolLog`	Tool execution results and timing
`MaestroDriverLog`	Low-level device interactions
`MaestroCommandLog`	Maestro command execution details
`ObjectiveStartLog` / `ObjectiveCompleteLog`	Test step progress
`TrailblazeSnapshotLog`	User-initiated screen captures
`TrailblazeAgentTaskStatusChangeLog`	Agent task state transitions
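
As an illustration of what a typed event hierarchy could look like, here is a minimal Python sketch. Only the class names and the two shared fields (session ID, timestamp) come from the text above. Every other field name, and the `to_json` helper, is a hypothetical stand-in, and the actual Trailblaze classes may differ. Python has no sealed classes, so this shows the shape of the hierarchy rather than any compile-time guarantee.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
import json


@dataclass(frozen=True)
class TrailblazeLog:
    """Base event carrying the two fields every log is described as having."""
    session_id: str      # groups logs for a single test execution
    timestamp: datetime  # used to order events


@dataclass(frozen=True)
class TrailblazeSessionStatusChangeLog(TrailblazeLog):
    status: str          # hypothetical field, e.g. "started" or "failed"


@dataclass(frozen=True)
class TrailblazeToolLog(TrailblazeLog):
    tool_name: str       # hypothetical field
    duration_ms: int     # hypothetical field
    success: bool        # hypothetical field


def to_json(log: TrailblazeLog) -> str:
    """Serialize an event, adding a type discriminator so a reader can pick the right class."""
    payload = asdict(log)
    payload["timestamp"] = log.timestamp.isoformat()
    payload["type"] = type(log).__name__
    return json.dumps(payload, sort_keys=True)
```

Because every subclass inherits from `TrailblazeLog`, a viewer can rely on `session_id` and `timestamp` being present regardless of the event type.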

Rich Context Capture

Logs capture rich context for debugging:

Screenshots: Screen captures at key moments (LLM requests, tool execution)
View Hierarchies: Full and filtered UI tree for element inspection
LLM Messages: Complete conversation history with the model
Tool Options: Available tools at each decision point
Usage/Cost: Token counts and estimated costs per LLM request
Durations: Timing for each operation
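
Per-request usage and duration data lends itself to simple roll-ups. The sketch below totals tokens, estimated cost, and time across the LLM requests of one session. The `LlmUsage` record and all of its field names are assumptions made for illustration. The document does not specify how usage is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmUsage:
    # All field names are hypothetical.
    input_tokens: int
    output_tokens: int
    estimated_cost_usd: float
    duration_ms: int


def summarize(usages: list[LlmUsage]) -> dict[str, float]:
    """Total tokens, estimated cost, and time across the LLM requests of one session."""
    return {
        "requests": len(usages),
        "input_tokens": sum(u.input_tokens for u in usages),
        "output_tokens": sum(u.output_tokens for u in usages),
        # Rounded to avoid displaying floating-point noise.
        "estimated_cost_usd": round(sum(u.estimated_cost_usd for u in usages), 6),
        "duration_ms": sum(u.duration_ms for u in usages),
    }
```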

Log Storage

Logs are written to disk as JSON files organized by session:

logs/
└── 2026-01-28_14-30-00_LoginTest/
    ├── 001_TrailblazeSessionStatusChangeLog.json
    ├── 002_TrailblazeLlmRequestLog.json
    ├── 002_screenshot.png
    ├── 003_TrailblazeToolLog.json
    ├── 004_MaestroDriverLog.json
    └── ...

This file-based approach enables:

Persistence across restarts
Easy sharing of debug artifacts
Simple archiving in CI systems
Reactive file watching for live updates
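
The layout above can be approximated with a few lines of file handling. This is a sketch under stated assumptions, not the Trailblaze writer: the `write_log` and `read_logs` helpers are hypothetical, the zero-padded `NNN_TypeName.json` naming is inferred from the example tree, and a single writer per session is assumed.

```python
import json
from pathlib import Path


def write_log(session_dir: Path, type_name: str, payload: dict) -> Path:
    """Write one event as the next numbered JSON file in the session directory."""
    session_dir.mkdir(parents=True, exist_ok=True)
    # Next sequence number = existing JSON logs + 1 (assumes a single writer).
    sequence = len(list(session_dir.glob("*.json"))) + 1
    path = session_dir / f"{sequence:03d}_{type_name}.json"
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def read_logs(session_dir: Path) -> list[tuple[str, dict]]:
    """Return (type_name, payload) pairs in sequence order."""
    events = []
    # Zero-padded prefixes sort correctly as strings (up to 999 events here).
    for path in sorted(session_dir.glob("*.json")):
        # "003_TrailblazeToolLog" -> "TrailblazeToolLog"
        type_name = path.stem.split("_", 1)[1]
        events.append((type_name, json.loads(path.read_text(encoding="utf-8"))))
    return events
```

Keeping the sequence number and the event type in the file name means a session can be listed and ordered without opening any file.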

Desktop App Integration

The desktop app uses LogsRepo to provide a real-time view of test execution:

Live updates: File watchers detect new logs and update the UI immediately
Session list: Browse all test sessions with status indicators
Log timeline: Step through events chronologically
Screenshot viewer: See exactly what the agent saw
View hierarchy inspector: Explore the UI tree at any point
LLM conversation viewer: Review prompts and responses

This makes the desktop app an essential development tool: engineers can watch tests execute in real time and immediately understand failures.
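
The live-update behavior can be sketched with plain polling. This is a simplified stand-in for a real file watcher and says nothing about how LogsRepo is implemented. The `poll_new_logs` and `watch` functions, and their parameters, are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def poll_new_logs(session_dir: Path, seen: set[str]) -> list[Path]:
    """Return JSON log files that appeared since the last call, in name order."""
    new_files = [p for p in sorted(session_dir.glob("*.json")) if p.name not in seen]
    seen.update(p.name for p in new_files)
    return new_files


def watch(session_dir: Path, on_new_log: Callable[[Path], None],
          interval_s: float = 0.5, max_polls: Optional[int] = None) -> None:
    """Call on_new_log for each new file; stop after max_polls (None means run forever)."""
    seen: set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        # Note: a file could be seen while still being written; a real
        # watcher would need to handle partially written files.
        for path in poll_new_logs(session_dir, seen):
            on_new_log(path)
        polls += 1
        time.sleep(interval_s)
```

A UI would pass a callback that parses the file and appends it to the timeline, for example `watch(session_dir, timeline.append)` where `timeline` is a list.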

Web Report Generation

The trailblaze-report module generates static HTML/WASM reports from log data:

Log collection: Gather logs from test execution (local or CI)
Report generation: Bundle logs with a WebAssembly-based viewer
Static output: Single-file HTML that can be viewed in any browser

Reports provide the same inspection capabilities as the desktop app but as a shareable artifact. This is critical for CI pipelines where:

Test failures need investigation without access to the original machine
Results must be archived for compliance or historical analysis
Multiple team members need to review the same failure
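
The single-file idea can be shown in miniature by embedding a session's JSON logs directly in an HTML page. This sketch covers only the bundling step: it does not include a viewer, WebAssembly or otherwise, and the `build_report` function is a hypothetical illustration rather than the trailblaze-report implementation.

```python
import html
import json
from pathlib import Path


def build_report(session_dir: Path, output_file: Path) -> None:
    """Embed every JSON log of a session into one self-contained HTML file."""
    events = []
    for path in sorted(session_dir.glob("*.json")):
        events.append({
            "file": path.name,
            "event": json.loads(path.read_text(encoding="utf-8")),
        })
    # Escape "<" as a JSON unicode escape so log content cannot
    # terminate the surrounding <script> element early.
    embedded = json.dumps(events).replace("<", "\\u003c")
    title = html.escape(session_dir.name)
    output_file.write_text(
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        "<body>\n"
        f"<h1>{title}</h1>\n"
        f'<script id="logs" type="application/json">{embedded}</script>\n'
        "</body></html>\n",
        encoding="utf-8",
    )
```

Because the data travels inside the page, the resulting file can be attached to a CI run and opened without a server or access to the original machine.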

Why Custom Logging (Not Standard Logging Frameworks)

We chose structured TrailblazeLog events over traditional logging (Log4j, SLF4J) because:

Type safety: Sealed class hierarchy ensures all logs have required fields
Rich data: Screenshots and view hierarchies don’t fit in line-oriented text logs
Queryable: Logs can be filtered by type, searched, and analyzed programmatically
UI-friendly: Typed events map directly to UI components
Cross-platform: Same log format works on Android, desktop, and web

Traditional logging is still used for framework-level debugging, but TrailblazeLog captures the semantically meaningful agent events.
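
To make the "queryable" point concrete, the sketch below filters, counts, and searches events that have already been loaded into memory as dictionaries. The `type` key and the `tool_name` and `success` fields are assumptions for illustration. The document does not define the JSON schema.

```python
import json
from collections import Counter


def of_type(events: list[dict], type_name: str) -> list[dict]:
    """Keep only events of one log type."""
    return [e for e in events if e.get("type") == type_name]


def count_by_type(events: list[dict]) -> Counter:
    """Count events per log type."""
    return Counter(e.get("type", "unknown") for e in events)


def search(events: list[dict], needle: str) -> list[dict]:
    """Case-insensitive substring search across each serialized event."""
    needle = needle.lower()
    return [e for e in events if needle in json.dumps(e).lower()]


def failed_tools(events: list[dict]) -> list[str]:
    """Names of tool executions that reported failure (hypothetical fields)."""
    return [
        e["tool_name"]
        for e in of_type(events, "TrailblazeToolLog")
        if not e.get("success", True)
    ]
```

The same questions asked of free-form text logs would typically require regular expressions tuned to each message format.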

What changed

Positive:

Debugging agent failures becomes tractable with full context
Desktop app provides immediate feedback during development
CI reports enable async investigation of failures
Screenshots and hierarchies make visual debugging possible
Structured format enables tooling (analysis, comparison, search)

Negative:

Log files can become large (especially with screenshots)
Disk I/O overhead during test execution
Custom log viewer required (can’t use standard log tools)
Log format changes require updates to viewers