Skip to content

Playwright-Native Benchmarks

Comparison of AI mode (LLM interprets natural language steps) vs Recording mode (pre-recorded tool calls replayed deterministically) for the Playwright-native test trails.

Latest Results (2026-02-25)

Trail AI (sec) Recording (sec) Speedup
test-counter 48.7 0.4 135.1x
test-form-interaction 41.2 0.3 139.6x
test-navigation 54.6 0.3 165.6x
test-all-tools 111.6 0.0 13954.0x
test-scroll-containers 47.6 0.0 7929.3x
test-duplicate-list 112.5 0.6 200.2x
test-search-duplicates 164.9 0.8 217.0x

Trail Descriptions

  • test-counter - Navigate to counter page, increment three times, verify value is 3, decrement once, verify value is 2, reset, verify value is 0
  • test-form-interaction - Fill out a contact form (name, email, category dropdown, message textarea), submit, verify success message and submitted data
  • test-navigation - Navigate between Home, Form, Counter, and About pages via links, verify each page heading/content is correct
  • test-all-tools - Exercises every Playwright-native tool: navigate, snapshot, verify (text/element/value/list), click, type, hover, scroll (page + container), select option, press key, wait, browser back/forward
  • test-scroll-containers - Scroll within independent sidebar and content panel containers, verify initially-hidden items (Category 15, Item 20) become visible after scrolling
  • test-duplicate-list - Click specific View buttons in a list where multiple items share the same “Premium Cable” or “Standard Adapter” text across Electronics, Office Supplies, and Accessories sections, verifying each click selects the correct item by its unique ID
  • test-search-duplicates - Search for products with duplicate names (Wireless Mouse, Keyboard, Monitor Stand), then click each individual result distinguished by subtitle/variant, verifying the correct item detail is shown

How to Run

bash trails/playwright-native/benchmark.sh

Results are appended to playwright-native-benchmarks.csv for tracking over time.

Notes

  • AI mode timings include LLM inference latency and are expected to vary between runs.
  • Recording mode replays pre-recorded tool calls without LLM inference, so timings reflect pure Playwright execution speed.
  • The speedup ratio shows how much faster recording mode is compared to AI mode. Higher speedup on simpler trails (e.g., test-counter) reflects the fixed overhead of LLM calls dominating short tests.