Verifying that a ticket was properly implemented has always been a manual step. Someone checks the staging environment against the requirements and signs off. It is repetitive, time-consuming, and easy to skip under deadline pressure.
AI agents can now do this. They can read a ticket, open a browser, navigate to the right screen, and produce a structured verification report. Some can go further and read the source code directly, cross-referencing what the implementation actually does against what it appears to do in the UI.
The same approach can be applied earlier in the process: running an agent during development as a personal tester before a ticket is even submitted for review.
But which agent to use? And at what level of access? These are not obvious questions. An agent that finishes in 100 seconds and one that takes 10 minutes do not produce the same report, and the difference is not just speed.
This article documents a series of test runs against the same ticket and the same live system: three agents compared side by side, then a closer look at the browser tool layer underneath.
1. Test Case Data (Ground Truth)
To ensure objectivity, all three methods were tested against the same ticket: Odoo 18.0: web_gantt customizations. The specific requirements included:
- Resizable Left Section: The Gantt row title column must be resizable by the user.
- Text Overflow Handling: Implementation of a Gantt-specific CSS class using `text-overflow: ellipsis` for the title column and `overflow: visible` for Gantt task bars (pills).
- Cell Height Optimization: Reducing Gantt cell height and ensuring task pills perfectly align with the cell height.
- Year Scale Customization: Updating headers to show “Month + Year” on the top line and “Week Number” on the bottom line.
2. Comparison Table of Execution Metrics
The following table summarizes the technical specifications and execution results observed during the test:
| Metric | Claude Code + Playwright | Claude Code Extension | Antigravity |
|---|---|---|---|
| Model Used | Claude Sonnet 4.6 | Claude Sonnet 4.6 | Gemini 2.5 Pro (Low) |
| Access Rights | Codebase + Browser | Direct Browser Access | Browser only (codebase access not configured) |
| Execution Time | 337 seconds | 645 seconds | 100 seconds |
| Final Result | PASS WITH RISK | DONE | DONE |
3. Detailed Method Analysis
Claude Code + Playwright
This method gives the agent access to the source code of the module being tested, in addition to the browser.
- Depth of Analysis: Because it can inspect the code, the agent identified logic risks not visible in the UI. It flagged that week numbers (`W01`, `W02`, etc.) are calculated relative to the display period start, not as ISO week-of-year values.
- Technical Insights: The report referenced specific JavaScript patches such as `getRowTypeHeight`, and noted that the 24px compact height only activates in grouped views, a behavior invisible from the browser alone.
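To make the week-numbering risk concrete, here is a minimal Python sketch that contrasts the two schemes. It is not the module's code: the helper names `relative_week_label` and `iso_week_label` are hypothetical, and the period start of February 23, 2026 is taken from the QA report for the March to May 2026 view.

```python
from datetime import date, timedelta


def relative_week_label(day: date, period_start: date) -> str:
    """Label counted from the first displayed week (W01 = period start)."""
    weeks_since_start = (day - period_start).days // 7
    return f"W{weeks_since_start + 1:02d}"


def iso_week_label(day: date) -> str:
    """Label using the ISO 8601 week-of-year number."""
    return f"W{day.isocalendar()[1]:02d}"


# First displayed week in the March-May 2026 view, per the QA report.
period_start = date(2026, 2, 23)

for offset in range(3):
    monday = period_start + timedelta(weeks=offset)
    print(monday, relative_week_label(monday, period_start), iso_week_label(monday))
```

For the first displayed week the relative scheme yields W01 while the ISO scheme yields W09. Both labels look plausible on screen, which is why a browser-only check would not tell them apart.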
View prompt used
You are an AI QA engineer.
You have access to: Read code in module custom_web_gantt · Browser via Playwright MCP · Chrome DevTools MCP
INPUT
URL: {{url}} · session_id: {{session_id}}
Ticket: odoo 18.0: web_gantt customizations
TICKET CONTENT
1. Resizable left section: Gantt line titles column
2. Text overflow: new Gantt-specific CSS class: text-overflow:ellipsis for title column, overflow:visible for pills
3. Cell height: reduce height, align pill height to cell height
4. Year scale: top header: month+year, bottom header: week number
TASK
1. Understand expected behavior from ticket
2. Explore codebase: relevant files, changes, impact
3. Identify risks: UI, interaction, side effects
4. Generate test plan including edge cases
5. Execute via Playwright MCP: navigate, interact, capture screenshots
6. Validate: DOM structure, visual issues, interaction behavior, network calls
7. Estimate reasoning steps and total duration
8. Output structured JSON report with machine-readable data and human-readable summary
View full QA report
| # | Feature | Status |
|---|---|---|
| 1 | Resizable left section | Pass |
| 2 | Text overflow (custom_gantt_ellipsis) | Pass |
| 3 | Cell height reduction | Pass with risk |
| 4 | Year scale (month+year / week number) | Pass with risk |
Feature 1: Resizable Left Section PASS
2 drag handles injected: one in the sticky header (.o_gantt_title), one in the sidebar (.o_gantt_row_sidebar). Resize confirmed: dragging 80px right increased column width from 168px to 248px. The --Gantt__RowHeader-width CSS variable updates in real-time. Width persisted in _ndaCustomRowWidth and survives re-renders.
Feature 2: Text Overflow (custom_gantt_ellipsis) PASS
custom_gantt_ellipsis class present on .o_gantt_view. Pill title moved outside .o_gantt_pill into .o_gantt_pill_wrapper (XML override applied for both web_gantt and hr_holidays_gantt). Pill title: overflow: visible, white-space: nowrap. Row title: text-overflow: ellipsis, overflow: hidden. Measured pill width ~60px vs title widths 205–268px, overflow confirmed.
Feature 3: Cell Height Reduction PASS WITH RISK
getRowTypeHeight patched: t0=24px, t1=36px, t2=0px. Pill wrapper padding reduced to 0 0 1px 0. In views without groupBy, all rows are type t1 (36px); no t0 (24px) rows exist in flat layouts. Actual measured row height: 36px.
Action required: Confirm with developer whether 24px height is intentionally scoped to sub-rows in grouped views only.
Feature 4: Year Scale Customization PASS WITH RISK
Top header: month + year format confirmed (January 2026, February 2026...). Bottom header: week numbers confirmed (W01, W02...). Current week highlighted. Week numbers are calculated relative to the display period start, not ISO week-of-year. When viewing March–May 2026, W01 = week starting Feb 23, 2026 (not ISO week 9).
Action required: Confirm week numbering requirement with BA.
Issues
Medium
- [M1] Without `groupBy`, all rows are `t1` = 36px. The 24px compact height only activates in grouped sub-rows.
- [M2] `W01` resets relative to the display period, not a fixed ISO week 1.
Low
- [L1] Adjacent pills in year scale may have overlapping overflow titles (~60px columns, 205–268px titles).
- [L2] Resize handle hitbox is 8px wide, which may be hard to target on trackpads.
Chromium (Playwright MCP) · 1280×720 · Week & Year scales · module v18.0.1.0.0
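The prompt for this method asks for a structured JSON report, and the final result of PASS WITH RISK follows from the per-feature statuses in the table above. The Python sketch below shows one way such a roll-up could work. The report schema, the `FeatureResult` class, the `FAIL` status, and the "worst status wins" rule are all assumptions made for illustration; the actual JSON the agent produced is not shown in this article.

```python
import json
from dataclasses import asdict, dataclass

# Assumed severity order: the worst per-feature status becomes the overall result.
# "FAIL" is hypothetical; it does not appear in the report above.
SEVERITY = {"PASS": 0, "PASS WITH RISK": 1, "FAIL": 2}


@dataclass
class FeatureResult:
    number: int
    name: str
    status: str


def overall_status(results: list[FeatureResult]) -> str:
    """Return the most severe status found among the features."""
    return max((r.status for r in results), key=lambda s: SEVERITY[s])


# Statuses copied from the summary table in the QA report.
results = [
    FeatureResult(1, "Resizable left section", "PASS"),
    FeatureResult(2, "Text overflow (custom_gantt_ellipsis)", "PASS"),
    FeatureResult(3, "Cell height reduction", "PASS WITH RISK"),
    FeatureResult(4, "Year scale (month+year / week number)", "PASS WITH RISK"),
]

report = {
    "overall": overall_status(results),
    "features": [asdict(r) for r in results],
}
print(json.dumps(report, indent=2))
```

A machine-readable roll-up like this makes it straightforward to gate a ticket on the overall value while keeping the per-feature detail available for a human reviewer.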
Claude Code Extension
This method operates directly within the browser, working from the rendered DOM and CSS without access to the source code.
- Depth of Analysis: The agent confirmed the presence of the `custom_gantt_ellipsis` CSS class and its specific rules. It measured a pill height of 35px against a cell height of 36px to verify alignment.
- Performance: This was the slowest method at over 10 minutes, the trade-off for its level of DOM and CSS detail.
View prompt used
You are an AI QA Engineer. Verify whether a ticket has been correctly implemented.
Sources: INPUT section (source of truth) · current browser page (verify actual behavior)
TICKET CONTENT
1. Resizable left section: Gantt line titles column
2. Text overflow: new Gantt-specific CSS class: text-overflow:ellipsis for title column, overflow:visible for pills
3. Cell height: reduce height, align pill height to cell height
4. Year scale: top header: month+year, bottom header: week number
INSTRUCTIONS
1. Understand the ticket: identify all expected changes, break into testable points
2. Verify on UI: interact with the page, compare actual vs expected behavior
3. Evaluate completeness: check for anything missing, incorrect, or not testable
OUTPUT FORMAT
Overall Result: DONE / PARTIALLY DONE / NOT DONE
Ticket Summary · Verification Results per requirement · Issues / Missing parts · Notes
View full QA report
Requirement 1: Resizable left section DONE
A resize handle element (o_gantt_column_resize_handle) is present inside the title column with cursor: col-resize, positioned at the right edge. A colored vertical bar appears on hover. The left column width is dynamically adjustable.
Requirement 2: Text overflow (custom_gantt_ellipsis) DONE
CSS class custom_gantt_ellipsis created and applied on the Gantt view element. Two rules: title column uses text-overflow: ellipsis (titles truncate as "Mitchell Ad..."); pills use overflow: visible (text remains fully readable). Class confirmed present on the live rendered element.
Requirement 3: Cell height reduced / pill height aligned DONE
Pill height: 35px. Cell height: 36px. Pills fill the cell height perfectly (1px difference accounted for by padding). Visibly more compact than the standard Odoo Gantt default (~40px).
Requirement 4: Year scale customization DONE
Top header: "January 2026", "February 2026", "March 2026" (month + year confirmed). Bottom header: "W01", "W02", ... "W18" (week numbers confirmed). Note: the "2025" label in the top header is not a bug; it represents the tail-end of 2025 in the date range.
Issues / Missing Parts: None identified.
Antigravity
Powered by Gemini 2.5 Pro (Low), Antigravity is a full agentic development platform with access to the editor, terminal, and browser. In this test, it was configured for browser-only access, without codebase access.
- Depth of Analysis: The agent validated requirements from the user’s perspective, verifying the resizable column, the “Month + Year” header format, and the visual fit of the colored pills within the grid.
- Performance: Completed in 100 seconds, the shortest execution time of the three.
View prompt used
You are an AI QA Engineer. Verify whether a ticket has been correctly implemented.
Sources: INPUT section (source of truth) · current browser page (verify actual behavior)
TICKET CONTENT
1. Resizable left section: Gantt line titles column
2. Text overflow: new Gantt-specific CSS class: text-overflow:ellipsis for title column, overflow:visible for pills
3. Cell height: reduce height, align pill height to cell height
4. Year scale: top header: month+year, bottom header: week number
INSTRUCTIONS
1. Understand the ticket: identify all expected changes, break into testable points
2. Verify on UI: interact with the page, compare actual vs expected behavior
3. Evaluate completeness: check for anything missing, incorrect, or not testable
OUTPUT FORMAT
Overall Result: DONE / PARTIALLY DONE / NOT DONE
Ticket Summary · Verification Results per requirement · Issues / Missing parts · Notes
View full QA report
Requirement 1: Resizable left section DONE
The left column (e.g., employee names like "Marc Demo") is fully resizable. A user can drag its right boundary to make the column wider or narrower as needed.
Requirement 2: Text overflow DONE
Long titles truncate correctly with an ellipsis when the column is narrow. Pill text (e.g., "Marc Demo Sick Time Off: 3 days") overflows visibly across adjacent grid cells, not cut off or hidden, remaining fully readable even when the pill duration is short.
Requirement 3: Cell height DONE
Cell heights are visibly reduced and more compact. Colored pills stretch to perfectly match the full cell height, leaving no extra top or bottom margins.
Requirement 4: Year scale customization DONE
Top header displays month and year together (e.g., "March 2026", "April 2026"). Bottom sub-header breaks down into week numbers ("W01", "W02", "W03") corresponding to the months above.
Issues / Missing Parts: None. The visual modifications integrate smoothly.
4. A Sub-Question: Does the Browser Tool Choice Matter?
Method 1 used Claude Code with Playwright MCP as its browser interface. But Playwright is not the only option. Claude Code can also drive a browser through Chrome DevTools Protocol (Chrome-CDP), which operates via shell commands rather than direct MCP tool calls.
This raised a concrete question: if you keep the agent and the ticket the same but swap the browser tool, does the QA output change?
We ran the same ticket a second time using Chrome-CDP. The answer was no: all four requirements verified, same findings, no issues. The QA result was identical.
What changed was the cost of getting there.
| Criteria | Chrome-CDP | Playwright MCP |
|---|---|---|
| Speed per call | Slower (each command spawns a Node.js process via shell) | Faster (direct MCP tool calls, no shell overhead) |
| Tokens per call | Higher (output wrapped in shell stderr/stdout) | Lower (clean JSON/text returned directly) |
| Total calls needed | ~18 Bash calls | ~10-12 tool calls |
| Screenshots | Save to file, then read file (2 steps) | Image returned directly in response (1 step) |
| JS evaluation | Requires string escaping through shell | Clean JSON parameters, no escaping required |
| Accessibility snapshot | Supported | Supported |
Chrome-CDP has specific situations where it becomes the more practical option: when raw CDP commands are needed that Playwright does not expose, when reusing an already-open browser session with an active authenticated login, or when a persistent daemon is required to maintain state across multiple chat turns.
For a standard UI verification task like this one, the browser tool is an implementation detail. In this run, Chrome-CDP required approximately 18 Bash calls versus 10-12 for Playwright MCP, with 3 additional Read steps just to retrieve screenshots that Playwright returns directly. The estimated difference is 30-40% more calls and higher token usage per call due to shell output wrapping. The verification quality was the same.
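The "JS evaluation" row in the table is easiest to see with a concrete string. The Python sketch below takes one JavaScript expression and prepares it both ways: quoted for a shell command line, and embedded as a JSON parameter. The command name `cdp-eval` and the parameter name `expression` are hypothetical placeholders, not the real interface of either tool; only the `.o_gantt_pill` selector comes from the QA report.

```python
import json
import shlex

# A JavaScript expression an agent might evaluate; note the single quotes inside it.
js_expression = "getComputedStyle(document.querySelector('.o_gantt_pill')).height"

# Shell path: the whole expression must be quoted so the shell passes it as one argument.
# "cdp-eval" is a hypothetical command name used only for illustration.
shell_command = f"cdp-eval {shlex.quote(js_expression)}"

# Tool-call path: the expression travels as a plain JSON string value.
# "expression" is a hypothetical parameter name.
tool_call = json.dumps({"expression": js_expression})

print(shell_command)
print(tool_call)

# Both encodings round-trip to the same expression; only the readability differs.
assert shlex.split(shell_command)[1] == js_expression
assert json.loads(tool_call)["expression"] == js_expression
```

The shell form has to break out of its quoting for every embedded single quote, whereas the JSON form leaves the expression untouched. That extra layer is one plausible source of the higher per-call token cost noted above.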
5. Observation: What Each Method Can and Cannot See
The browser tool question is one layer down. The more fundamental difference across these three tests is whether the agent was given access to the source code.
The difference in final results (DONE vs. PASS WITH RISK) reflects exactly this. Claude Code + Playwright read the module directly and surfaced logic risks that do not appear in the UI. The other two methods worked from the rendered page only, and both returned DONE because the visual output matched the ticket.
That browser-only setup reflects a verification workflow where the reviewer works from the UI rather than the source.
The practical implication: an agent operating from the browser alone can verify that the UI looks correct. An agent that also reads the code can flag cases where the implementation does something the UI does not reveal. Browser-only verification fits a functional consultant reviewing whether the feature works as specified. Code-aware verification is for developers who need to confirm the implementation is sound before the ticket moves forward.