Agentic AI for QA: 3 different approaches

AI agents can now replace the manual post-deploy verification step. But which agent, and at what level of access? We ran three against the same Odoo 18 ticket to find out.

Verifying that a ticket was properly implemented has always been a manual step. Someone checks the staging environment against the requirements and signs off. It is repetitive, time-consuming, and easy to skip under deadline pressure.

AI agents can now do this. They can read a ticket, open a browser, navigate to the right screen, and produce a structured verification report. Some can go further and read the source code directly, cross-referencing what the implementation actually does against what it appears to do in the UI.

The same approach can be applied earlier in the process: running an agent during development as a personal tester before a ticket is even submitted for review.

But which agent to use? And at what level of access? These are not obvious questions. An agent that finishes in 100 seconds and one that takes 10 minutes do not produce the same report, and the difference is not just speed.

This article documents a series of test runs against the same ticket and the same live system: three agents compared side by side, then a closer look at the browser tool layer underneath.

1. Test Case Data (Ground Truth)

To ensure objectivity, all three methods were tested against the same ticket, "Odoo 18.0: web_gantt customizations". The ticket had four requirements:

Resizable Left Section: The Gantt row title column must be resizable by the user.
Text Overflow Handling: Implementation of a Gantt-specific CSS class using text-overflow: ellipsis for the title column and overflow: visible for Gantt task bars (pills).
Cell Height Optimization: Reducing Gantt cell height and ensuring task pills perfectly align with the cell height.
Year Scale Customization: Updating headers to show “Month + Year” on the top line and “Week Number” on the bottom line.

Figure: The live Odoo 18 Gantt view, with customizations applied, used as the test target across all three agents.

2. Comparison Table of Execution Metrics

The following table summarizes the technical specifications and execution results observed during the test:

Metric	Claude Code + Playwright	Claude Code Extension	Antigravity
Model Used	Claude Sonnet 4.6	Claude Sonnet 4.6	Gemini 2.5 Pro (Low)
Access Rights	Codebase + Browser	Direct Browser Access	Browser only (codebase access not configured)
Execution Time	337 seconds	645 seconds	100 seconds
Final Result	PASS WITH RISK	DONE	DONE

3. Detailed Method Analysis

Claude Code + Playwright

This method gives the agent access to the source code of the module being tested, in addition to the browser.

Depth of Analysis: Because it can inspect the code, the agent identified logic risks not visible in the UI. It flagged that week numbers (W01, W02, etc.) are calculated relative to the display period start, not as ISO week-of-year values.
Technical Insights: The report referenced specific JavaScript patches such as getRowTypeHeight, and noted that the 24px compact height only activates in grouped views, a behavior invisible from the browser alone.
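The week-numbering risk is easy to make concrete. The QA report further down gives one data point: in a view covering March to May 2026, W01 is the week starting Feb 23, 2026, which is ISO week 9. The sketch below reproduces that gap in Python. The `relative_week` helper is a hypothetical stand-in for display-relative numbering; it is not taken from the module's code.

```python
from datetime import date, timedelta


def iso_week(day: date) -> int:
    """Return the ISO 8601 week-of-year for the given day."""
    return day.isocalendar()[1]


def relative_week(day: date, period_start: date) -> int:
    """Hypothetical helper: week index counted from the first displayed week."""
    return (day - period_start).days // 7 + 1


# First displayed week in the report's example (a Monday).
period_start = date(2026, 2, 23)

for offset in range(3):
    monday = period_start + timedelta(weeks=offset)
    print(
        f"{monday}: relative W{relative_week(monday, period_start):02d}, "
        f"ISO W{iso_week(monday):02d}"
    )
# 2026-02-23: relative W01, ISO W09
# 2026-03-02: relative W02, ISO W10
# 2026-03-09: relative W03, ISO W11
```

Both numberings render as "W01, W02, ..." in the header, which is why a browser-only check has no reason to question them unless the ticket states which one is expected.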

Prompt used

You are an AI QA engineer.

You have access to: Read code in module custom_web_gantt · Browser via Playwright MCP · Chrome DevTools MCP

INPUT

URL: {{url}} · session_id: {{session_id}}

Ticket: odoo 18.0: web_gantt customizations

TICKET CONTENT

1. Resizable left section: Gantt line titles column

2. Text overflow: new Gantt-specific CSS class: text-overflow:ellipsis for title column, overflow:visible for pills

3. Cell height: reduce height, align pill height to cell height

4. Year scale: top header: month+year, bottom header: week number

TASK

1. Understand expected behavior from ticket

2. Explore codebase: relevant files, changes, impact

3. Identify risks: UI, interaction, side effects

4. Generate test plan including edge cases

5. Execute via Playwright MCP: navigate, interact, capture screenshots

6. Validate: DOM structure, visual issues, interaction behavior, network calls

7. Estimate reasoning steps and total duration

8. Output structured JSON report with machine-readable data and human-readable summary

Full QA report

Module: custom_web_gantt (Odoo 18.0) · Date: 2026-03-31 · Result: PASS WITH RISK

#	Feature	Status
1	Resizable left section	Pass
2	Text overflow (custom_gantt_ellipsis)	Pass
3	Cell height reduction	Pass with risk
4	Year scale (month+year / week number)	Pass with risk

Feature 1: Resizable Left Section PASS

2 drag handles injected: one in the sticky header (.o_gantt_title), one in the sidebar (.o_gantt_row_sidebar). Resize confirmed: dragging 80px right increased column width from 168px to 248px. The --Gantt__RowHeader-width CSS variable updates in real-time. Width persisted in _ndaCustomRowWidth and survives re-renders.

Feature 2: Text Overflow (custom_gantt_ellipsis) PASS

custom_gantt_ellipsis class present on .o_gantt_view. Pill title moved outside .o_gantt_pill into .o_gantt_pill_wrapper (XML override applied for both web_gantt and hr_holidays_gantt). Pill title: overflow: visible, white-space: nowrap. Row title: text-overflow: ellipsis, overflow: hidden. Measured pill width ~60px vs title widths 205–268px, overflow confirmed.

Feature 3: Cell Height Reduction PASS WITH RISK

getRowTypeHeight patched: t0=24px, t1=36px, t2=0px. Pill wrapper padding reduced to 0 0 1px 0. In views without groupBy, all rows are type t1 (36px); no t0 (24px) rows exist in flat layouts. Actual measured row height: 36px.

Action required: Confirm with developer whether 24px height is intentionally scoped to sub-rows in grouped views only.

Feature 4: Year Scale Customization PASS WITH RISK

Top header: month + year format confirmed (January 2026, February 2026...). Bottom header: week numbers confirmed (W01, W02...). Current week highlighted. Week numbers are calculated relative to the display period start, not ISO week-of-year. When viewing March–May 2026, W01 = week starting Feb 23, 2026 (not ISO week 9).

Action required: Confirm week numbering requirement with BA.

Issues

Medium

[M1] Without groupBy, all rows are t1 = 36px. The 24px compact height only activates in grouped sub-rows.
[M2] W01 resets relative to the display period, not a fixed ISO week 1.

Low

[L1] Adjacent pills in year scale may have overlapping overflow titles (~60px columns, 205–268px titles).
[L2] Resize handle hitbox is 8px wide, which may be hard to target on trackpads.

Chromium (Playwright MCP) · 1280×720 · Week & Year scales · module v18.0.1.0.0
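Step 8 of the prompt asks for machine-readable data alongside the human-readable summary. The agent's actual JSON is not reproduced in this article, so the following is only a sketch of what such a payload could look like, filled with the statuses and issue IDs from the report above. The field names, the FAIL status, and the "worst status wins" roll-up rule are assumptions made for illustration, not the agent's real schema.

```python
import json

# Severity order used for the roll-up. This ordering is an assumption;
# FAIL does not appear in the report and is included only for completeness.
SEVERITY = {"PASS": 0, "PASS WITH RISK": 1, "FAIL": 2}

# Per-feature statuses as listed in the report table.
features = [
    {"id": 1, "name": "Resizable left section", "status": "PASS"},
    {"id": 2, "name": "Text overflow (custom_gantt_ellipsis)", "status": "PASS"},
    {"id": 3, "name": "Cell height reduction", "status": "PASS WITH RISK"},
    {"id": 4, "name": "Year scale (month+year / week number)", "status": "PASS WITH RISK"},
]


def overall_result(items):
    """Return the most severe status among the features (hypothetical rule)."""
    return max(items, key=lambda item: SEVERITY[item["status"]])["status"]


report = {
    "module": "custom_web_gantt",
    "date": "2026-03-31",
    "result": overall_result(features),
    "features": features,
    "issues": [
        {"id": "M1", "severity": "medium"},
        {"id": "M2", "severity": "medium"},
        {"id": "L1", "severity": "low"},
        {"id": "L2", "severity": "low"},
    ],
}

print(json.dumps(report, indent=2))
```

Under this rule, two features marked "Pass with risk" are enough to turn the overall result into PASS WITH RISK, which matches the result line of the report.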

Claude Code Extension

This method operates directly within the browser, working from the rendered DOM and CSS without access to the source code.

Depth of Analysis: The agent confirmed the presence of the custom_gantt_ellipsis CSS class and its specific rules. It measured a pill height of 35px against a cell height of 36px to verify alignment.
Performance: This was the slowest method at 645 seconds (over 10 minutes), the trade-off for its level of DOM and CSS detail.

Prompt used

You are an AI QA Engineer. Verify whether a ticket has been correctly implemented.

Sources: INPUT section (source of truth) · current browser page (verify actual behavior)

TICKET CONTENT

1. Resizable left section: Gantt line titles column

2. Text overflow: new Gantt-specific CSS class: text-overflow:ellipsis for title column, overflow:visible for pills

3. Cell height: reduce height, align pill height to cell height

4. Year scale: top header: month+year, bottom header: week number

INSTRUCTIONS

1. Understand the ticket: identify all expected changes, break into testable points

2. Verify on UI: interact with the page, compare actual vs expected behavior

3. Evaluate completeness: check for anything missing, incorrect, or not testable

OUTPUT FORMAT

Overall Result: DONE / PARTIALLY DONE / NOT DONE

Ticket Summary · Verification Results per requirement · Issues / Missing parts · Notes

Full QA report

Ticket: Odoo 18.0 Gantt Web Customizations · Result: DONE

Requirement 1: Resizable left section DONE

A resize handle element (o_gantt_column_resize_handle) is present inside the title column with cursor: col-resize, positioned at the right edge. A colored vertical bar appears on hover. The left column width is dynamically adjustable.

Requirement 2: Text overflow (custom_gantt_ellipsis) DONE

CSS class custom_gantt_ellipsis created and applied on the Gantt view element. Two rules: title column uses text-overflow: ellipsis (titles truncate as "Mitchell Ad..."); pills use overflow: visible (text remains fully readable). Class confirmed present on the live rendered element.

Requirement 3: Cell height reduced / pill height aligned DONE

Pill height: 35px. Cell height: 36px. Pills fill the cell height perfectly (1px difference accounted for by padding). Visibly more compact than the standard Odoo Gantt default (~40px).

Requirement 4: Year scale customization DONE

Top header: "January 2026", "February 2026", "March 2026" (month + year confirmed). Bottom header: "W01", "W02", ... "W18" (week numbers confirmed). Note: the "2025" label in the top header is not a bug; it represents the tail-end of 2025 in the date range.

Issues / Missing Parts: None identified.

Antigravity

Antigravity is a full agentic development platform with access to the editor, terminal, and browser, run here with Gemini 2.5 Pro (Low). In this test, it was restricted to the browser, with no codebase access configured.

Depth of Analysis: The agent validated requirements from the user’s perspective, verifying the resizable column, the “Month + Year” header format, and the visual fit of the colored pills within the grid.
Performance: Completed in 100 seconds, the shortest execution time of the three.

Prompt used

You are an AI QA Engineer. Verify whether a ticket has been correctly implemented.

Sources: INPUT section (source of truth) · current browser page (verify actual behavior)

TICKET CONTENT

1. Resizable left section: Gantt line titles column

2. Text overflow: new Gantt-specific CSS class: text-overflow:ellipsis for title column, overflow:visible for pills

3. Cell height: reduce height, align pill height to cell height

4. Year scale: top header: month+year, bottom header: week number

INSTRUCTIONS

1. Understand the ticket: identify all expected changes, break into testable points

2. Verify on UI: interact with the page, compare actual vs expected behavior

3. Evaluate completeness: check for anything missing, incorrect, or not testable

OUTPUT FORMAT

Overall Result: DONE / PARTIALLY DONE / NOT DONE

Ticket Summary · Verification Results per requirement · Issues / Missing parts · Notes

Full QA report

Task: Verifying Gantt Improvements UI · Result: DONE

Requirement 1: Resizable left section DONE

The left column (e.g., employee names like "Marc Demo") is fully resizable. A user can drag its right boundary to make the column wider or narrower as needed.

Requirement 2: Text overflow DONE

Long titles truncate correctly with an ellipsis when the column is narrow. Pill text (e.g., "Marc Demo Sick Time Off: 3 days") overflows visibly across adjacent grid cells, not cut off or hidden, remaining fully readable even when the pill duration is short.

Requirement 3: Cell height DONE

Cell heights are visibly reduced and more compact. Colored pills stretch to perfectly match the full cell height, leaving no extra top or bottom margins.

Requirement 4: Year scale customization DONE

Top header displays month and year together (e.g., "March 2026", "April 2026"). Bottom sub-header breaks down into week numbers ("W01", "W02", "W03") corresponding to the months above.

Issues / Missing Parts: None. The visual modifications integrate smoothly.

4. A Sub-Question: Does the Browser Tool Choice Matter?

Method 1 (Claude Code + Playwright) used Playwright MCP as its browser interface. But Playwright is not the only option. Claude Code can also drive a browser through the Chrome DevTools Protocol (Chrome-CDP), which in this setup operates via shell commands rather than direct MCP tool calls.

This raised a concrete question: if you keep the agent and the ticket the same but swap the browser tool, does the QA output change?

We ran the same ticket a second time using Chrome-CDP. The answer was no: all four requirements verified, same findings, no issues. The QA result was identical.

What changed was the cost of getting there.

Criteria	Chrome-CDP	Playwright MCP
Speed per call	Slower (each command spawns a Node.js process via shell)	Faster (direct MCP tool calls, no shell overhead)
Tokens per call	Higher (output wrapped in shell stderr/stdout)	Lower (clean JSON/text returned directly)
Total calls needed	~18 Bash calls	~10-12 tool calls
Screenshots	Save to file, then read file (2 steps)	Image returned directly in response (1 step)
JS evaluation	Requires string escaping through shell	Clean JSON parameters, no escaping required
Accessibility snapshot	Supported	Supported
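The "JS evaluation" row is the easiest one to see in code. The sketch below sends the same DOM check down both routes. The `cdp-eval` command name and the `expression` parameter key are placeholders invented for this illustration; neither is the real interface of either tool.

```python
import json
import shlex

# A DOM check that mixes single and double quotes.
expr = (
    "document.querySelector('.o_gantt_view')"
    '.classList.contains("custom_gantt_ellipsis")'
)

# Shell route: the expression must survive an extra layer of shell quoting.
# "cdp-eval" is a hypothetical CLI name used only for this example.
shell_command = "cdp-eval " + shlex.quote(expr)

# Tool-call route: the expression travels as an ordinary JSON string value.
# "expression" is a hypothetical parameter key used only for this example.
tool_params = json.dumps({"expression": expr})

print(shell_command)
print(tool_params)

# Both routes deliver the same expression once decoded.
assert shlex.split(shell_command)[1] == expr
assert json.loads(tool_params)["expression"] == expr
```

Both forms round-trip to the same expression. The difference is who has to get the quoting right: in the shell route it is whoever builds the command string, while in the tool-call route the JSON serializer handles it.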

Chrome-CDP has specific situations where it becomes the more practical option: when raw CDP commands are needed that Playwright does not expose, when reusing an already-open browser session with an active authenticated login, or when a persistent daemon is required to maintain state across multiple chat turns.

For a standard UI verification task like this one, the browser tool is an implementation detail. In this run, Chrome-CDP required approximately 18 Bash calls versus 10-12 for Playwright MCP, with 3 additional Read steps just to retrieve screenshots that Playwright returns directly. The estimated difference is 30-40% more calls and higher token usage per call due to shell output wrapping. The verification quality was the same.
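How large the gap looks depends on which tool is taken as the baseline. The following is a quick arithmetic check on the approximate call counts quoted above, not a new measurement; the function name is made up for this sketch.

```python
def call_overhead(cdp_calls: int, playwright_calls: int) -> tuple:
    """Return (percent more calls for Chrome-CDP, percent fewer for Playwright MCP)."""
    diff = cdp_calls - playwright_calls
    extra_for_cdp = diff / playwright_calls * 100  # baseline: Playwright MCP
    saved_by_playwright = diff / cdp_calls * 100   # baseline: Chrome-CDP
    return extra_for_cdp, saved_by_playwright


cdp_calls = 18  # approximate count from the run above

for playwright_calls in (10, 12):  # the reported 10-12 range
    extra, saved = call_overhead(cdp_calls, playwright_calls)
    print(
        f"{cdp_calls} vs {playwright_calls}: Chrome-CDP +{extra:.0f}% calls, "
        f"Playwright MCP -{saved:.0f}% calls"
    )
# 18 vs 10: Chrome-CDP +80% calls, Playwright MCP -44% calls
# 18 vs 12: Chrome-CDP +50% calls, Playwright MCP -33% calls
```

Measured against Chrome-CDP's 18 calls, Playwright MCP needed roughly 33-44% fewer calls, which is in the neighborhood of the 30-40% estimate. Measured against Playwright MCP, Chrome-CDP needed 50-80% more. Either way, the counts are approximate and come from a single run.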

5. Observation: What Each Method Can and Cannot See

The browser tool question is one layer down. The more fundamental difference across these three tests is whether the agent was given access to the source code.

The difference in final results (DONE vs. PASS WITH RISK) reflects exactly this. Claude Code + Playwright read the module directly and surfaced logic risks that do not appear in the UI. The other two methods worked from the rendered page only, and both returned DONE because the visual output matched the ticket.

Running them without codebase access reflects a verification workflow where the reviewer works from the UI rather than the source.

The practical implication: an agent operating from the browser alone can verify that the UI looks correct. An agent that also reads the code can flag cases where the implementation does something the UI does not reveal. Browser-only verification fits a functional consultant reviewing whether the feature works as specified. Code-aware verification is for developers who need to confirm the implementation is sound before the ticket moves forward.
