
OneTool

OT Image Results

beycom/onetool-mcp

Vision Comparison: img.ask vs Direct Attachment

Date: 2026-03-05
Image: tests/data/products-med.png
Task: Extract prices from a 4-column product grid, left-to-right, top-to-bottom
Method: Subagent-isolated timing (tmr) + session usage delta (cld)
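The Method line times each approach inside an isolated subagent. The following is a minimal sketch of the wall-clock part of that measurement. It assumes each approach can be wrapped in a zero-argument callable. `timed` and `fake_extract` are hypothetical names, and this is not the tmr tool's actual implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> tuple[T, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_extract() -> list[str]:
    """Hypothetical stand-in for one approach's price-extraction call."""
    time.sleep(0.01)
    return ["$1,397.00"]


prices, seconds = timed(fake_extract)
print(f"{len(prices)} price(s) in {seconds:.2f} s")
```

`time.perf_counter` is used instead of `time.time` because it is monotonic and high-resolution, which suits interval timing.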

Price Grids

Approach A: `img.ask`

Col 1	Col 2	Col 3	Col 4
$1,397.00	$1,997.00	$2,197.00	$2,049.00
$797.00	$341.00	$797.00	$517.00
$368.00	$1,397.00	$377.00	$453.00
$229.00	$299.00	$731.00	$470.00
$749.00	$1,018.00	$575.00	$541.00

Approach B: Direct Attachment (Read tool → subagent)

Col 1	Col 2	Col 3	Col 4
$727	$341	$197	$249
$368	$1,597	$377	$453
$229	$299	$231	$470

Direct returned only 3 rows vs. 5 rows from img.ask.
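The two approaches format prices differently: img.ask includes cents (`$1,397.00`) and Direct does not (`$727`). Comparing cells therefore needs a normalization step first. The following is a minimal sketch that assumes prices are US-dollar strings with optional thousands separators. `to_cents` and `parse_grid` are hypothetical helpers.

```python
from decimal import Decimal


def to_cents(price: str) -> int:
    """Convert a display price such as '$1,397.00' or '$727' to integer cents."""
    cleaned = price.strip().lstrip("$").replace(",", "")
    # Decimal avoids binary floating-point rounding on values like 0.10.
    return int(Decimal(cleaned) * 100)


def parse_grid(rows: list[list[str]]) -> list[list[int]]:
    """Parse rows top-to-bottom, with cells left-to-right inside each row."""
    return [[to_cents(cell) for cell in row] for row in rows]


img_ask_row_3 = ["$368.00", "$1,397.00", "$377.00", "$453.00"]
direct_row_2 = ["$368", "$1,597", "$377", "$453"]
print(parse_grid([img_ask_row_3, direct_row_2]))
# [[36800, 139700, 37700, 45300], [36800, 159700, 37700, 45300]]
```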

Cell-by-cell Accuracy

Using img.ask as the reference, each row below is an img.ask row. Direct rows 2–3 are matched to img.ask rows 3–4. Direct row 1 is not matched to any img.ask row, so img.ask rows 1, 2, and 5 are scored as misses.

img.ask row	Col 1	Col 2	Col 3	Col 4
1	✗	✗	✗	✗
2	✗	✗	✗	✗
3	✓	✗	✓	✓
4	✓	✓	✗	✓
5	✗	✗	✗	✗

6 / 20 cells correct (30%)
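The following is a minimal sketch that re-derives the accuracy table from the two price grids. It assumes the same row alignment the table uses: Direct rows 2–3 are matched to img.ask rows 3–4, and every other img.ask row is scored as a miss. The names `REFERENCE`, `DIRECT`, `ALIGNMENT`, and `score` are hypothetical. Prices are whole dollars, since no cell in either grid has non-zero cents.

```python
# img.ask grid (the reference), in dollars, rows top-to-bottom.
REFERENCE = [
    [1397, 1997, 2197, 2049],
    [797, 341, 797, 517],
    [368, 1397, 377, 453],
    [229, 299, 731, 470],
    [749, 1018, 575, 541],
]
# Direct grid, in dollars.
DIRECT = [
    [727, 341, 197, 249],
    [368, 1597, 377, 453],
    [229, 299, 231, 470],
]
# 0-based reference row index -> 0-based Direct row index.
ALIGNMENT = {2: 1, 3: 2}


def score(reference, candidate, alignment):
    """Return a grid of booleans: True where the aligned cells match exactly."""
    marks = []
    for ref_idx, ref_row in enumerate(reference):
        cand_idx = alignment.get(ref_idx)
        if cand_idx is None:
            marks.append([False] * len(ref_row))  # no counterpart: all misses
        else:
            cand_row = candidate[cand_idx]
            marks.append([r == c for r, c in zip(ref_row, cand_row)])
    return marks


marks = score(REFERENCE, DIRECT, ALIGNMENT)
correct = sum(sum(row) for row in marks)
total = sum(len(row) for row in marks)
print(f"{correct} / {total} cells correct ({correct / total:.0%})")
```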

Measurements

Metric	img.ask	Direct
Time (s)	41.26	34.00
Total tokens	0 (cross-session)	190,221
Cost (USD)	N/A (cross-session)	$0.078
Output tokens	N/A	567
Cache read tokens	N/A	185,832
Cache create tokens	N/A	3,526
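The Direct token breakdown can be checked by summing the listed components against the total. The measurements do not say what the remainder is. It is assumed here to be uncached input tokens, which the table does not list separately.

```python
# Direct column of the Measurements table.
total_tokens = 190_221
output_tokens = 567
cache_read_tokens = 185_832
cache_create_tokens = 3_526

listed = output_tokens + cache_read_tokens + cache_create_tokens
unlisted = total_tokens - listed  # assumed: uncached input tokens
print(listed, unlisted)  # 189925 296
```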

img.ask runs in its own API session, so its token usage does not appear in the host session's usage delta.
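The zero in the img.ask column follows from how a usage delta works: it only counts tokens billed to the host session. The following is a minimal sketch with hypothetical counters. The `Usage` type, `delta` function, and all numbers are illustrative assumptions, not the cld tool's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical cumulative token counters for one API session."""
    input_tokens: int = 0
    output_tokens: int = 0


def delta(before: Usage, after: Usage) -> Usage:
    """Usage delta: counters after the run minus counters before it."""
    return Usage(after.input_tokens - before.input_tokens,
                 after.output_tokens - before.output_tokens)


# Illustrative numbers only. A direct attachment is processed in the host
# session, so the host counters grow. A tool that calls a model in its own
# session leaves the host counters unchanged.
before = Usage(input_tokens=1_000, output_tokens=200)
after_direct = Usage(input_tokens=5_000, output_tokens=450)
after_cross_session = before

print(delta(before, after_direct))         # Usage(input_tokens=4000, output_tokens=250)
print(delta(before, after_cross_session))  # Usage(input_tokens=0, output_tokens=0)
```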

Verdict

Aspect	Winner
Speed	Direct (34.00 s vs 41.26 s, ~18% less time)
Completeness	img.ask (5 rows vs 3 rows)
Accuracy	img.ask (Direct got fewer than a third of cells right)
Token cost to host session	img.ask (cross-session, not charged)
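A speed percentage depends on which baseline it uses. The following worked check uses the times from the Measurements table.

```python
img_ask_s = 41.26
direct_s = 34.00

saved = img_ask_s - direct_s   # seconds saved by Direct
less_time = saved / img_ask_s  # share of img.ask's time that Direct saved
speedup = img_ask_s / direct_s # how many times as fast Direct ran
print(f"{saved:.2f} s saved, {less_time:.1%} less time, {speedup:.2f}x")
# 7.26 s saved, 17.6% less time, 1.21x
```

Relative to img.ask's time, Direct saves about 18%. Expressed as a ratio, it runs about 1.21 times as fast. The two framings give different percentages, so a table should state which one it uses.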

img.ask clearly outperformed direct attachment. The direct subagent (Haiku 4.5) returned only 3 of 5 product rows and hallucinated or misread most prices. img.ask correctly identified all 5 rows using a dedicated vision model.

Direct was ~7 s faster, but speed was its only advantage, and it came at the cost of heavily degraded accuracy. That is a poor trade-off for structured extraction tasks.