
Vision Comparison: img.ask vs Direct Attachment

Date: 2026-03-05
Image: tests/data/products-med.png
Task: Extract prices from a 4-column product grid, left-to-right, top-to-bottom
Method: Subagent-isolated timing (tmr) + session usage delta (cld)
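The timing side of the method isolates each approach in its own subprocess, so one run's warm caches can't bleed into the other's wall clock. The actual tmr/cld tooling is not shown anywhere in this report; the sketch below is a hypothetical stand-in illustrating the pattern, not those tools:

```python
import subprocess
import time

def timed_run(cmd: list[str]) -> tuple[float, str]:
    """Run a command in an isolated subprocess; return (elapsed seconds, stdout)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, result.stdout

# Illustrative call only; the real benchmark invoked the extraction subagent here.
elapsed, out = timed_run(["echo", "extracted prices"])
print(f"{elapsed:.2f}s")
```

Each approach gets its own `timed_run` invocation, so the two elapsed figures in the Measurements section are directly comparable.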


Price Grids

Approach A — img.ask

Col 1 Col 2 Col 3 Col 4
$1,397.00 $1,997.00 $2,197.00 $2,049.00
$797.00 $341.00 $797.00 $517.00
$368.00 $1,397.00 $377.00 $453.00
$229.00 $299.00 $731.00 $470.00
$749.00 $1,018.00 $575.00 $541.00

Approach B — Direct Attachment (Read tool → subagent)

Col 1 Col 2 Col 3 Col 4
$727 $341 $197 $249
$368 $1,597 $377 $453
$229 $299 $231 $470

Direct returned only 3 rows vs. 5 rows from img.ask.


Cell-by-cell Accuracy

Using img.ask as the reference. Two of img.ask's five rows have no counterpart in Direct's output; the remaining three only partially overlap Direct's rows 1–3.


5 / 20 cells correct (25%)
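The per-cell score can be computed mechanically once the two outputs are aligned. A minimal sketch, assuming each grid is a list of rows and that prices are normalized before comparison (so `$1,397.00` and `$1397` compare equal); the row alignment itself is a judgment call and is assumed to be done before calling this:

```python
def normalize(price: str) -> float:
    """Strip '$' and ',' so '$1,397.00' and '$1397' compare equal."""
    return float(price.replace("$", "").replace(",", ""))

def cell_accuracy(reference: list[list[str]], candidate: list[list[str]]) -> float:
    """Fraction of reference cells matched by the aligned candidate cell.

    Candidate rows beyond the reference are ignored; reference rows with no
    candidate counterpart count every cell in that row as wrong.
    """
    total = sum(len(row) for row in reference)
    correct = 0
    for ref_row, cand_row in zip(reference, candidate):
        for ref_cell, cand_cell in zip(ref_row, cand_row):
            if normalize(ref_cell) == normalize(cand_cell):
                correct += 1
    return correct / total
```

Because missing rows count against the candidate, an extractor that returns 3 of 5 rows caps out at 60% before any misreads are counted.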


Measurements

Metric img.ask Direct
Time (s) 41.26 34.00
Total tokens 0 (cross-session) 190,221
Cost (USD) N/A (cross-session) $0.078
Output tokens N/A 567
Cache read tokens N/A 185,832
Cache create tokens N/A 3,526

img.ask runs in its own API session — token usage does not appear in the host session's delta.
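The session usage delta works by subtracting the host session's cumulative token counters before and after the run; anything billed in a separate API session, as with img.ask, never appears in the difference. A minimal sketch with hypothetical counter field names (the real counter schema is not shown in this report):

```python
def usage_delta(before: dict, after: dict) -> dict:
    """Per-field difference of cumulative session token counters.

    Field names here are illustrative, not a real API schema.
    """
    return {key: after[key] - before[key] for key in before}

# Hypothetical snapshots taken around the Direct run.
before = {"output_tokens": 100, "cache_read_tokens": 5_000, "cache_create_tokens": 200}
after = {"output_tokens": 667, "cache_read_tokens": 190_832, "cache_create_tokens": 3_726}
print(usage_delta(before, after))
```

Run the same snapshot pair around an img.ask call and every field of the delta comes back zero, which is exactly the "0 (cross-session)" entry in the table above.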


Verdict

Aspect Winner
Speed Direct (34s vs 41s, ~17% faster)
Completeness img.ask (5 rows vs 3 rows)
Accuracy img.ask (Direct matched only 25% of cells)
Token cost to host session img.ask (cross-session, not charged)

img.ask significantly outperformed direct attachment. The direct subagent (Haiku 4.5) returned only 3 of 5 product rows and hallucinated or misread most prices. img.ask correctly identified all 5 rows using a dedicated vision model.

Direct was ~7 s faster, but at the cost of heavily degraded accuracy. Speed is its only advantage, and a poor trade-off for structured extraction tasks.