OT Image

Load images and ask vision questions via an OpenAI-compatible API.

Short alias: img

Highlights

  • Load images from file paths, URLs, or the clipboard once; reference by handle for follow-up questions
  • Ask multiple questions in a single model call — answers returned as paired question/answer list
  • Structured summaries (text, mode, type, colours) extracted and cached in meta.json
  • Clipboard shortcuts clip_ask() and clip_view() — no handle juggling needed
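The question batching in the second bullet can be sketched as pairing each question with one answer parsed from a single model response. This is a hypothetical sketch for illustration; the one-answer-per-line response format and the returned dict shape are assumptions, not the tool's documented schema.

```python
# Hypothetical sketch: pair a batch of questions with answers parsed
# from one model response. Assumes one answer per response line.
def pair_answers(questions: list[str], raw_response: str) -> list[dict]:
    answers = [line.strip() for line in raw_response.splitlines() if line.strip()]
    return [{"q": q, "a": a} for q, a in zip(questions, answers)]

pairs = pair_answers(
    ["What framework is shown?", "Is this dark mode?"],
    "React\nYes",
)
# pairs == [{"q": "What framework is shown?", "a": "React"},
#           {"q": "Is this dark mode?", "a": "Yes"}]
```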

Functions

| Function | Description |
| --- | --- |
| ot_image.load(img, ...) | Load a single image; return a stable handle |
| ot_image.load_batch(img, ...) | Load multiple images from a glob or list |
| ot_image.ask(img, q, ...) | Ask one or more questions about an image |
| ot_image.clip_ask(q, ...) | Shorthand: ask about the current clipboard image |
| ot_image.clip_view() | Shorthand: structured summary of the current clipboard image |
| ot_image.summary(img) | Extract and cache a structured summary of an image |
| ot_image.list() | List all loaded images with metadata |
| ot_image.delete(handle) | Delete an image and free session cache |
| ot_image.purge(...) | Delete images by age, or delete all |

Key Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| img | str | Image source: file path, "https://..." URL, "clip" for the clipboard, or "#handle" to reference an existing handle |
| handle | str | Custom handle name for load() (e.g. "logo"). Omit for an auto-generated img_<8hex> |
| q | str \| list[str] | Question(s) to ask. Multiple questions are batched into one model call |
| max_edge | int | Maximum longest edge, in pixels, for the in-memory resize before model upload. Default: 1568 |
| all | bool | purge(all=True) deletes all images regardless of age |
| minutes | int | purge(minutes=N) deletes images older than N minutes. Default: 15 |
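The img parameter accepts four source forms. A minimal sketch of how they can be told apart (the category names here are illustrative, not part of the API):

```python
def classify_source(img: str) -> str:
    """Classify an img argument into one of the four accepted source forms."""
    if img == "clip":
        return "clipboard"
    if img.startswith("#"):
        return "handle"          # e.g. "#logo" or "#img_a3f7b2c4"
    if img.startswith(("http://", "https://")):
        return "url"
    return "path"                # anything else is treated as a file path

classify_source("#logo")   # "handle"
classify_source("~/screenshots/dashboard.png")   # "path"
```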

Requires

  • A vision model: set tools.ot_image.model or the top-level llm.model for ask(), summary(), clip_ask(), and clip_view().
  • OPENAI_API_KEY in secrets.yaml for vision model calls.
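A minimal secrets.yaml entry might look like this; the key name comes from the list above, while the value shown is a placeholder:

```yaml
# secrets.yaml — value is a placeholder, substitute your own key
OPENAI_API_KEY: sk-your-key-here
```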

Configuration

Optional

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| tools.ot_image.model | str | "" | Vision model for ask() and summary(). Falls back to llm.model |
| tools.ot_image.base_url | str | "" | OpenAI-compatible base URL. Falls back to llm.base_url |
| tools.ot_image.max_edge | int | 1568 | Maximum longest edge (pixels) for model-upload resize |
| tools.ot_image.session_cache_size | int | 10 | In-memory LRU cache cap (number of images) |

# Minimal — inherits model and base_url from top-level llm: block
llm:
  base_url: https://openrouter.ai/api/v1
  model: google/gemini-2-flash-preview

# Override just for ot_image (optional)
tools:
  ot_image:
    model: openai/gpt-4o-mini   # overrides llm.model for vision calls only
    max_edge: 1568
    session_cache_size: 10
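The max_edge setting amounts to aspect-preserving downscale arithmetic. This is a sketch of that arithmetic under the assumption that images already within the limit pass through unchanged; it is not the tool's actual implementation:

```python
def resized_dimensions(width: int, height: int, max_edge: int = 1568) -> tuple[int, int]:
    """Scale (width, height) so the longest edge is at most max_edge,
    preserving aspect ratio. Images within the limit are unchanged."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return round(width * scale), round(height * scale)

resized_dimensions(3840, 2160)  # (1568, 882)
resized_dimensions(800, 600)    # (800, 600) — already within the limit
```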

Defaults

  • If tools.ot_image is omitted, load() and list() work without config. ask() and summary() require a model via tools.ot_image.model or llm.model.
  • model and base_url fall back to the top-level llm: config block. API key is always read from the OPENAI_API_KEY secret.

Examples

# Load from file and ask a question
result = ot_image.load(img="~/screenshots/dashboard.png")
ot_image.ask(img=result["handle"], q="What is the main metric shown?")

# Ask multiple questions in one call
ot_image.ask(
    img="~/screenshots/dashboard.png",
    q=["What framework is shown?", "Is this dark mode?"]
)

# Load with a custom handle name
ot_image.load(img="~/assets/logo.png", handle="logo")
ot_image.ask(img="#logo", q="What colour is the logo?")

# Clipboard shortcuts (no load step needed)
ot_image.clip_ask(q="Extract all text from this screenshot")
ot_image.clip_view()

# Structured summary — cached after first call
ot_image.summary(img="#logo")

# Load a batch from glob
ot_image.load_batch(img="~/screenshots/*.png")

# Load a batch from a list
ot_image.load_batch(img=["~/a.png", "~/b.png"])

# List all loaded images
ot_image.list()

# Delete a single image
ot_image.delete(handle="#img_a3f7b2c4")

# Purge images older than 1 hour
ot_image.purge(minutes=60)

# Purge all images
ot_image.purge(all=True)

See Also