Screenshot analysis API
Send screenshots to a multimodal LLM and get descriptions, error explanations, or structured UI extraction back.
Screenshot analysis turns pixel-level UI captures into text the rest of your app can act on.
Common patterns: customer support triage, automated bug reports, accessibility narration, end-to-end test failure summaries, and SaaS onboarding assistants.
A screenshot analysis API accepts a screenshot image plus a text prompt and returns a description, structured extraction, or error explanation grounded in what's visible on screen.
- Customers send screenshots faster than they describe problems — extract the actual issue from the picture.
- End-to-end test runs produce thousands of failure screenshots — summarize them at scale instead of reviewing one by one.
- Accessibility tools can narrate dynamic UI states without alt-text instrumentation.
- Onboarding flows can answer 'what does this screen mean?' without writing per-screen documentation.
1. Capture the screenshot client-side (HTML5 canvas, OS APIs, headless browser) and base64-encode it or upload it to a public URL.
2. POST to /v1/chat/completions with content blocks: a text prompt describing the task, then an image_url block with the screenshot.
3. For repeated tasks, write a focused prompt: 'List the visible error message and the button the user should most likely click next.' beats 'Describe this image.'
4. Stream responses (stream: true) when the user is waiting for the answer.
5. For structured output, ask for JSON in the prompt and validate the response client-side.
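The steps above can be sketched as a small Python helper that base64-encodes a screenshot and builds the request body (the helper names are illustrative; the endpoint, model name, and block shapes follow the curl example in this section):

```python
import base64
import json

API_URL = "https://api.abliteration.ai/v1/chat/completions"

def build_payload(screenshot_bytes: bytes, prompt: str, stream: bool = False) -> dict:
    """Chat-completions body: a text block first, then an image_url block with a data URL."""
    b64 = base64.b64encode(screenshot_bytes).decode("ascii")
    return {
        "model": "abliterated-model",
        "stream": stream,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
    }

payload = build_payload(b"\x89PNG...", "List the visible error message.")
body = json.dumps(payload)  # POST this with any HTTP client, Authorization: Bearer $ABLIT_KEY
```

The same body, written out as raw JSON, is what the curl example below sends.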
curl https://api.abliteration.ai/v1/chat/completions \
-H "Authorization: Bearer $ABLIT_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "abliterated-model",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Look at this support screenshot. Reply with JSON: {error_message, screen_name, suggested_next_action}."
},
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,iVBORw0KGgo..." } }
]
}
]
}'

Frequently asked questions
What's the right resolution for screenshots?
Match the source resolution; don't upscale. The model uses Qwen2.5-VL's smart_resize tokenizer, so image dimensions drive token cost (tokens ≈ (H × W) / 784). Downscale to 768px on the longest side for general descriptions, or 1280px when fine UI text matters.
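A rough pre-flight check based on the formula above (the real smart_resize also snaps dimensions to multiples of 28, so treat the token count as an estimate):

```python
import math

def downscale(w: int, h: int, max_side: int = 768) -> tuple[int, int]:
    """Shrink so the longest side is at most max_side, preserving aspect ratio."""
    longest = max(w, h)
    if longest <= max_side:
        return w, h  # never upscale
    scale = max_side / longest
    return round(w * scale), round(h * scale)

def estimate_tokens(w: int, h: int) -> int:
    """Approximate image token cost: (H x W) / 784."""
    return math.ceil((w * h) / 784)

print(downscale(1920, 1080))       # (768, 432)
print(estimate_tokens(1280, 720))  # 1176
```

Running the estimate before upload makes cost differences concrete: a full 1920×1080 capture costs roughly 2.5× the tokens of its 768px downscale.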
Can it read on-screen text reliably?
Yes for clear, large UI text. For dense text (logs, code, terminal output) it helps to crop tightly around the relevant region and include a prompt like 'Quote the exact error text verbatim.' For long documents, use the document-image-extraction pattern instead.
How do I avoid hallucinated descriptions?
Be explicit about uncertainty: prompt with 'If you cannot tell from the image, say so — do not guess.' Ask for direct quotes when text matters. For numeric extraction, ask the model to label its confidence.
Is it OK to send screenshots that contain user data?
Yes — abliteration.ai is zero-data-retention by default. Prompts and images are not stored beyond the request lifecycle. For per-tenant guarantees see /zero-data-retention-ai-api.
How fast is the response?
First token typically arrives in 1–3 seconds for a 1280×720 screenshot. Use stream: true to start showing output as it generates. Latency scales with image dimensions, not file size.
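With stream: true the API returns OpenAI-style server-sent events. A sketch of pulling the text fragments out of the raw lines, shown on canned data rather than a live connection (the chunk shape follows the standard chat-completions delta format):

```python
import json

def iter_deltas(sse_lines):
    """Yield content fragments from 'data: {...}' SSE lines, stopping at [DONE]."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            return
        delta = json.loads(chunk)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "The error is "}}]}',
    'data: {"choices": [{"delta": {"content": "401 Unauthorized."}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # The error is 401 Unauthorized.
```

Render each fragment as it arrives so the user sees output within the first-token latency window instead of waiting for the full response.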
Can I send multiple screenshots at once?
Yes — add multiple image_url blocks. 'Compare these two screens and tell me what changed' is a common pattern. Keep it to ≤4 to maintain response quality and predictable latency.
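The compare pattern is just additional image_url blocks in one content array (the URLs here are placeholders):

```python
def compare_content(before_url: str, after_url: str) -> list[dict]:
    """Content blocks for a before/after comparison: one prompt, then both screenshots in order."""
    return [
        {"type": "text",
         "text": "Compare these two screens and tell me what changed."},
        {"type": "image_url", "image_url": {"url": before_url}},
        {"type": "image_url", "image_url": {"url": after_url}},
    ]

content = compare_content("data:image/png;base64,...", "data:image/png;base64,...")
```

Block order matters for prompts like 'the first screenshot' vs 'the second', so keep the images in the order the prompt refers to them.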
Are screenshots moderated?
Yes — same OpenAI omni-moderation as any image attachment. UI screenshots almost never trigger rejection unless they contain user-generated content that crosses moderation thresholds.