Reference
Updated 2026-05-01

Video-capable LLM API

Send short videos to OpenAI-compatible chat completions. MP4/WebM/MOV up to 25 MB / 30 seconds.

A video-capable LLM API accepts video bytes in the same request as text and returns natural-language descriptions or structured answers.

On abliteration.ai, video is supported only on the chat completions endpoints (/v1/chat/completions and /policy/chat/completions); the Anthropic Messages and OpenAI Responses endpoints do not accept video natively.

Definition

Video-capable LLM API

A video-capable LLM API lets you include short videos as inputs and receive descriptions, structured answers, or grounded reasoning from the model.

Why it matters
  • Describe scenes, actions, or interactions captured on video.
  • Summarize short screen recordings or product demos.
  • Extract structured data — counts, labels, timestamps — from short clips.
  • Combine video with text instructions for grounded reasoning.
How it works
  1. Call /v1/chat/completions with a model that supports video inputs.
  2. Send message.content as an array mixing text parts and video_url parts.
  3. Use a public HTTPS URL or an inline data:video/mp4;base64,... URL.
  4. Authenticate with a JWT or API key; anonymous free-tier callers are blocked from video.
  5. Stream responses by setting stream: true and consuming delta chunks.
Example request
curl https://api.abliteration.ai/v1/chat/completions \
  -H "Authorization: Bearer $ABLIT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "abliterated-model",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this clip." },
          { "type": "video_url", "video_url": { "url": "https://example.com/clip.mp4" } }
        ]
      }
    ]
  }'
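Example request (inline data URL)

For the inline form from step 3, the same request can carry the clip as a data: URL instead of a public link. This is a sketch, assuming GNU coreutils base64 and a clip already within the 25 MB / 30 second limits; everything else matches the request above.

# Encode the clip (on macOS use: base64 -i clip.mp4 | tr -d '\n')
VIDEO_B64=$(base64 -w0 clip.mp4)

curl https://api.abliteration.ai/v1/chat/completions \
  -H "Authorization: Bearer $ABLIT_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "model": "abliterated-model",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this clip." },
        { "type": "video_url", "video_url": { "url": "data:video/mp4;base64,${VIDEO_B64}" } }
      ]
    }
  ]
}
EOF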
FAQ

Frequently asked questions.

Which endpoints accept video?

Only /v1/chat/completions and /policy/chat/completions. /v1/messages and /v1/responses return 400 with a neutral pointer to https://docs.abliteration.ai.

What video formats are supported?

MP4 (video/mp4), WebM (video/webm), and QuickTime/MOV (video/quicktime). Convert other containers before sending.
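For example, a remux/transcode sketch using standard ffmpeg options (filenames are placeholders; drop the video re-encode if the source is already H.264):

# Convert an unsupported container to MP4 before uploading
ffmpeg -i input.avi -c:v libx264 -c:a aac -movflags +faststart clip.mp4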

How long can a video be?

Up to 30 seconds and up to 25 MB raw. Longer or larger videos return HTTP 413 with the body-cap message.

Can I send video as a public URL?

Yes. The backend fetches the URL server-side. The same SSRF guard that blocks private IPs for image URLs applies to video URLs — rejection code is unsafe_video_url.
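For illustration, a rejected private-address fetch might return a body like the one below. The exact shape and message are assumptions modeled on the common OpenAI-style error object; only the unsafe_video_url code comes from this page.

{
  "error": {
    "message": "video URL resolves to a private address",
    "type": "invalid_request_error",
    "code": "unsafe_video_url"
  }
}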

Can anonymous (free-tier) users send video?

No. Anonymous X-Free-Tier callers are blocked from video specifically. Text and image stay free. The rejection code is video_anon_blocked.

How many tokens does a video use?

Roughly proportional to the number of frames sampled times their resolution. A 2-second 128x256 clip is about 90 tokens; a 5-second 480p clip is around 2,000 tokens; a 10-second 360p clip is around 3,700 tokens. Downscale to keep latency and cost predictable.
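A quick way to stay under the caps is to trim and downscale before upload. A sketch with standard ffmpeg options (the 10-second and 360p targets are illustrative; -an drops the audio track, which the model does not use):

ffmpeg -i clip.mp4 -t 10 -vf scale=-2:360 -an clip_small.mp4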

Is video moderated?

The accompanying prompt text is moderated. Per-frame video moderation is planned before public launch — tracked as TODO(VIDEO-MODERATION) in the moderation pipeline. Avoid uploading content that violates the abliteration.ai usage policy.

Does streaming work with video?

Yes. Set stream: true and consume delta chunks the same way as text-only completions. Time to first token is higher for video because vLLM samples frames first.
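A minimal streaming sketch: the request matches the example above with stream: true added, and curl -N keeps output unbuffered so server-sent chunks print as they arrive.

curl -N https://api.abliteration.ai/v1/chat/completions \
  -H "Authorization: Bearer $ABLIT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "abliterated-model",
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this clip." },
          { "type": "video_url", "video_url": { "url": "https://example.com/clip.mp4" } }
        ]
      }
    ]
  }'

Each data: line carries a chunk whose choices[0].delta.content holds the next text fragment, and the stream typically ends with data: [DONE], the same as text-only streaming.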

Why isn't video supported on /v1/messages or /v1/responses?

The Anthropic Messages spec has 17 content types and none are video. The OpenAI Responses spec accepts input_text/input_image/input_file but not video. We match the canonical specs rather than adding a non-standard translation layer. When upstream specs add video, we will too.