Reference · Updated 2026-05-01

LLM image moderation API

Per-image OpenAI omni-moderation runs server-side on every image attachment. Rejections return moderation_blocked with the offending category.

Image moderation runs server-side on every image you send through the abliteration.ai gateway.

Each image is forwarded to OpenAI's omni-moderation API one at a time — the upstream caps at one image per call, so we loop and aggregate categories before deciding to allow or reject.
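A minimal sketch of that loop, calling OpenAI's /v1/moderations endpoint directly. The helper name moderate_images and the use of the requests library are illustrative, not the gateway's actual code:

import os
import requests

MODERATIONS_URL = "https://api.openai.com/v1/moderations"
# Assumes OPENAI_API_KEY is set in the environment.
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def moderate_images(image_urls: list[str]) -> list[str]:
    # One omni-moderation call per image (the upstream accepts a single
    # image per request), aggregating flagged categories across all of them.
    flagged: set[str] = set()
    for url in image_urls:
        resp = requests.post(
            MODERATIONS_URL,
            headers=HEADERS,
            json={
                "model": "omni-moderation-latest",
                "input": [{"type": "image_url", "image_url": {"url": url}}],
            },
            timeout=30,
        )
        resp.raise_for_status()
        result = resp.json()["results"][0]
        if result["flagged"]:
            flagged.update(k for k, v in result["categories"].items() if v)
    return sorted(flagged)  # non-empty => reject with moderation_blocked

An empty result means every image passed; any entry maps to a moderation_blocked rejection naming that category.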

Definition


LLM image moderation is a server-side check that classifies each attached image across categories (sexual, violence, self-harm, hate, harassment, illicit) before the image reaches the model.

Why it matters
  • Block prohibited content from reaching the model regardless of which SDK the caller uses.
  • Surface category-level rejection reasons so client apps can show the right user-facing message.
  • Apply the same policy across /v1/chat/completions, /v1/messages, /v1/responses, and the /policy/* siblings.
  • Stay aligned with OpenAI's published moderation taxonomy without maintaining a parallel classifier.
How it works
  1. Send a request with text + image content blocks as usual (see the client sketch after this list).
  2. The gateway extracts each image and calls OpenAI omni-moderation once per image (text is moderated in its own call).
  3. If any category exceeds its threshold, the gateway returns HTTP 400 with code moderation_blocked and the category in error.message.
  4. If everything passes, the request flows to vLLM/Modal for inference.
  5. Override per-route or per-policy via the /policy/* sibling endpoints when you need stricter or looser thresholds.
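Putting steps 1 and 3 together from the caller's side, here is a sketch using the OpenAI Python SDK. The base_url, API key, and model name are placeholders for your deployment:

from openai import BadRequestError, OpenAI

client = OpenAI(base_url="https://gateway.example.com/v1", api_key="sk-...")

try:
    resp = client.chat.completions.create(
        model="my-model",  # placeholder
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.png"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)
except BadRequestError as exc:
    # Moderation rejections surface as HTTP 400 with this error code.
    if exc.code == "moderation_blocked":
        print("Rejected:", exc.message)  # names the flagged category
    else:
        raise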
Rejection response shape
HTTP/1.1 400 Bad Request
Content-Type: application/json

{
  "error": {
    "message": "Moderation blocked: image flagged as 'violence'.",
    "type": "invalid_request_error",
    "code": "moderation_blocked"
  }
}
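If you call the gateway over raw HTTP rather than an SDK, checking for this shape is a few lines. The helper name and the ValueError translation are illustrative:

import requests

def post_and_check(url: str, headers: dict, payload: dict) -> dict:
    # Forward the request; translate a moderation rejection into an exception.
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    if resp.status_code == 400:
        err = resp.json().get("error", {})
        if err.get("code") == "moderation_blocked":
            # err["message"] names the flagged category, e.g. 'violence'.
            raise ValueError(err["message"])
    resp.raise_for_status()
    return resp.json()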
FAQ

Frequently asked questions.

What categories are checked?

OpenAI's omni-moderation taxonomy: sexual, sexual/minors, violence, violence/graphic, self-harm, self-harm/intent, self-harm/instructions, hate, hate/threatening, harassment, harassment/threatening, and illicit. Categories may be added when OpenAI updates the API.

How are multiple images handled?

Each image is moderated individually in its own omni-moderation call (the upstream API caps at one image per request). The text prompt is moderated in a separate call. The gateway aggregates results — any flagged item rejects the whole request.

Does base64 vs HTTPS URL matter for moderation?

No. Both shapes end up in the same per-image moderation call: HTTPS URLs are fetched server-side (behind the SSRF guard) and base64 data URLs are decoded in place, so inline images skip the remote fetch but are still moderated.
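Both shapes use the same image_url content block; for base64 you inline a data URL. A sketch (the file path is illustrative):

import base64
from pathlib import Path

# HTTPS URL shape: the gateway fetches the image (behind the SSRF guard).
url_block = {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.png"}}

# Base64 shape: no remote fetch, the bytes travel inline as a data URL.
encoded = base64.b64encode(Path("photo.png").read_bytes()).decode()
b64_block = {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{encoded}"}}

# Either block lands in the same per-image moderation call server-side.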

Can I disable moderation?

Not on the public /v1/* endpoints. The /policy/* siblings let you configure custom thresholds and category lists per workspace policy — see /docs/policy-gateway-integration.
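The real schema lives in /docs/policy-gateway-integration. Purely as a hypothetical illustration of a per-workspace threshold override, a call could look something like this; the endpoint path and every field name below are invented, not the actual schema:

import requests

# Hypothetical payload shape -- see /docs/policy-gateway-integration for
# the real schema; these field names are invented for illustration.
policy = {
    "moderation": {
        "categories": ["sexual", "violence", "self-harm"],
        "thresholds": {"violence": 0.9},  # stricter than the default
    }
}
requests.put("https://gateway.example.com/policy/workspaces/acme",  # invented path
             headers={"Authorization": "Bearer sk-..."},
             json=policy, timeout=30)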

What about CSAM detection?

Tracked as TODO(CSAM-B2B-GA) in the moderation pipeline and required before public B2B launch. The plan is hash-matching against PhotoDNA/NCMEC databases at the gateway, ahead of the omni-moderation call. The same first-frame extraction infrastructure would extend CSAM detection to video.

How is video moderated?

Video bytes currently flow to vLLM unmoderated — only the prompt text is moderated for video requests. Per-frame video moderation via ffmpeg first-frame extraction is planned (TODO(VIDEO-MODERATION)) before public launch. See /docs/video.
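As a sketch of that planned approach (not live yet; the helper name is illustrative), first-frame extraction with ffmpeg could look like:

import base64
import subprocess
import tempfile
from pathlib import Path

def first_frame_data_url(video_path: str) -> str:
    # Grab frame 1 with ffmpeg and return it as a base64 data URL, ready
    # for the same per-image omni-moderation call used for still images.
    with tempfile.NamedTemporaryFile(suffix=".jpg") as frame:
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path, "-frames:v", "1", frame.name],
            check=True, capture_output=True,
        )
        encoded = base64.b64encode(Path(frame.name).read_bytes()).decode()
    return f"data:image/jpeg;base64,{encoded}"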

What HTTP status and error code do rejections return?

HTTP 400 with error.code = 'moderation_blocked'. The error.message names the offending category. The audit log emits rejection_reason='moderation_blocked' on both audit.request_denied and llm.request.rejected events.

Are moderation calls billed?

OpenAI omni-moderation is free at the API level. abliteration.ai does not pass through a moderation surcharge — only successful inference requests count against your token quota.