Reference · Updated 2026-05-01

LLM image moderation API

Per-image OpenAI omni-moderation runs server-side on every image attachment. Rejections return moderation_blocked with the offending category.

Image moderation runs server-side on every image you send through the abliteration.ai gateway.

Each image is forwarded to OpenAI's omni-moderation API one at a time — the upstream caps at one image per call, so we loop and aggregate categories before deciding to allow or reject.
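A minimal sketch of that loop, calling OpenAI's /v1/moderations endpoint directly. The helper name moderate_images and the use of the requests library are illustrative, not the gateway's actual code:

import os
import requests

MODERATIONS_URL = "https://api.openai.com/v1/moderations"
# Assumes OPENAI_API_KEY is set in the environment.
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def moderate_images(image_urls: list[str]) -> list[str]:
    # One omni-moderation call per image (the upstream accepts a single
    # image per request), aggregating flagged categories across all of them.
    flagged: set[str] = set()
    for url in image_urls:
        resp = requests.post(
            MODERATIONS_URL,
            headers=HEADERS,
            json={
                "model": "omni-moderation-latest",
                "input": [{"type": "image_url", "image_url": {"url": url}}],
            },
            timeout=30,
        )
        resp.raise_for_status()
        result = resp.json()["results"][0]
        if result["flagged"]:
            flagged.update(k for k, v in result["categories"].items() if v)
    return sorted(flagged)  # non-empty => reject with moderation_blocked

An empty result means every image passed; any entry maps to a moderation_blocked rejection naming that category.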

Definition


LLM image moderation is a server-side check that classifies each attached image across categories (sexual, violence, self-harm, hate, harassment, illicit) before the image reaches the model.

Why it matters
  • Block prohibited content from reaching the model regardless of which SDK the caller uses.
  • Surface category-level rejection reasons so client apps can show the right user-facing message.
  • Apply the same policy across /v1/chat/completions, /v1/messages, /v1/responses, and the /policy/* siblings.
  • Stay aligned with OpenAI's published moderation taxonomy without maintaining a parallel classifier.
How it works
  1. Send a request with text + image content blocks as usual (see the client sketch after this list).
  2. The gateway extracts each image and calls OpenAI omni-moderation once per image (text is moderated in its own call).
  3. If any category exceeds its threshold, the gateway returns HTTP 400 with code moderation_blocked and the category in error.message.
  4. If everything passes, the request flows to vLLM/Modal for inference.
  5. Override per-route or per-policy via the /policy/* sibling endpoints when you need stricter or looser thresholds.
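Putting steps 1 and 3 together from the caller's side, here is a sketch using the OpenAI Python SDK. The base_url, API key, and model name are placeholders for your deployment:

from openai import BadRequestError, OpenAI

client = OpenAI(base_url="https://gateway.example.com/v1", api_key="sk-...")

try:
    resp = client.chat.completions.create(
        model="my-model",  # placeholder
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.png"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)
except BadRequestError as exc:
    # Moderation rejections surface as HTTP 400 with this error code.
    if exc.code == "moderation_blocked":
        print("Rejected:", exc.message)  # names the flagged category
    else:
        raise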
Rejection response shape
HTTP/1.1 400 Bad Request
Content-Type: application/json

{
  "error": {
    "message": "Moderation blocked: image flagged as 'violence'.",
    "type": "invalid_request_error",
    "code": "moderation_blocked"
  }
}
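If you call the gateway over raw HTTP rather than an SDK, checking for this shape is a few lines. The helper name and the ValueError translation are illustrative:

import requests

def post_and_check(url: str, headers: dict, payload: dict) -> dict:
    # Forward the request; translate a moderation rejection into an exception.
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    if resp.status_code == 400:
        err = resp.json().get("error", {})
        if err.get("code") == "moderation_blocked":
            # err["message"] names the flagged category, e.g. 'violence'.
            raise ValueError(err["message"])
    resp.raise_for_status()
    return resp.json()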
FAQ

Frequently asked questions.

What categories are checked?

OpenAI's omni-moderation taxonomy: sexual, sexual/minors, violence, violence/graphic, self-harm, self-harm/intent, self-harm/instructions, hate, hate/threatening, harassment, harassment/threatening, and illicit. Categories may be added when OpenAI updates the API.

How are multiple images handled?

Each image is moderated individually in its own omni-moderation call (the upstream API caps at one image per request). The text prompt is moderated in a separate call. The gateway aggregates results — any flagged item rejects the whole request.

Does base64 vs HTTPS URL matter for moderation?

No. Both shapes end up in the same per-image moderation call: HTTPS URLs are fetched server-side (behind the SSRF guard) and base64 data URLs are decoded in place, so inline images skip the remote fetch but are still moderated.
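Both shapes use the same image_url content block; for base64 you inline a data URL. A sketch (the file path is illustrative):

import base64
from pathlib import Path

# HTTPS URL shape: the gateway fetches the image (behind the SSRF guard).
url_block = {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.png"}}

# Base64 shape: no remote fetch, the bytes travel inline as a data URL.
encoded = base64.b64encode(Path("photo.png").read_bytes()).decode()
b64_block = {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{encoded}"}}

# Either block lands in the same per-image moderation call server-side.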

Can I disable moderation?

Not on the public /v1/* endpoints. The /policy/* siblings let you configure custom thresholds and category lists per workspace policy — see /docs/policy-gateway-integration.
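The real schema lives in /docs/policy-gateway-integration. Purely as a hypothetical illustration of a per-workspace threshold override, a call could look something like this; the endpoint path and every field name below are invented, not the actual schema:

import requests

# Hypothetical payload shape -- see /docs/policy-gateway-integration for
# the real schema; these field names are invented for illustration.
policy = {
    "moderation": {
        "categories": ["sexual", "violence", "self-harm"],
        "thresholds": {"violence": 0.9},  # stricter than the default
    }
}
requests.put("https://gateway.example.com/policy/workspaces/acme",  # invented path
             headers={"Authorization": "Bearer sk-..."},
             json=policy, timeout=30)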

What about CSAM detection?

Tracked as TODO(CSAM-B2B-GA) in the moderation pipeline and required before public B2B launch. The plan is hash-matching against PhotoDNA/NCMEC databases at the gateway, ahead of the omni-moderation call. The same first-frame extraction infrastructure would extend CSAM detection to video.

How is video moderated?

Video bytes currently flow to vLLM unmoderated — only the prompt text is moderated for video requests. Per-frame video moderation via ffmpeg first-frame extraction is planned (TODO(VIDEO-MODERATION)) before public launch. See /docs/video.
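As a sketch of that planned approach (not live yet; the helper name is illustrative), first-frame extraction with ffmpeg could look like:

import base64
import subprocess
import tempfile
from pathlib import Path

def first_frame_data_url(video_path: str) -> str:
    # Grab frame 1 with ffmpeg and return it as a base64 data URL, ready
    # for the same per-image omni-moderation call used for still images.
    with tempfile.NamedTemporaryFile(suffix=".jpg") as frame:
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path, "-frames:v", "1", frame.name],
            check=True, capture_output=True,
        )
        encoded = base64.b64encode(Path(frame.name).read_bytes()).decode()
    return f"data:image/jpeg;base64,{encoded}"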

What HTTP status and error code do rejections return?

HTTP 400 with error.code = 'moderation_blocked'. The error.message names the offending category. The audit log emits rejection_reason='moderation_blocked' on both audit.request_denied and llm.request.rejected events.

Are moderation calls billed?

OpenAI omni-moderation is free at the API level. abliteration.ai does not pass through a moderation surcharge — only successful inference requests count against your token quota.