LLM image moderation API
Per-image OpenAI omni-moderation runs server-side on every image attachment. Rejections return moderation_blocked with the offending category.
Image moderation runs server-side on every image you send through the abliteration.ai gateway.
Each image is forwarded to OpenAI's omni-moderation API one at a time; the upstream API accepts only one image per call, so the gateway loops over the images and aggregates the category results before deciding to allow or reject the request.
LLM image moderation is a server-side check that classifies each attached image across categories (sexual, violence, self-harm, hate, harassment, illicit) before the image reaches the model.
- Block prohibited content from reaching the model regardless of which SDK the caller uses.
- Surface category-level rejection reasons so client apps can show the right user-facing message.
- Apply the same policy across /v1/chat/completions, /v1/messages, /v1/responses, and the /policy/* siblings.
- Stay aligned with OpenAI's published moderation taxonomy without maintaining a parallel classifier.
1. Send a request with text + image content blocks as usual.
2. The gateway extracts each image and calls OpenAI omni-moderation once per image (text is moderated in its own call).
3. If any category exceeds its threshold, the gateway returns HTTP 400 with code moderation_blocked and the category in error.message.
4. If everything passes, the request flows to vLLM/Modal for inference.
5. Override per-route or per-policy via the /policy/* sibling endpoints when you need stricter or looser thresholds.
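Step 1 above, a request with mixed text + image content blocks, looks like this in the chat-completions shape. The model name and image URL are placeholders, not real values:

```python
# Payload for /v1/chat/completions with text + image content blocks.
# "example-model" and the URL are placeholders.
payload = {
    "model": "example-model",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
}

# The gateway pulls out every image_url block and moderates each one
# before the request ever reaches inference.
image_blocks = [
    block
    for msg in payload["messages"]
    for block in msg["content"]
    if block["type"] == "image_url"
]
```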
HTTP/1.1 400 Bad Request
Content-Type: application/json
{
  "error": {
    "message": "Moderation blocked: image flagged as 'violence'.",
    "type": "invalid_request_error",
    "code": "moderation_blocked"
  }
}

Frequently asked questions
What categories are checked?
OpenAI's omni-moderation taxonomy: sexual, sexual/minors, violence, violence/graphic, self-harm, self-harm/intent, self-harm/instructions, hate, hate/threatening, harassment, harassment/threatening, and illicit. Categories may be added when OpenAI updates the API.
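Because rejections name a single category, a client can map categories to user-facing copy (the goal noted above). The messages below are invented examples, not strings the gateway returns:

```python
# Hypothetical client-side copy per top-level moderation category.
USER_MESSAGES = {
    "sexual": "This image appears to contain sexual content.",
    "violence": "This image appears to contain violent content.",
    "self-harm": "This image appears to reference self-harm.",
    "hate": "This image appears to contain hateful content.",
    "harassment": "This image appears to contain harassment.",
    "illicit": "This image appears to reference illicit activity.",
}


def user_message(category: str) -> str:
    # Sub-categories like "violence/graphic" fall back to their parent,
    # so new sub-categories added upstream still get a sensible message.
    parent = category.split("/")[0]
    return USER_MESSAGES.get(parent, "This image was blocked by moderation.")
```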
How are multiple images handled?
Each image is moderated individually in its own omni-moderation call (the upstream API caps at one image per request). The text prompt is moderated in a separate call. The gateway aggregates results — any flagged item rejects the whole request.
Does base64 vs HTTPS URL matter for moderation?
No. Both shapes are decoded server-side and sent through the same per-image moderation call. Base64 data URLs skip the SSRF fetch but still get moderated.
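Both shapes end up in the same per-image moderation call; the only branch is whether a fetch happens first. A minimal sketch of that distinction (the helper name is ours, not the gateway's):

```python
def is_data_url(url: str) -> bool:
    """Base64 data URLs carry the bytes inline, so the gateway skips the
    SSRF-guarded fetch; https URLs are fetched first. Either way the
    decoded image goes through the same per-image moderation call."""
    return url.startswith("data:")


inline = "data:image/png;base64,iVBORw0KGgo="
remote = "https://example.com/photo.jpg"
```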
Can I disable moderation?
Not on the public /v1/* endpoints. The /policy/* siblings let you configure custom thresholds and category lists per workspace policy — see /docs/policy-gateway-integration.
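A workspace policy override might look like the sketch below. The field names here are illustrative only, not the actual /policy/* schema — see /docs/policy-gateway-integration for the real shape.

```python
# Illustrative policy override; field names are NOT the real schema.
policy_override = {
    "routes": ["/policy/chat/completions"],
    "moderation": {
        # Restrict which categories are enforced for this workspace...
        "categories": ["sexual", "violence", "self-harm"],
        # ...and loosen one threshold relative to the default.
        "thresholds": {"violence": 0.9},
    },
}
```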
What about CSAM detection?
Tracked as TODO(CSAM-B2B-GA) in the moderation pipeline — required before public B2B launch. Plan: hash-matching against PhotoDNA/NCMEC databases at the gateway before omni-moderation. Same first-frame extraction infra unlocks video CSAM.
How is video moderated?
Video bytes currently flow to vLLM unmoderated — only the prompt text is moderated for video requests. Per-frame video moderation via ffmpeg first-frame extraction is planned (TODO(VIDEO-MODERATION)) before public launch. See /docs/video.
What HTTP status and error code do rejections return?
HTTP 400 with error.code = 'moderation_blocked'. The error.message names the offending category. The audit log emits rejection_reason='moderation_blocked' on both audit.request_denied and llm.request.rejected events.
Are moderation calls billed?
OpenAI omni-moderation is free at the API level. abliteration.ai does not pass through a moderation surcharge — only successful inference requests count against your token quota.