
OpenAI Responses API (v1/responses)

Full reference for the OpenAI-compatible POST /v1/responses endpoint on abliteration.ai. Request schema, structured input, streaming events, authentication, rate limits, and billing fields.

Updated 2026-04-14

abliteration.ai exposes POST /v1/responses, an endpoint compatible with the OpenAI Responses API. Existing Responses API clients work after changing only the base URL and API key.

Authenticate with a Bearer token (API key or JWT), send either a simple string input or a structured message array, and receive a Responses-style object with token usage and abliteration.ai credit metering fields.

Streaming is supported via Server-Sent Events (SSE). Set stream: true to receive incremental events such as response.output_text.delta.

from openai import OpenAI

# Get yours at https://abliteration.ai/console/keys
client = OpenAI(
    base_url="https://api.abliteration.ai/v1",
    api_key="ak_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
)

response = client.responses.create(
    model="abliterated-model",
    input="Explain abliteration in one paragraph.",
)

print(response.output[0].content[0].text)

Authentication

Include your credentials in the Authorization header as a Bearer token.

curl -X POST https://api.abliteration.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ak_YOUR_API_KEY" \
  -d '{
    "model": "abliterated-model",
    "input": "Hello, world!"
  }'
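The Bearer header works with any HTTP client, not only an SDK. The sketch below shows this using only the Python standard library. `build_responses_request` and `send_responses_request` are hypothetical helper names. The `ABLIT_KEY` environment variable is borrowed from the streaming example later on this page.

```python
import json
import urllib.request

# Endpoint from this page; the helper names below are illustrative, not part of any SDK.
API_URL = "https://api.abliteration.ai/v1/responses"


def build_responses_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) an authenticated POST request to /v1/responses."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The API key (ak_...) or JWT goes in a standard Bearer header.
            "Authorization": f"Bearer {api_key}",
        },
    )


def send_responses_request(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (urlopen raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# import os
# req = build_responses_request(os.environ["ABLIT_KEY"],
#                               {"model": "abliterated-model", "input": "Hello, world!"})
# print(send_responses_request(req)["output"])
```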

Request body

The endpoint accepts the standard Responses API envelope. Common fields are listed below, and additional supported fields are passed through to the backend model endpoint.

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID. Use "abliterated-model". |
| input | string \| array | No | Plain text input or a structured Responses-format input array. |
| instructions | string | No | Optional system-style instruction. |
| stream | boolean | No | Default false. Set to true for SSE streaming. |
| temperature | float | No | Sampling temperature (0–2). |
| max_output_tokens | integer | No | Upper bound for generated output tokens. |
| tools | array | No | Optional tool definitions forwarded to the backend Responses implementation. |
| tool_choice | string \| object | No | Controls whether the model may call tools. |
| flagged_categories | array | No | Optional moderation categories to block before inference. |
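You can assemble a request body from the fields in this table with a small client-side helper. The sketch below is one way to do it. `build_payload` is a hypothetical name. It checks the documented 0–2 temperature range and drops optional fields that were not set. It also rejects a non-positive `max_output_tokens`, which is an assumption on our part rather than a documented rule. Any other keyword argument, such as `tool_choice`, is forwarded unchanged, in the same spirit as the pass-through behavior described above.

```python
from typing import Any, Optional, Union


def build_payload(
    input_value: Union[str, list],
    model: str = "abliterated-model",
    instructions: Optional[str] = None,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_output_tokens: Optional[int] = None,
    **extra: Any,
) -> dict:
    """Assemble a /v1/responses body, checking the ranges listed in the table."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    # Assumption: a non-positive token cap is treated as a client-side mistake.
    if max_output_tokens is not None and max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be a positive integer")

    payload: dict = {"model": model, "input": input_value, "stream": stream}
    optional = {
        "instructions": instructions,
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
    }
    # Omit unset optional fields instead of sending explicit nulls.
    payload.update({k: v for k, v in optional.items() if v is not None})
    # Anything else (tools, tool_choice, flagged_categories, ...) is forwarded as-is.
    payload.update(extra)
    return payload
```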

Input formats

Use a plain string for simple text prompts, or send a structured array when you need multimodal inputs, prior turns, or tool-related state.

For images, use input_image parts. The Responses API does not accept video; for video inputs, use /v1/chat/completions with a video_url content block. See the video docs.

{
  "model": "abliterated-model",
  "instructions": "Answer in two sentences.",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "What is shown in this image?" },
        { "type": "input_image", "image_url": "https://example.com/stonehenge.jpg" }
      ]
    }
  ],
  "max_output_tokens": 256
}
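Structured input is a list of turns, and each turn carries a list of typed content parts. The sketch below rebuilds the JSON example above with a hypothetical `user_message` helper. The helper places one input_text part first, followed by any input_image parts.

```python
from typing import Iterable


def user_message(text: str, image_urls: Iterable[str] = ()) -> dict:
    """Build one user turn: a text part followed by zero or more image parts."""
    content = [{"type": "input_text", "text": text}]
    content += [{"type": "input_image", "image_url": url} for url in image_urls]
    return {"role": "user", "content": content}


# Reproduces the structured-input JSON example on this page.
payload = {
    "model": "abliterated-model",
    "instructions": "Answer in two sentences.",
    "input": [
        user_message(
            "What is shown in this image?",
            ["https://example.com/stonehenge.jpg"],
        )
    ],
    "max_output_tokens": 256,
}
```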

Non-streaming response

When stream is false (default), the full response is returned as JSON.

abliteration.ai adds the same billing fields used by the other public inference endpoints: remaining_credits, estimated_credits_used, and estimated_cost_usd.

{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1735958400,
  "status": "completed",
  "model": "abliterated-model",
  "output": [
    {
      "id": "msg_abc123",
      "type": "message",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Abliteration removes refusal vectors from language models while preserving their broader capabilities.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 31,
    "output_tokens": 18,
    "total_tokens": 49
  },
  "remaining_credits": 487,
  "estimated_credits_used": 1,
  "estimated_cost_usd": 0.000245
}
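`output` is an array of items. Reading `output[0].content[0].text` therefore assumes the first item is a message containing a text part. The sketch below is a more defensive alternative, written against the JSON shape above. It walks every item and also collects the billing fields. `extract_text` and `billing_summary` are hypothetical helper names. Recent versions of the `openai` Python SDK offer a similar convenience, the `response.output_text` property.

```python
def extract_text(response: dict) -> str:
    """Concatenate every output_text part from every message item in `output`."""
    chunks = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                chunks.append(part.get("text", ""))
    return "".join(chunks)


def billing_summary(response: dict) -> dict:
    """Pull token usage and abliteration.ai credit fields out of a response."""
    usage = response.get("usage", {})
    return {
        "total_tokens": usage.get("total_tokens"),
        "credits_used": response.get("estimated_credits_used"),
        "remaining_credits": response.get("remaining_credits"),
        "cost_usd": response.get("estimated_cost_usd"),
    }
```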

Streaming response

Set stream: true to receive Responses-style Server-Sent Events. Each event includes an event: line and a JSON data: payload.

Typical event types include response.created, response.output_text.delta, and response.completed.

curl -N https://api.abliteration.ai/v1/responses \
  -H "Authorization: Bearer $ABLIT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "abliterated-model",
    "input": "Write a five-word greeting.",
    "stream": true
  }'

event: response.created
data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress","model":"abliterated-model","output":[]}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Hello from abliteration.ai"}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","model":"abliterated-model"}}

Rate limits

Limits are enforced per user in a rolling 60-second window. Exceeding the limit returns 429 with a Retry-After header.
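This page says only that a 429 response carries a Retry-After header. The sketch below makes two assumptions beyond that. First, it assumes the header holds a number of seconds; an HTTP-date value is not parsed and falls back to a default delay. Second, it takes the HTTP call as a callable, so it does not depend on a particular client. `send_with_retry` and `retry_after_seconds` are hypothetical names.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# (status code, headers, body) as returned by whatever HTTP client you use.
HttpResult = Tuple[int, Mapping[str, str], bytes]


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as delta-seconds, matching the header name case-insensitively."""
    value: Optional[str] = next(
        (v for k, v in headers.items() if k.lower() == "retry-after"), None
    )
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        # Missing header or HTTP-date form: fall back to a fixed delay.
        return default


def send_with_retry(
    send: Callable[[], HttpResult],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> HttpResult:
    """Call send(), waiting out 429 responses, for at most max_attempts calls."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, headers, body = send()
        if status != 429 or attempt >= max_attempts:
            return status, headers, body
        sleep(retry_after_seconds(headers))
        attempt += 1
```

Injecting `sleep` makes the backoff easy to test without real delays. Because the limit uses a rolling window, honoring the server's Retry-After value is usually better than retrying immediately.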

Credit metering

Each call consumes credits based on total tokens (input + output). Credits are deducted after the response completes.
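For client-side budgeting, you can keep running totals of the usage and billing fields returned by each non-streaming response. The sketch below is one approach. `UsageTotals` is a hypothetical helper. It sums input and output tokens, which are the basis for metering, and adds up the estimated credit and cost fields. Unlike those fields, remaining_credits is a balance rather than a per-call amount, so the sketch keeps only the latest value. The estimated_* field names suggest approximate figures, so treat these totals as estimates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageTotals:
    """Running totals across several /v1/responses results (hypothetical helper)."""

    input_tokens: int = 0
    output_tokens: int = 0
    credits_used: float = 0.0
    cost_usd: float = 0.0
    remaining_credits: Optional[float] = None

    def add(self, response: dict) -> None:
        usage = response.get("usage", {})
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)
        self.credits_used += response.get("estimated_credits_used", 0)
        self.cost_usd += response.get("estimated_cost_usd", 0.0)
        # remaining_credits is a balance, not a per-call amount: keep the latest value.
        if "remaining_credits" in response:
            self.remaining_credits = response["remaining_credits"]

    @property
    def total_tokens(self) -> int:
        # Metering is based on input + output tokens.
        return self.input_tokens + self.output_tokens
```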
