OpenAI Responses API (v1/responses)
Full reference for the OpenAI-compatible POST /v1/responses endpoint on abliteration.ai. Request schema, structured input, streaming events, authentication, rate limits, and billing fields.
abliteration.ai exposes POST /v1/responses, an OpenAI Responses API–compatible endpoint. Existing Responses API clients work with a base-URL and API-key switch.
Authenticate with a Bearer token (API key or JWT), send either a simple string input or a structured message array, and receive a Responses-style object with token usage and abliteration.ai credit metering fields.
Streaming is supported via Server-Sent Events (SSE). Set stream: true to receive evented deltas such as response.output_text.delta.
from openai import OpenAI

# Get yours at https://abliteration.ai/console/keys
client = OpenAI(
    base_url="https://api.abliteration.ai/v1",
    api_key="ak_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
)
response = client.responses.create(
    model="abliterated-model",
    input="Explain abliteration in one paragraph.",
)
# output[0] is the assistant message; its first content part holds the text.
print(response.output[0].content[0].text)

Authentication
Include your credentials in the Authorization header as a Bearer token.
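For clients that do not use the OpenAI SDK, the header can be set by hand. The following is a minimal sketch using only Python's standard library; it builds the request without sending it. The helper name `build_request` and the `ak_YOUR_API_KEY` placeholder are illustrative and not part of the API.

```python
import json
import urllib.request

API_URL = "https://api.abliteration.ai/v1/responses"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The credential (API key or JWT) follows the "Bearer " prefix.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_request(
    "ak_YOUR_API_KEY",
    {"model": "abliterated-model", "input": "Hello, world!"},
)
print(req.get_method(), req.full_url)
# Sending would be one more call: urllib.request.urlopen(req)
```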
curl -X POST https://api.abliteration.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ak_YOUR_API_KEY" \
  -d '{
    "model": "abliterated-model",
    "input": "Hello, world!"
  }'

Request body
The endpoint accepts the standard Responses API envelope. Common fields are listed below, and additional supported fields are passed through to the backend model endpoint.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID. Use "abliterated-model". |
| input | string or array | No | Plain text input or a structured Responses-format input array. |
| instructions | string | No | Optional system-style instruction. |
| stream | boolean | No | Default false. Set to true for SSE streaming. |
| temperature | float | No | Sampling temperature (0–2). |
| max_output_tokens | integer | No | Upper bound for generated output tokens. |
| tools | array | No | Optional tool definitions forwarded to the backend Responses implementation. |
| tool_choice | string or object | No | Controls whether the model may call tools. |
| flagged_categories | array | No | Optional moderation categories to block before inference. |
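
As a rough illustration of how these fields combine, the sketch below assembles a request body in Python. The helper `build_payload` is hypothetical; the only rule it enforces is the 0–2 temperature range from the table, and it drops unset options on the assumption that the server then applies its own defaults.

```python
from typing import Any, Dict, Optional, Union


def build_payload(
    model: str,
    prompt_input: Union[str, list, None] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Assemble a /v1/responses request body, omitting unset fields."""
    temperature: Optional[float] = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    payload: Dict[str, Any] = {"model": model}
    if prompt_input is not None:
        payload["input"] = prompt_input
    # Leave out options set to None so server-side defaults apply
    # (for example, stream defaults to false).
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


body = build_payload(
    "abliterated-model",
    "Explain abliteration in one paragraph.",
    instructions="Answer in two sentences.",
    temperature=0.7,
    max_output_tokens=256,
    stream=None,  # unset, so it is not sent
)
print(body)
```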
Input formats
Use a plain string for simple text prompts, or send a structured array when you need multimodal inputs, prior turns, or tool-related state.
For images, use input_image parts. The Responses API does not accept video; for video inputs, use /v1/chat/completions with a video_url content block (see the video docs).
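
The structured form can also be assembled programmatically. The sketch below builds one user message from a text prompt and zero or more image URLs, using the input_text and input_image part types described here. The helper name `user_message` is an illustrative assumption.

```python
import json
from typing import Dict, Iterable, List


def user_message(text: str, image_urls: Iterable[str] = ()) -> Dict[str, object]:
    """Return one user turn with a text part followed by image parts."""
    content: List[Dict[str, str]] = [{"type": "input_text", "text": text}]
    content += [{"type": "input_image", "image_url": url} for url in image_urls]
    return {"role": "user", "content": content}


message = user_message(
    "What is shown in this image?",
    ["https://example.com/stonehenge.jpg"],
)
request_body = {
    "model": "abliterated-model",
    "instructions": "Answer in two sentences.",
    "input": [message],
    "max_output_tokens": 256,
}
print(json.dumps(request_body, indent=2))
```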
{
  "model": "abliterated-model",
  "instructions": "Answer in two sentences.",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "What is shown in this image?" },
        { "type": "input_image", "image_url": "https://example.com/stonehenge.jpg" }
      ]
    }
  ],
  "max_output_tokens": 256
}

Non-streaming response
When stream is false (default), the full response is returned as JSON.
abliteration.ai adds the same billing fields used by the other public inference endpoints: remaining_credits, estimated_credits_used, and estimated_cost_usd.
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1735958400,
  "status": "completed",
  "model": "abliterated-model",
  "output": [
    {
      "id": "msg_abc123",
      "type": "message",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Abliteration removes refusal vectors from language models while preserving their broader capabilities.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 31,
    "output_tokens": 18,
    "total_tokens": 49
  },
  "remaining_credits": 487,
  "estimated_credits_used": 1,
  "estimated_cost_usd": 0.000245
}

Streaming response
Set stream: true to receive Responses-style Server-Sent Events. Each event includes an event: line and a JSON data: payload.
Typical event types include response.created, response.output_text.delta, and response.completed.
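
On the client side, a stream like this can be consumed by grouping event: and data: lines into events and concatenating the text deltas. The following is a minimal sketch, assuming standard SSE framing (a blank line ends each event) and lines that are already decoded to text. The function names `iter_sse_events` and `collect_text` are illustrative.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], dict]]:
    """Yield (event_name, payload) pairs from decoded SSE lines."""
    event: Optional[str] = None
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            # A blank line terminates the current event.
            yield event, json.loads("\n".join(data))
            event, data = None, []
    if data:  # the stream ended without a trailing blank line
        yield event, json.loads("\n".join(data))


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate every response.output_text.delta in the stream."""
    return "".join(
        payload["delta"]
        for name, payload in iter_sse_events(lines)
        if name == "response.output_text.delta"
    )


sample_stream = [
    "event: response.created",
    'data: {"type":"response.created","response":{"id":"resp_abc123"}}',
    "",
    "event: response.output_text.delta",
    'data: {"type":"response.output_text.delta","delta":"Hello "}',
    "",
    "event: response.output_text.delta",
    'data: {"type":"response.output_text.delta","delta":"world"}',
    "",
    "event: response.completed",
    'data: {"type":"response.completed","response":{"id":"resp_abc123"}}',
]
print(collect_text(sample_stream))
```

With a real connection, the same functions would be fed the decoded lines of the HTTP response body instead of `sample_stream`.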
curl -N https://api.abliteration.ai/v1/responses \
  -H "Authorization: Bearer $ABLIT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "abliterated-model",
    "input": "Write a five-word greeting.",
    "stream": true
  }'

event: response.created
data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress","model":"abliterated-model","output":[]}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Hello from abliteration.ai"}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","model":"abliterated-model"}}

Rate limits
Limits are enforced per user in a rolling 60-second window. Exceeding the limit returns 429 with a Retry-After header.
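
A client can honor the Retry-After header by waiting before it retries. The sketch below is transport-agnostic: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`. It assumes Retry-After carries a number of seconds and falls back to a fixed delay for any other format; the attempt cap and fallback value are arbitrary choices.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as seconds; use `default` if missing or non-numeric."""
    try:
        return max(0.0, float(headers.get("Retry-After", "")))
    except ValueError:
        return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            break
        # Rate limited: wait as long as the server asked, then try again.
        sleep(retry_after_seconds(headers))
    return status, headers, body


# Demo with canned responses instead of real HTTP calls.
canned = iter([(429, {"Retry-After": "2"}, b""), (200, {}, b"ok")])
waits = []
result = call_with_retry(lambda: next(canned), sleep=waits.append)
print(result[0], waits)
```

Real HTTP libraries usually expose headers through a case-insensitive mapping; with a plain dict, the key must match "Retry-After" exactly.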
Credit metering
Each call consumes credits based on total tokens (input + output). Credits are deducted after the response completes.
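
To track spend per call, the usage block and billing fields can be read from the non-streaming response body. The sketch below uses the field names from the example response; the `Metering` dataclass and `read_metering` helper are illustrative, and no credits-per-token rate is assumed.

```python
from dataclasses import dataclass


@dataclass
class Metering:
    input_tokens: int
    output_tokens: int
    total_tokens: int
    remaining_credits: float
    estimated_credits_used: float
    estimated_cost_usd: float


def read_metering(response: dict) -> Metering:
    """Extract token usage and billing fields from a response body."""
    usage = response["usage"]
    return Metering(
        input_tokens=usage["input_tokens"],
        output_tokens=usage["output_tokens"],
        total_tokens=usage["total_tokens"],
        remaining_credits=response["remaining_credits"],
        estimated_credits_used=response["estimated_credits_used"],
        estimated_cost_usd=response["estimated_cost_usd"],
    )


# Values copied from the example non-streaming response.
sample = {
    "usage": {"input_tokens": 31, "output_tokens": 18, "total_tokens": 49},
    "remaining_credits": 487,
    "estimated_credits_used": 1,
    "estimated_cost_usd": 0.000245,
}
metering = read_metering(sample)
print(
    f"{metering.total_tokens} tokens, "
    f"{metering.estimated_credits_used} credit(s) used, "
    f"{metering.remaining_credits} remaining"
)
```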
Developer tools
Machine-readable specs and ready-made collections for faster integration.