OpenAI Responses API (v1/responses)
Full reference for the OpenAI-compatible POST /v1/responses endpoint on abliteration.ai. Request schema, structured input, streaming events, authentication, rate limits, and billing fields.
abliteration.ai exposes POST /v1/responses, an OpenAI Responses API–compatible endpoint. Existing Responses API clients work with a base-URL and API-key switch.
Authenticate with a Bearer token (API key or JWT), send either a simple string input or a structured message array, and receive a Responses-style object with token usage and abliteration.ai credit metering fields.
Streaming is supported via Server-Sent Events (SSE). Set stream: true to receive evented deltas such as response.output_text.delta.
Quick start
from openai import OpenAI
client = OpenAI(
base_url="https://api.abliteration.ai/v1",
api_key="ak_YOUR_API_KEY",
)
response = client.responses.create(
model="abliterated-model",
input="Explain abliteration in one paragraph.",
)
print(response.output[0].content[0].text)
Service notes
- Pricing model: Usage-based pricing (~$5 per 1M tokens) billed on total tokens (input + output). See the API pricing page for current plans.
- Data retention: No prompt/output retention by default. Operational telemetry (token counts, timestamps, error codes) is retained for billing and reliability.
- Compatibility: OpenAI-style /v1/responses request and response format with a base URL switch.
- Latency: Depends on model size, prompt length, and load. Streaming reduces time-to-first-token.
- Throughput: Team plans include priority throughput. Actual throughput varies with demand.
- Rate limits: Limits vary by plan and load. Handle 429s with backoff and respect any Retry-After header.
Authentication
Include your credentials in the Authorization header as a Bearer token.
- API key — keys start with `ak_`. Send as `Authorization: Bearer ak_...`.
- JWT — obtained from `POST /api/auth/login`. Send as `Authorization: Bearer <jwt>`.
- Anonymous free tier — omit the token and set the `X-Free-Tier: true` header. Limited to 1 free request per device.
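As a sketch, the three modes differ only in request headers (header names taken from the list above; no request is sent here):

```python
# Sketch: building headers for each authentication mode described above.
# These dicts plug into any HTTP client.

def api_key_headers(key: str) -> dict:
    # API keys start with "ak_" and are sent as a Bearer token.
    return {"Authorization": f"Bearer {key}"}

def jwt_headers(jwt: str) -> dict:
    # JWTs from POST /api/auth/login use the same Bearer scheme.
    return {"Authorization": f"Bearer {jwt}"}

def free_tier_headers() -> dict:
    # Anonymous free tier: no token, opt in via the X-Free-Tier header.
    return {"X-Free-Tier": "true"}
```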
curl -X POST https://api.abliteration.ai/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ak_YOUR_API_KEY" \
-d '{
"model": "abliterated-model",
"input": "Hello, world!"
}'
Request body
The endpoint accepts the standard Responses API envelope. Common fields are listed below; additional supported fields are passed through to the backend model endpoint.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID. Use "abliterated-model". |
| input | string or array | No | Plain text input or a structured Responses-format input array. |
| instructions | string | No | Optional system-style instruction. |
| stream | boolean | No | Default false. Set to true for SSE streaming. |
| temperature | float | No | Sampling temperature (0–2). |
| max_output_tokens | integer | No | Upper bound for generated output tokens. |
| tools | array | No | Optional tool definitions forwarded to the backend Responses implementation. |
| tool_choice | string or object | No | Controls whether the model may call tools. |
| flagged_categories | array | No | Optional moderation categories to block before inference. |
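A minimal request body exercising the common fields might look like this (values are illustrative; only model is required):

```python
import json

# Sketch: a /v1/responses request body using the common fields above.
payload = {
    "model": "abliterated-model",          # required
    "instructions": "Answer concisely.",   # optional system-style instruction
    "input": "Summarize abliteration in one sentence.",
    "temperature": 0.7,                    # sampling temperature, 0-2
    "max_output_tokens": 128,              # cap on generated tokens
    "stream": False,                       # default; True enables SSE
}
body = json.dumps(payload)
```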
Input formats
Use a plain string for simple text prompts, or send a structured array when you need multimodal inputs, prior turns, or tool-related state.
For images, use input_image parts. Video inputs are not supported.
{
"model": "abliterated-model",
"instructions": "Answer in two sentences.",
"input": [
{
"role": "user",
"content": [
{ "type": "input_text", "text": "What is shown in this image?" },
{ "type": "input_image", "image_url": "https://example.com/stonehenge.jpg" }
]
}
],
"max_output_tokens": 256
}
Non-streaming response
When stream is false (the default), the full response is returned as JSON.
abliteration.ai adds the same billing fields used by the other public inference endpoints: remaining_credits, estimated_credits_used, and estimated_cost_usd.
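As a sketch, these fields are plain top-level keys of the JSON body (field names as documented above, shape as in the example response below):

```python
# Sketch: summarizing token usage and abliteration.ai billing fields
# from a parsed /v1/responses body.
def summarize_usage(resp: dict) -> str:
    usage = resp["usage"]
    return (
        f"{usage['total_tokens']} tokens "
        f"({usage['input_tokens']} in / {usage['output_tokens']} out), "
        f"{resp['estimated_credits_used']} credit(s) used, "
        f"{resp['remaining_credits']} remaining"
    )
```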
{
"id": "resp_abc123",
"object": "response",
"created_at": 1735958400,
"status": "completed",
"model": "abliterated-model",
"output": [
{
"id": "msg_abc123",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "Abliteration removes refusal vectors from language models while preserving their broader capabilities.",
"annotations": []
}
]
}
],
"usage": {
"input_tokens": 31,
"output_tokens": 18,
"total_tokens": 49
},
"remaining_credits": 487,
"estimated_credits_used": 1,
"estimated_cost_usd": 0.000245
}
Streaming response
Set stream: true to receive Responses-style Server-Sent Events. Each event includes an event: line and a JSON data: payload.
Typical event types include response.created, response.output_text.delta, and response.completed.
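A client parses the event:/data: line pairs and accumulates the text deltas. A minimal stdlib sketch, assuming payloads shaped like the events shown on this page:

```python
import json

# Sketch: collecting output text from Responses-style SSE lines.
# Only "response.output_text.delta" payloads carry text, per the events above.
def collect_output_text(sse_lines) -> str:
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip "event:" lines and blank keep-alives
        payload = json.loads(line[len("data: "):])
        if payload.get("type") == "response.output_text.delta":
            parts.append(payload["delta"])
    return "".join(parts)
```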
curl -N https://api.abliteration.ai/v1/responses \
-H "Authorization: Bearer $ABLIT_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "abliterated-model",
"input": "Write a five-word greeting.",
"stream": true
}'
event: response.created
data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress","model":"abliterated-model","output":[]}}
event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Hello from abliteration.ai"}
event: response.completed
data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","model":"abliterated-model"}}
Rate limits
Limits are enforced per user in a rolling 60-second window. Exceeding the limit returns 429 with a Retry-After header.
- API key callers: 120 requests per 60-second window.
- UI / JWT callers: 30 requests per 60-second window.
- Rate-limiter failures are fail-open — requests are allowed but the incident is logged.
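A retry loop honoring Retry-After can be sketched as follows (the send callable is a placeholder for your HTTP POST):

```python
import time

def retry_delay(retry_after, attempt: int) -> float:
    # Prefer the server's Retry-After header; fall back to 2^attempt seconds.
    if retry_after is not None:
        return float(retry_after)
    return float(2 ** attempt)

def call_with_backoff(send, max_retries: int = 3):
    # `send` performs the request and returns (status, headers, body).
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_retries:
            return status, headers, body
        time.sleep(retry_delay(headers.get("Retry-After"), attempt))
```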
Credit metering
Each call consumes credits based on total tokens (input + output). Credits are deducted after the response completes.
- Minimum charge is 1 credit per call.
- Credits per call: ceil(total_tokens / 500).
- Pricing: ~$5 per 1M tokens. See the pricing page for current plans.
- Anonymous free-tier calls do not consume credits.
- If credits are insufficient, the endpoint returns 402.
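The formulas above can be checked directly; the numbers reproduce the 49-token example response earlier on this page (1 credit, about $0.000245):

```python
import math

# Sketch of the documented credit formula: ceil(total_tokens / 500), minimum 1.
def credits_for(total_tokens: int) -> int:
    return max(1, math.ceil(total_tokens / 500))

def estimated_cost_usd(total_tokens: int) -> float:
    # ~$5 per 1M tokens, per the pricing note above.
    return total_tokens / 1_000_000 * 5
```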
Developer tools
Machine-readable specs and ready-made collections for faster integration.
- OpenAPI 3.0 spec — import into Swagger UI, Redocly, or any OpenAPI-compatible tool.
- Well-known OpenAPI discovery — /.well-known/openapi.json for automated tooling.
- Postman collection & OpenAPI guide — pre-built Postman collection with environment variables.
- OpenAI compatibility guide — migrate existing OpenAI clients with a base URL swap.
Common errors & fixes
- 401 Unauthorized: Check that your API key is set and sent as a Bearer token.
- 404 Not Found: Make sure the base URL ends with /v1 and you call /responses.
- 400 Bad Request: Verify the model ID and that a structured input is an array of { role, content } items.
- 429 Rate limit: Back off and retry. Use the Retry-After header for pacing.
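A sketch that maps these status codes (plus the 402 from the credit-metering section) to actionable messages, paraphrased from this list:

```python
# Sketch: turning the documented error codes into actionable messages.
FIXES = {
    400: "Check the model ID and the shape of the input field.",
    401: "Set your API key and send it as a Bearer token.",
    402: "Insufficient credits; top up before retrying.",
    404: "Use a base URL ending in /v1 and call /responses.",
    429: "Back off and retry, pacing with the Retry-After header.",
}

def explain(status: int) -> str:
    return FIXES.get(status, f"Unexpected HTTP {status}; see the API docs.")
```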