Is the abliteration.ai /v1/responses endpoint compatible with the OpenAI SDK?

Yes. Point the OpenAI SDK at https://api.abliteration.ai/v1 and call client.responses.create(...) with your abliteration.ai API key.

How do I authenticate with the /v1/responses endpoint?

Send your API key (starts with ak_) or JWT as a Bearer token in the Authorization header. Anonymous users can send one free request by including any non-empty X-Free-Tier header.

Does the /v1/responses endpoint support streaming?

Yes. Set stream: true to receive Server-Sent Events (SSE). The stream includes Responses-style events such as response.created, response.output_text.delta, and response.completed.

Does the /v1/responses endpoint support multimodal input?

Yes. You can send structured input with input_text and input_image content parts. Image URLs are supported. Video inputs are rejected.

Can I send tools and tool_choice to /v1/responses?

Yes. Standard Responses API fields such as tools and tool_choice are forwarded to the backend model endpoint unchanged unless abliteration.ai needs to enforce auth, moderation, billing, or model aliasing.

How is credit usage calculated for each API call?

Credits are calculated as ceil(total_tokens / 500), with a minimum of 1 credit per call. Total tokens includes both input and output tokens. Pricing is approximately $3 per 1 million tokens.

OpenAI Responses API (v1/responses)

Full reference for the OpenAI-compatible POST /v1/responses endpoint on abliteration.ai: request schema, structured input, streaming events, authentication, rate limits, and billing fields.

Updated 2026-04-14

abliteration.ai exposes POST /v1/responses, an OpenAI Responses API–compatible endpoint. Existing Responses API clients work once you switch the base URL and API key.

Authenticate with a Bearer token (API key or JWT), send either a simple string input or a structured message array, and receive a Responses-style object with token usage and abliteration.ai credit metering fields.

Streaming is supported via Server-Sent Events (SSE). Set stream: true to receive evented deltas such as response.output_text.delta.

from openai import OpenAI

# Get yours at https://abliteration.ai/console/keys
client = OpenAI(
    base_url="https://api.abliteration.ai/v1",
    api_key="ak_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
)

response = client.responses.create(
    model="abliterated-model",
    input="Explain abliteration in one paragraph.",
)

print(response.output[0].content[0].text)

Authentication

Include your credentials in the Authorization header as a Bearer token.

curl -X POST https://api.abliteration.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ak_YOUR_API_KEY" \
  -d '{
    "model": "abliterated-model",
    "input": "Hello, world!"
  }'
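For clients that do not use the OpenAI SDK, the same request can be assembled with the Python standard library. This is a minimal sketch: the helper name build_responses_request is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.abliteration.ai/v1"  # base URL from the docs above


def build_responses_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/responses request with Bearer auth.

    Hypothetical helper for illustration; not part of any abliteration.ai SDK.
    """
    if not api_key:
        raise ValueError("api_key must be non-empty")
    return urllib.request.Request(
        url=f"{BASE_URL}/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_responses_request(
    "ak_YOUR_API_KEY",
    {"model": "abliterated-model", "input": "Hello, world!"},
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Keeping request construction separate from sending makes the auth header easy to inspect in tests without touching the network.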

Request body

The endpoint accepts the standard Responses API envelope. Common fields are listed below, and additional supported fields are passed through to the backend model endpoint.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | Model ID. Use `"abliterated-model"`. |
| `input` | string \| array | No | Plain text input or a structured Responses-format input array. |
| `instructions` | string | No | Optional system-style instruction. |
| `stream` | boolean | No | Default `false`. Set to `true` for SSE streaming. |
| `temperature` | float | No | Sampling temperature (0–2). |
| `max_output_tokens` | integer | No | Upper bound for generated output tokens. |
| `tools` | array | No | Optional tool definitions forwarded to the backend Responses implementation. |
| `tool_choice` | string \| object | No | Controls whether the model may call tools. |
| `flagged_categories` | array | No | Optional moderation categories to block before inference. |
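As a rough illustration of how several of these fields combine, the sketch below assembles a request body with tools and tool_choice. The get_weather tool and the build_request helper are hypothetical, and the tool definition assumes the backend accepts the OpenAI Responses function-tool shape.

```python
import json

# Hypothetical tool for illustration only; assumes the OpenAI Responses
# function-tool shape is accepted by the backend.
get_weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_request(prompt, *, tools=None, tool_choice="auto",
                  temperature=0.7, max_output_tokens=256):
    """Assemble a /v1/responses body from the common fields in the table."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    body = {
        "model": "abliterated-model",
        "input": prompt,
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
    }
    if tools:
        # Only send tool_choice when there are tools to choose from.
        body["tools"] = tools
        body["tool_choice"] = tool_choice
    return body


body = build_request("What is the weather in Paris?", tools=[get_weather_tool])
print(json.dumps(body, indent=2))
```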

Input formats

Use a plain string for simple text prompts, or send a structured array when you need multimodal inputs, prior turns, or tool-related state.

For images, use input_image parts. This endpoint does not accept video; for video inputs, use /v1/chat/completions with a video_url content block, as described in the video docs.

{
  "model": "abliterated-model",
  "instructions": "Answer in two sentences.",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "What is shown in this image?" },
        { "type": "input_image", "image_url": "https://example.com/stonehenge.jpg" }
      ]
    }
  ],
  "max_output_tokens": 256
}
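A structured input like the one above can also be built programmatically. The following is a small sketch; user_message is a hypothetical helper name, and the part shapes simply mirror the JSON example.

```python
import json
from typing import List, Optional


def user_message(text: str, image_urls: Optional[List[str]] = None) -> dict:
    """Return one user turn with an input_text part and optional input_image parts."""
    parts = [{"type": "input_text", "text": text}]
    for url in image_urls or []:
        parts.append({"type": "input_image", "image_url": url})
    return {"role": "user", "content": parts}


payload = {
    "model": "abliterated-model",
    "instructions": "Answer in two sentences.",
    "input": [
        user_message(
            "What is shown in this image?",
            ["https://example.com/stonehenge.jpg"],
        )
    ],
    "max_output_tokens": 256,
}

print(json.dumps(payload, indent=2))
```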

Non-streaming response

When stream is false (default), the full response is returned as JSON.

abliteration.ai adds the same billing fields used by the other public inference endpoints: remaining_credits, estimated_credits_used, and estimated_cost_usd.

{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1735958400,
  "status": "completed",
  "model": "abliterated-model",
  "output": [
    {
      "id": "msg_abc123",
      "type": "message",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Abliteration removes refusal vectors from language models while preserving their broader capabilities.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 31,
    "output_tokens": 18,
    "total_tokens": 49
  },
  "remaining_credits": 487,
  "estimated_credits_used": 1,
  "estimated_cost_usd": 0.000245
}
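To read such a response without the SDK, one approach is to walk the output array and collect the output_text parts, then pick out the billing fields. This is a sketch under the assumption that the response matches the shape shown above; output_text and billing_summary are hypothetical helper names.

```python
def output_text(response: dict) -> str:
    """Concatenate every output_text part found in message items."""
    chunks = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                chunks.append(part.get("text", ""))
    return "".join(chunks)


def billing_summary(response: dict) -> dict:
    """Pick out the abliteration.ai billing fields; missing ones become None."""
    keys = ("remaining_credits", "estimated_credits_used", "estimated_cost_usd")
    return {key: response.get(key) for key in keys}


# Trimmed copy of the example response above.
sample = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Abliteration removes refusal vectors."}],
        }
    ],
    "usage": {"input_tokens": 31, "output_tokens": 18, "total_tokens": 49},
    "remaining_credits": 487,
    "estimated_credits_used": 1,
    "estimated_cost_usd": 0.000245,
}

print(output_text(sample))
print(billing_summary(sample))
```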

Streaming response

Set stream: true to receive Responses-style Server-Sent Events. Each event includes an event: line and a JSON data: payload.

Typical event types include response.created, response.output_text.delta, and response.completed.

curl -N https://api.abliteration.ai/v1/responses \
  -H "Authorization: Bearer $ABLIT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "abliterated-model",
    "input": "Write a five-word greeting.",
    "stream": true
  }'

event: response.created
data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress","model":"abliterated-model","output":[]}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Hello from abliteration.ai"}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","model":"abliterated-model"}}
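The event stream above can be consumed with a small line-based parser. The sketch below assumes each event is an event: line followed by a single JSON data: line and a blank separator, as in the sample; iter_sse_events and collect_text are hypothetical names, and the sample payloads are trimmed.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event_name = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name or "message", json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip(" "))
    if data_lines:  # stream ended without a trailing blank line
        yield event_name or "message", json.loads("\n".join(data_lines))


def collect_text(lines: Iterable[str]) -> str:
    """Join the delta strings from response.output_text.delta events."""
    return "".join(
        payload.get("delta", "")
        for name, payload in iter_sse_events(lines)
        if name == "response.output_text.delta"
    )


# Trimmed version of the sample stream above.
SAMPLE_STREAM = [
    "event: response.created",
    'data: {"type":"response.created","response":{"id":"resp_abc123","status":"in_progress"}}',
    "",
    "event: response.output_text.delta",
    'data: {"type":"response.output_text.delta","delta":"Hello from abliteration.ai"}',
    "",
    "event: response.completed",
    'data: {"type":"response.completed","response":{"id":"resp_abc123","status":"completed"}}',
    "",
]

print(collect_text(SAMPLE_STREAM))  # Hello from abliteration.ai
```

In a real client the lines would come from the HTTP response body rather than a list; the parser only needs an iterable of decoded text lines.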

Rate limits

Limits are enforced per user in a rolling 60-second window. Exceeding the limit returns 429 with a Retry-After header.
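One way to handle a 429 is to wait for the advertised interval and try again. The sketch below is transport-agnostic: send is any hypothetical callable returning (status, headers, body), and it assumes Retry-After carries a number of seconds and that the headers mapping uses that exact key; other values fall back to a default wait.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on 429, wait for Retry-After seconds and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, headers, body = send()
        if status != 429 or attempt >= max_attempts:
            return status, headers, body
        try:
            # Assumption: Retry-After is expressed in seconds.
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(max(wait, 0.0))
        attempt += 1


# Usage with a fake transport: one 429, then success.
replies = iter([
    (429, {"Retry-After": "2"}, ""),
    (200, {}, '{"status": "completed"}'),
])
waits = []
status, _, body = call_with_retry(lambda: next(replies), sleep=waits.append)
print(status, waits)  # 200 [2.0]
```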

Credit metering

Each call consumes credits based on total tokens (input + output). Credits are deducted after the response completes.
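The FAQ formula, ceil(total_tokens / 500) with a minimum of 1 credit per call, can be sketched as a small function for estimating usage client-side. The function and constant names are hypothetical; the authoritative figure is the estimated_credits_used field in the response.

```python
TOKENS_PER_CREDIT = 500  # divisor from the formula ceil(total_tokens / 500)


def credits_for_call(total_tokens: int) -> int:
    """Estimate credits for one call: ceil(total_tokens / 500), at least 1."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    # Integer ceiling division avoids floating-point rounding.
    credits = (total_tokens + TOKENS_PER_CREDIT - 1) // TOKENS_PER_CREDIT
    return max(1, credits)


for tokens in (49, 500, 501, 1200):
    print(tokens, credits_for_call(tokens))
# 49 1
# 500 1
# 501 2
# 1200 3
```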

Developer tools

Machine-readable specs and ready-made collections for faster integration.