Anthropic Messages API (v1/messages)
Full reference for the Anthropic-compatible POST /v1/messages endpoint on abliteration.ai. Request schema, streaming, authentication, rate limits, and error codes.
abliteration.ai exposes POST /v1/messages, an Anthropic Messages API–compatible endpoint. Tools and SDKs that target the Anthropic API work with a base-URL switch.
Authenticate with a Bearer token (API key or JWT), send an Anthropic-format message array, and receive a structured response with token usage and credit metering.
Streaming is supported via Server-Sent Events (SSE). Set stream: true to receive delta chunks as they are generated.
Quick start
```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.abliteration.ai",
    api_key="ak_YOUR_API_KEY",
)

message = client.messages.create(
    model="abliterated-model",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain abliteration in one paragraph."}
    ],
)

print(message.content[0].text)
```

Service notes
- Pricing model: Usage-based pricing (~$5 per 1M tokens) billed on total tokens (input + output). See the API pricing page for current plans.
- Data retention: No prompt/output retention by default. Operational telemetry (token counts, timestamps, error codes) is retained for billing and reliability.
- Compatibility: Anthropic-style /v1/messages request and response format with a base URL switch.
- Latency: Depends on model size, prompt length, and load. Streaming reduces time-to-first-token.
- Throughput: Team plans include priority throughput. Actual throughput varies with demand.
- Rate limits: Limits vary by plan and load. Handle 429s with backoff and respect any Retry-After header.
Authentication
Include your credentials in the Authorization header as a Bearer token.

- API key — keys start with ak_. Send as Authorization: Bearer ak_....
- JWT — obtained from POST /api/auth/login. Send as Authorization: Bearer <jwt>.
- Anonymous free tier — omit the token and set the X-Free-Tier: 1 header. Limited to 1 free request per device.
```shell
curl -X POST https://api.abliteration.ai/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ak_YOUR_API_KEY" \
  -d '{
    "model": "abliterated-model",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
```

Request body
The request body follows the Anthropic Messages format.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID. Use "abliterated-model". |
| messages | array | Yes | Non-empty array of message objects with role and content. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| temperature | float | No | Sampling temperature (0–2). |
| stream | boolean | No | Default false. Set to true for SSE streaming. |
| system | string or array | No | System prompt prepended to the conversation. |
Message format
Each message object has a role ("user" or "assistant") and content.
Content can be a plain string or an array of content blocks for multimodal inputs (text + images).
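As an illustration, a user message using content blocks for a text-plus-image input might look like the sketch below. This assumes Anthropic-style base64 image blocks are accepted; the media type and data values are placeholders, not tested against the endpoint:

```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image."},
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/png",
        "data": "<base64-encoded image data>"
      }
    }
  ]
}
```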
```json
{
  "model": "abliterated-model",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "What is abliteration?"
    },
    {
      "role": "assistant",
      "content": "Abliteration is a technique that removes refusal vectors from LLMs."
    },
    {
      "role": "user",
      "content": "Explain how it works step by step."
    }
  ]
}
```

Non-streaming response
When stream is false (default), the full response is returned as JSON.
The response includes credit metering fields specific to abliteration.ai: remaining_credits, estimated_credits_used, and estimated_cost_usd.
```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "model": "abliterated-model",
  "content": [
    {
      "type": "text",
      "text": "Abliteration removes refusal vectors from language models..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 42,
    "output_tokens": 128
  },
  "remaining_credits": 487,
  "estimated_credits_used": 1,
  "estimated_cost_usd": 0.00085
}
```

Streaming response
Set stream: true to receive Server-Sent Events. Each event has an event: line and a data: line with JSON.
Events follow the Anthropic streaming protocol: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop.
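The event stream can also be consumed without the SDK. Below is a minimal sketch of an SSE line parser; the requests-based call in the comments is an assumption shown only to illustrate where the parser plugs in, not a verified invocation:

```python
import json

def iter_sse_events(lines):
    """Pair each 'event:' line with the JSON payload of the following 'data:' line."""
    event = None
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            yield event, json.loads(line[len("data: "):])

# Hypothetical usage against the endpoint (untested sketch):
#
# import requests
# resp = requests.post(
#     "https://api.abliteration.ai/v1/messages",
#     headers={"Authorization": "Bearer ak_YOUR_API_KEY"},
#     json={"model": "abliterated-model", "max_tokens": 256, "stream": True,
#           "messages": [{"role": "user", "content": "Hi"}]},
#     stream=True,
# )
# for event, data in iter_sse_events(resp.iter_lines(decode_unicode=True)):
#     if event == "content_block_delta":
#         print(data["delta"]["text"], end="", flush=True)
```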
```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.abliteration.ai",
    api_key="ak_YOUR_API_KEY",
)

with client.messages.stream(
    model="abliterated-model",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

Token counting
POST /v1/messages/count_tokens returns an estimated input token count without generating a response. Useful for budget checks before sending expensive prompts.
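A pre-flight budget check might compare the estimate against a cap before calling /v1/messages. The sketch below uses only the stdlib; the cap value and the helper names are arbitrary choices for illustration, not part of the API:

```python
import json
import urllib.request

BASE_URL = "https://api.abliteration.ai"

def _default_post(url, body, api_key):
    # Minimal stdlib POST helper returning the decoded JSON response.
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def count_input_tokens(payload, api_key="ak_YOUR_API_KEY", post=_default_post):
    """Call /v1/messages/count_tokens and return the estimated input tokens."""
    data = post(f"{BASE_URL}/v1/messages/count_tokens", payload, api_key)
    return data["input_tokens"]

def within_budget(payload, max_input_tokens, **kw):
    # Skip generation when the estimated prompt size exceeds the cap.
    return count_input_tokens(payload, **kw) <= max_input_tokens
```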
```json
// POST /v1/messages/count_tokens
// Request — same schema as /v1/messages
{
  "model": "abliterated-model",
  "messages": [
    {"role": "user", "content": "Count these tokens."}
  ]
}

// Response
{
  "input_tokens": 12
}
```

Rate limits
Limits are enforced per user in a rolling 60-second window. Exceeding the limit returns 429 with a Retry-After header.
- API key callers: 120 requests per 60-second window.
- UI / JWT callers: 30 requests per 60-second window.
- Rate-limiter failures are fail-open — requests are allowed but the incident is logged.
Credit metering
Each call consumes credits based on total tokens (input + output). Credits are deducted after the response completes.
- Minimum charge is 1 credit per call.
- Credits per call: ceil(total_tokens / 500).
- Pricing: ~$5 per 1M tokens. See the pricing page for current plans.
- Anonymous free-tier calls do not consume credits.
- If credits are insufficient, the endpoint returns 402 Insufficient Credits.
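The metering for the non-streaming example above (42 input + 128 output tokens) can be reproduced with a small sketch. The constants are taken from the notes above; the exact cost formula is inferred from the example response, so treat it as an approximation rather than a billing guarantee:

```python
import math

PRICE_PER_M_TOKENS_USD = 5.0  # ~$5 per 1M tokens (from the pricing note)
TOKENS_PER_CREDIT = 500

def credits_for_call(input_tokens: int, output_tokens: int) -> int:
    """ceil(total_tokens / 500), with a 1-credit minimum per call."""
    total = input_tokens + output_tokens
    return max(1, math.ceil(total / TOKENS_PER_CREDIT))

def estimated_cost_usd(input_tokens: int, output_tokens: int) -> float:
    # Inferred: total tokens priced linearly at ~$5 per 1M.
    total = input_tokens + output_tokens
    return total / 1_000_000 * PRICE_PER_M_TOKENS_USD

# 42 + 128 = 170 tokens -> ceil(170 / 500) = 1 credit, ~$0.00085
```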
Developer tools
Machine-readable specs and ready-made collections for faster integration.
- OpenAPI 3.0 spec — import into Swagger UI, Redocly, or any OpenAPI-compatible tool.
- Well-known OpenAPI discovery — /.well-known/openapi.json for automated tooling.
- Postman collection & OpenAPI guide — pre-built Postman collection with environment variables.
- Claude Code integration — use Claude Code as an agentic coding tool with the abliteration.ai backend.
Common errors & fixes
- 400 Bad Request: Verify the model field is present, messages is a non-empty array, and the JSON is well-formed.
- 401 Unauthorized: Check that your API key or JWT is valid and sent as a Bearer token in the Authorization header.
- 402 Insufficient Credits: Your credit balance is zero. Purchase more credits from the pricing page or dashboard.
- 429 Rate Limit: You exceeded 120 req/min (API key) or 30 req/min (UI). Back off and use the Retry-After header.
- 502 Bad Gateway: The upstream model is temporarily unavailable. Retry after a short delay.
- 504 Gateway Timeout: The model did not respond in time. Try a shorter prompt or retry.
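The retryable statuses above (429, 502, 504) can be handled with one backoff helper. A hedged sketch follows; the attempt count, delay cap, and jitter are arbitrary choices, not service recommendations:

```python
import random
import time

RETRYABLE = {429, 502, 504}

def post_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Call send() until a non-retryable status or attempts run out.

    `send` performs one HTTP POST and returns a response object exposing
    .status_code and .headers (e.g. a functools.partial around requests.post).
    Honors Retry-After on 429; otherwise uses capped exponential backoff.
    """
    resp = None
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in RETRYABLE:
            return resp
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            delay = min(2 ** attempt + random.random(), 30)  # jittered, capped
        sleep(delay)
    return resp  # last retryable response after exhausting attempts
```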