Anthropic Messages API (v1/messages)
Full reference for the Anthropic-compatible POST /v1/messages endpoint on abliteration.ai. Request schema, streaming, authentication, rate limits, and error codes.
abliteration.ai exposes POST /v1/messages, an Anthropic Messages API–compatible endpoint. Tools and SDKs that target the Anthropic API work with a base-URL switch.
Authenticate with a Bearer token (API key or JWT), send an Anthropic-format message array, and receive a structured response with token usage and credit metering.
Streaming is supported via Server-Sent Events (SSE). Set stream: true to receive delta chunks as they are generated.
Quick start
```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.abliteration.ai",
    api_key="ak_YOUR_API_KEY",
)

message = client.messages.create(
    model="abliterated-model",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain abliteration in one paragraph."}
    ],
)

print(message.content[0].text)
```

Service notes
- Pricing model: Usage-based pricing (~$5 per 1M tokens) billed on total tokens (input + output). See the API pricing page for current plans.
- Data retention: No prompt/output retention by default. Operational telemetry (token counts, timestamps, error codes) is retained for billing and reliability.
- Compatibility: Anthropic-style /v1/messages request and response format with a base URL switch.
- Latency: Depends on model size, prompt length, and load. Streaming reduces time-to-first-token.
- Throughput: Team plans include priority throughput. Actual throughput varies with demand.
- Rate limits: Limits vary by plan and load. Handle 429s with backoff and respect any Retry-After header.
Authentication
Include your credentials in the Authorization header as a Bearer token.

- API key — keys start with ak_. Send as Authorization: Bearer ak_....
- JWT — obtained from POST /api/auth/login. Send as Authorization: Bearer <jwt>.
- Anonymous free tier — omit the token and set the X-Free-Tier: 1 header. Limited to 1 free request per device.
```shell
curl -X POST https://api.abliteration.ai/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ak_YOUR_API_KEY" \
  -d '{
    "model": "abliterated-model",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
```

Request body
The request body follows the Anthropic Messages format.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID. Use "abliterated-model". |
| messages | array | Yes | Non-empty array of message objects with role and content. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| temperature | float | No | Sampling temperature (0–2). |
| stream | boolean | No | Default false. Set to true for SSE streaming. |
| system | string or array | No | System prompt prepended to the conversation. |
Message format
Each message object has a role ("user" or "assistant") and content.
Content can be a plain string or an array of content blocks for multimodal inputs (text + images).
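As an illustration, a user message using content blocks for a text-plus-image input might look like the sketch below. This assumes Anthropic-style base64 image blocks are accepted; the media type and data values are placeholders, not tested against the endpoint:

```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image."},
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/png",
        "data": "<base64-encoded image data>"
      }
    }
  ]
}
```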
```json
{
  "model": "abliterated-model",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "What is abliteration?"
    },
    {
      "role": "assistant",
      "content": "Abliteration is a technique that removes refusal vectors from LLMs."
    },
    {
      "role": "user",
      "content": "Explain how it works step by step."
    }
  ]
}
```

Non-streaming response
When stream is false (default), the full response is returned as JSON.
The response includes credit metering fields specific to abliteration.ai: remaining_credits, estimated_credits_used, and estimated_cost_usd.
```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "model": "abliterated-model",
  "content": [
    {
      "type": "text",
      "text": "Abliteration removes refusal vectors from language models..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 42,
    "output_tokens": 128
  },
  "remaining_credits": 487,
  "estimated_credits_used": 1,
  "estimated_cost_usd": 0.00085
}
```

Streaming response
Set stream: true to receive Server-Sent Events. Each event has an event: line and a data: line with JSON.
Events follow the Anthropic streaming protocol: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop.
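The event stream can also be consumed without the SDK. Below is a minimal sketch of an SSE line parser; the requests-based call in the comments is an assumption shown only to illustrate where the parser plugs in, not a verified invocation:

```python
import json

def iter_sse_events(lines):
    """Pair each 'event:' line with the JSON payload of the following 'data:' line."""
    event = None
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            yield event, json.loads(line[len("data: "):])

# Hypothetical usage against the endpoint (untested sketch):
#
# import requests
# resp = requests.post(
#     "https://api.abliteration.ai/v1/messages",
#     headers={"Authorization": "Bearer ak_YOUR_API_KEY"},
#     json={"model": "abliterated-model", "max_tokens": 256, "stream": True,
#           "messages": [{"role": "user", "content": "Hi"}]},
#     stream=True,
# )
# for event, data in iter_sse_events(resp.iter_lines(decode_unicode=True)):
#     if event == "content_block_delta":
#         print(data["delta"]["text"], end="", flush=True)
```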
```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.abliteration.ai",
    api_key="ak_YOUR_API_KEY",
)

with client.messages.stream(
    model="abliterated-model",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

Token counting
POST /v1/messages/count_tokens returns an estimated input token count without generating a response. Useful for budget checks before sending expensive prompts.
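A pre-flight budget check might compare the estimate against a cap before calling /v1/messages. The sketch below uses only the stdlib; the cap value and the helper names are arbitrary choices for illustration, not part of the API:

```python
import json
import urllib.request

BASE_URL = "https://api.abliteration.ai"

def _default_post(url, body, api_key):
    # Minimal stdlib POST helper returning the decoded JSON response.
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def count_input_tokens(payload, api_key="ak_YOUR_API_KEY", post=_default_post):
    """Call /v1/messages/count_tokens and return the estimated input tokens."""
    data = post(f"{BASE_URL}/v1/messages/count_tokens", payload, api_key)
    return data["input_tokens"]

def within_budget(payload, max_input_tokens, **kw):
    # Skip generation when the estimated prompt size exceeds the cap.
    return count_input_tokens(payload, **kw) <= max_input_tokens
```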
```json
// POST /v1/messages/count_tokens
// Request — same schema as /v1/messages
{
  "model": "abliterated-model",
  "messages": [
    {"role": "user", "content": "Count these tokens."}
  ]
}

// Response
{
  "input_tokens": 12
}
```

Rate limits
Limits are enforced per user in a rolling 60-second window. Exceeding the limit returns 429 with a Retry-After header.
- API key callers: 120 requests per 60-second window.
- UI / JWT callers: 30 requests per 60-second window.
- Rate-limiter failures are fail-open — requests are allowed but the incident is logged.
Credit metering
Each call consumes credits based on total tokens (input + output). Credits are deducted after the response completes.
- Minimum charge is 1 credit per call.
- Credits per call: ceil(total_tokens / 500).
- Pricing: ~$5 per 1M tokens. See the pricing page for current plans.
- Anonymous free-tier calls do not consume credits.
- If credits are insufficient, the endpoint returns 402 Insufficient Credits.
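The metering for the non-streaming example above (42 input + 128 output tokens) can be reproduced with a small sketch. The constants are taken from the notes above; the exact cost formula is inferred from the example response, so treat it as an approximation rather than a billing guarantee:

```python
import math

PRICE_PER_M_TOKENS_USD = 5.0  # ~$5 per 1M tokens (from the pricing note)
TOKENS_PER_CREDIT = 500

def credits_for_call(input_tokens: int, output_tokens: int) -> int:
    """ceil(total_tokens / 500), with a 1-credit minimum per call."""
    total = input_tokens + output_tokens
    return max(1, math.ceil(total / TOKENS_PER_CREDIT))

def estimated_cost_usd(input_tokens: int, output_tokens: int) -> float:
    # Inferred: total tokens priced linearly at ~$5 per 1M.
    total = input_tokens + output_tokens
    return total / 1_000_000 * PRICE_PER_M_TOKENS_USD

# 42 + 128 = 170 tokens -> ceil(170 / 500) = 1 credit, ~$0.00085
```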
Developer tools
Machine-readable specs and ready-made collections for faster integration.
- OpenAPI 3.0 spec — import into Swagger UI, Redocly, or any OpenAPI-compatible tool.
- Well-known OpenAPI discovery — /.well-known/openapi.json for automated tooling.
- Postman collection & OpenAPI guide — pre-built Postman collection with environment variables.
- Claude Code integration — use Claude Code as an agentic coding tool with the abliteration.ai backend.
Common errors & fixes
- 400 Bad Request: Verify the model field is present, messages is a non-empty array, and the JSON is well-formed.
- 401 Unauthorized: Check that your API key or JWT is valid and sent as a Bearer token in the Authorization header.
- 402 Insufficient Credits: Your credit balance is zero. Purchase more credits from the pricing page or dashboard.
- 429 Rate Limit: You exceeded 120 req/min (API key) or 30 req/min (UI). Back off and use the Retry-After header.
- 502 Bad Gateway: The upstream model is temporarily unavailable. Retry after a short delay.
- 504 Gateway Timeout: The model did not respond in time. Try a shorter prompt or retry.
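The retryable statuses above (429, 502, 504) can be handled with one backoff helper. A hedged sketch follows; the attempt count, delay cap, and jitter are arbitrary choices, not service recommendations:

```python
import random
import time

RETRYABLE = {429, 502, 504}

def post_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Call send() until a non-retryable status or attempts run out.

    `send` performs one HTTP POST and returns a response object exposing
    .status_code and .headers (e.g. a functools.partial around requests.post).
    Honors Retry-After on 429; otherwise uses capped exponential backoff.
    """
    resp = None
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in RETRYABLE:
            return resp
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            delay = min(2 ** attempt + random.random(), 30)  # jittered, capped
        sleep(delay)
    return resp  # last retryable response after exhausting attempts
```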