Compatibility
OpenAI-compatible endpoints supported
Use the OpenAPI spec to confirm which endpoints are currently live.
OpenAPI spec: https://api.abliteration.ai/openapi.json
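As a rough sketch of how that check could be scripted, the snippet below looks up a path in the `paths` object of an OpenAPI document. The helper names `fetch_spec` and `supports` are hypothetical, and the assumption that paths appear with the `/v1` prefix should be verified against the real spec. The printed results use a hand-written fragment, not the live spec.

```python
import json
import urllib.request

SPEC_URL = "https://api.abliteration.ai/openapi.json"  # URL given on this page


def fetch_spec(url: str = SPEC_URL) -> dict:
    """Download and parse the OpenAPI document (performs a network call)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def supports(spec: dict, path: str, method: str = "post") -> bool:
    """Return True if the spec's `paths` object declares `method` for `path`."""
    operations = spec.get("paths", {}).get(path, {})
    return method.lower() in operations


# Offline illustration with a hand-written fragment (an assumption, not the real spec).
sample_spec = {"paths": {"/v1/chat/completions": {"post": {}}}}
print(supports(sample_spec, "/v1/chat/completions"))  # True
print(supports(sample_spec, "/v1/embeddings"))        # False
```

Against the live service, `supports(fetch_spec(), "/v1/embeddings")` would perform the same lookup on the published spec.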
Quick start
Base URL
OpenAPI spec
https://api.abliteration.ai/openapi.json
Service notes
- Pricing model: Usage-based pricing (~$5 per 1M tokens) billed on total tokens (input + output). See the API pricing page for current plans.
- Data retention: No prompt/output retention by default. Operational telemetry (token counts, timestamps, error codes) is retained for billing and reliability.
- Compatibility: OpenAI-style /v1/chat/completions request and response format. Point an existing client at this API by switching its base URL.
- Latency: Depends on model size, prompt length, and load. Streaming reduces time-to-first-token.
- Throughput: Team plans include priority throughput. Actual throughput varies with demand.
- Rate limits: Limits vary by plan and load. Handle 429s with backoff and respect any Retry-After header.
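The rate-limit note above can be sketched as a small delay calculator. This is one possible approach, not a documented client behavior: the function name `retry_delay`, the 1-second base, and the 30-second cap are assumptions chosen for illustration. It honors a numeric `Retry-After` value when present and otherwise falls back to exponential backoff with full jitter. The HTTP-date form of `Retry-After` is not parsed here.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Server-provided pacing wins when it is a plain number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to backoff below
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)].
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


print(retry_delay(0, "7"))  # 7.0
```

A caller would sleep for `retry_delay(attempt, response_headers.get("Retry-After"))` after each 429 and stop after a fixed number of attempts.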
Endpoint compatibility matrix
| Endpoint / Feature | Status | Notes |
|---|---|---|
| Chat completions | Documented | /v1/chat/completions |
| Streaming | Documented | Server-sent events with data: lines |
| Vision (multimodal) | Documented | Image inputs in message content arrays |
| Embeddings | Check OpenAPI spec | Confirm in the spec before use |
| Responses API | Check OpenAPI spec | Confirm in the spec before use |
| Tool calling | Check OpenAPI spec | Validate schema compatibility |
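For the streaming row, a minimal sketch of reading `data:` lines might look like the following. The chunk shape (`choices[0].delta.content`) and the `[DONE]` terminator are assumptions carried over from the OpenAI streaming convention, and the function names are hypothetical. Confirm the actual stream format against the OpenAPI spec.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each `data:` line, skipping blanks and comments."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            yield line[len("data:"):].strip()


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate delta text from OpenAI-style chunks until [DONE] (assumed format)."""
    parts = []
    for payload in iter_sse_data(lines):
        if not payload:
            continue  # empty data line carries nothing to parse
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


# Hand-written sample stream for illustration only.
sample = [
    'data: {"choices": [{"delta": {"content": "endpoints "}}]}\n',
    "\n",
    'data: {"choices": [{"delta": {"content": "verified."}}]}\n',
    "\n",
    "data: [DONE]\n",
]
print(collect_text(sample))  # endpoints verified.
```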
Test vector
Expected output should include the phrase "endpoints verified."
Request payload
{
  "model": "abliterated-model",
  "messages": [
    {
      "role": "user",
      "content": "Reply with: endpoints verified."
    }
  ],
  "temperature": 0.2,
  "stream": false
}
Common errors & fixes
- 401 Unauthorized: Check that your API key is set and sent as a Bearer token.
- 404 Not Found: Make sure the base URL ends with /v1 and you call /chat/completions.
- 400 Bad Request: Verify the model id and that messages is an array of { role, content } objects.
- 429 Rate limit: Back off and retry. Use the Retry-After header for pacing.
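Several of the fixes above can be checked on the client before a request is sent. The sketch below builds a request from the test vector payload and raises early for an empty key, a base URL that does not end in /v1, or malformed messages. The function name `build_chat_request`, the placeholder host `example.invalid`, and the key `sk-test` are hypothetical. Substitute the real base URL and your own API key.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Validate inputs against the common errors, then build a POST request."""
    if not api_key:
        raise ValueError("API key is empty (would lead to 401)")
    root = base_url.rstrip("/")
    if not root.endswith("/v1"):
        raise ValueError("base URL should end with /v1 (would lead to 404)")
    messages = payload.get("messages")
    well_formed = isinstance(messages, list) and all(
        isinstance(m, dict) and "role" in m and "content" in m for m in messages
    )
    if not payload.get("model") or not well_formed:
        raise ValueError("need a model id and a list of {role, content} messages (would lead to 400)")
    return urllib.request.Request(
        root + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # key sent as a Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Payload copied from the test vector on this page.
test_payload = {
    "model": "abliterated-model",
    "messages": [{"role": "user", "content": "Reply with: endpoints verified."}],
    "temperature": 0.2,
    "stream": False,
}

# Placeholder host and key for illustration; no network call is made here.
req = build_chat_request("https://example.invalid/v1", "sk-test", test_payload)
print(req.full_url)  # https://example.invalid/v1/chat/completions
```

Passing the returned object to `urllib.request.urlopen(req)` would send it. A 429 response surfaces as `urllib.error.HTTPError`, whose `headers` attribute carries any `Retry-After` value.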