abliteration.ai - Uncensored LLM API Platform

Rate limits and retries

Rate limits protect reliability and vary by plan, model, and load.

Handle 429 responses with backoff and honor any Retry-After header.

Use request queues and concurrency limits to smooth traffic spikes.

Quick start

Example request
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function chatWithRetry(body, maxRetries = 5) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    const res = await fetch("https://api.abliteration.ai/v1/chat/completions", {
      method: "POST",
      headers: {
        "Authorization": "Bearer " + process.env.ABLIT_KEY,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });

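    // Return on success; fail fast on errors other than 429.
    if (res.ok) return res.json();
    if (res.status !== 429) {
      throw new Error("Request failed with status " + res.status);
    }

    // headers.get() returns null when the header is absent, and Number(null)
    // is 0, so check for the header before trusting its value.
    const header = res.headers.get("Retry-After");
    const retryAfter = header ? Number(header) : NaN;
    const backoffSeconds = Number.isFinite(retryAfter) && retryAfter >= 0
      ? retryAfter
      : Math.min(2 ** attempt, 30);
    await sleep(backoffSeconds * 1000);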
  }

  throw new Error("Rate limit exceeded");
}

const result = await chatWithRetry({
  model: "abliterated-model",
  messages: [{ role: "user", content: "Give me three bullet points." }],
});

Service notes

  • Pricing model: Usage-based pricing (~$5 per 1M tokens) billed on total tokens (input + output). See the API pricing page for current plans.
  • Data retention: No prompt/output retention by default. Operational telemetry (token counts, timestamps, error codes) is retained for billing and reliability.
  • Compatibility: OpenAI-style /v1/chat/completions request and response format. Existing OpenAI clients can switch by changing the base URL.
  • Latency: Depends on model size, prompt length, and load. Streaming reduces time-to-first-token.
  • Throughput: Team plans include priority throughput. Actual throughput varies with demand.
  • Rate limits: Limits vary by plan and load. Handle 429s with backoff and respect any Retry-After header.

How rate limits apply

Limits are usually enforced as per-minute budgets for requests and tokens. Exact limits can vary by plan or model.

  • Short requests still count toward request limits.
  • Long prompts and long outputs consume more token budget.
  • Parallel requests share the same limit window.
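
As a rough illustration of how the two budgets interact, the sketch below tracks requests and tokens over a rolling 60-second window on the client. The class name MinuteBudget and any limits passed to it are hypothetical, not published limits, and the server may count usage differently; treat it as a way to pace traffic, not as a mirror of the server's accounting.

```javascript
// Hypothetical client-side tracker for per-minute request and token budgets.
// The limits passed in are placeholders, not real plan limits.
class MinuteBudget {
  constructor({ maxRequests, maxTokens, windowMs = 60_000 }) {
    this.maxRequests = maxRequests;
    this.maxTokens = maxTokens;
    this.windowMs = windowMs;
    this.entries = []; // one { time, tokens } record per request
  }

  // Drop records that have aged out of the rolling window.
  prune(now) {
    const cutoff = now - this.windowMs;
    this.entries = this.entries.filter((e) => e.time > cutoff);
  }

  // True if one more request using `tokens` tokens fits in both budgets.
  canSend(tokens, now = Date.now()) {
    this.prune(now);
    const usedTokens = this.entries.reduce((sum, e) => sum + e.tokens, 0);
    return (
      this.entries.length + 1 <= this.maxRequests &&
      usedTokens + tokens <= this.maxTokens
    );
  }

  // Record a request (input + output tokens) against the window.
  record(tokens, now = Date.now()) {
    this.entries.push({ time: now, tokens });
  }
}
```

Because parallel requests share one window, a single MinuteBudget instance would need to be shared by every worker that uses the same API key.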

Headers to monitor

Check response headers for guidance on pacing. Some headers may be provider-specific.

  • Retry-After for recommended wait time after a 429.
  • x-ratelimit-* headers, if provided, for remaining capacity.
  • Request or trace ids for debugging with support.
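
One way to gather these hints from a fetch response is sketched below. It assumes nothing about which x-ratelimit-* headers the API returns and simply collects any that are present. The function name readRateLimitHeaders and the x-request-id header name are assumptions made for this example.

```javascript
// Collect pacing hints from a fetch Response's headers.
// Which x-ratelimit-* headers exist (if any) is provider-specific, so this
// gathers whatever is present instead of assuming exact names.
function readRateLimitHeaders(headers, now = Date.now()) {
  const info = { retryAfterMs: null, rateLimit: {}, requestId: null };

  // Retry-After is either a number of seconds or an HTTP date.
  const retryAfter = headers.get("retry-after");
  if (retryAfter) {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) {
      info.retryAfterMs = Math.max(0, seconds * 1000);
    } else {
      const date = Date.parse(retryAfter);
      if (!Number.isNaN(date)) info.retryAfterMs = Math.max(0, date - now);
    }
  }

  // Header names are lowercased when iterating a Headers object.
  for (const [name, value] of headers) {
    if (name.startsWith("x-ratelimit-")) info.rateLimit[name] = value;
  }

  // "x-request-id" is an assumed name; use whichever id header the API returns.
  info.requestId = headers.get("x-request-id");
  return info;
}
```

Calling readRateLimitHeaders(res.headers) after each response and logging the result alongside any error makes it easier to share request ids with support.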

Backoff and retry strategy

Use exponential backoff with jitter and cap maximum delays for a smoother recovery.

  • Respect Retry-After whenever it is present.
  • Spread retries across workers to avoid thundering herds.
  • Fail fast on errors that a retry cannot fix (such as 400, 401, and 404) and log them separately.
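
A minimal sketch of the delay calculation, using the "full jitter" variant of exponential backoff (a random delay between zero and a capped exponential ceiling). The function name backoffDelayMs and the default base and cap values are illustrative assumptions, not values recommended by the API. When a Retry-After value is supplied, the sketch waits at least that long and adds a small random offset so that workers do not all retry at the same instant.

```javascript
// Exponential backoff with full jitter.
// baseMs and capMs defaults are illustrative, not API recommendations.
function backoffDelayMs(attempt, options = {}) {
  const {
    baseMs = 1000,
    capMs = 30_000,
    retryAfterMs = null, // server hint in milliseconds, if any
    random = Math.random, // injectable for testing
  } = options;

  // Honor the server's hint as a minimum, plus a small spread across workers.
  if (retryAfterMs !== null) {
    return retryAfterMs + Math.floor(random() * baseMs);
  }

  // Otherwise pick a random delay below the capped exponential ceiling.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Passing a fixed random function makes the delays deterministic, which is useful in unit tests.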

Concurrency control

Queues and concurrency limits keep your traffic within budget and improve success rates.

  • Limit concurrent requests per user or tenant.
  • Batch low-priority work and run it off-peak.
  • Use streaming for large responses to reduce user wait time.
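
The first point can be sketched with a small in-process limiter that runs at most a fixed number of tasks at once and queues the rest. createLimiter is a hypothetical helper written for this example, not part of any SDK, and a suitable limit depends on your plan.

```javascript
// Minimal concurrency limiter: at most `limit` tasks run at once; the rest
// wait in a FIFO queue. `createLimiter` is a hypothetical helper name.
function createLimiter(limit) {
  let active = 0;
  const queue = [];

  // Start the next queued task if a slot is free.
  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active += 1;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next();
      });
  };

  // Schedule an async task; settles with the task's own result or error.
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}
```

For per-tenant limits, one approach is to keep one limiter per tenant id in a Map and wrap each call, for example limit(() => chatWithRetry(body)).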

Common errors & fixes

  • 401 Unauthorized: Check that your API key is set and sent as a Bearer token.
  • 404 Not Found: Make sure the base URL ends with /v1 and you call /chat/completions.
  • 400 Bad Request: Verify the model id and that messages are an array of { role, content } objects.
  • 429 Rate limit: Back off and retry with jitter. If a Retry-After header is present, use it for pacing.
  • 503 Service unavailable: Retry with exponential backoff and reduce concurrency temporarily.
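
The list above can be condensed into a small lookup that decides whether a response is worth retrying. The function name and the action labels are hypothetical and exist only for this example.

```javascript
// Map an HTTP status to a coarse action, following the list above.
// The action labels are hypothetical names for this example.
function actionForStatus(status) {
  if (status >= 200 && status < 300) return "ok";
  if (status === 429 || status === 503) return "retry-with-backoff";
  if (status === 401) return "check-api-key";
  if (status === 404) return "check-base-url";
  if (status === 400) return "check-request-body";
  return "fail";
}
```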

Related links

  • OpenAI compatibility guide
  • Streaming chat completions
  • Vision and multimodal inputs
  • Rate limits definition

© 2025 Social Keyboard, Inc. All rights reserved.