Rate limits and retries
Learn how rate limits work and how to implement backoff, retries, and concurrency control.
Updated 2025-12-30
Rate limits protect reliability and vary by plan, model, and load.
Handle 429 responses with backoff and honor any Retry-After header.
Use request queues and concurrency limits to smooth traffic spikes.
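const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries only on HTTP 429; any other error status is thrown to the caller.
async function chatWithRetry(body, maxRetries = 5) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    const res = await fetch("https://api.abliteration.ai/v1/chat/completions", {
      method: "POST",
      headers: {
        "Authorization": "Bearer " + process.env.ABLIT_KEY,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (res.ok) return res.json();
    if (res.status !== 429) {
      throw new Error("Request failed with status " + res.status);
    }
    if (attempt === maxRetries) break; // out of retries, so do not sleep again

    // headers.get() returns null when the header is absent, and Number(null)
    // is 0, so only trust Retry-After when it is present and numeric.
    // An HTTP-date value parses to NaN here and falls back to backoff.
    const header = res.headers.get("Retry-After");
    const retryAfter = header === null || header.trim() === "" ? NaN : Number(header);
    const backoffSeconds = Number.isFinite(retryAfter) && retryAfter >= 0
      ? retryAfter
      : Math.min(2 ** attempt, 30) * (0.5 + Math.random() / 2); // capped, with jitter
    await sleep(backoffSeconds * 1000);
  }
  throw new Error("Rate limit exceeded");
}

const result = await chatWithRetry({
  model: "abliterated-model",
  messages: [{ role: "user", content: "Give me three bullet points." }],
});

How rate limits apply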
Limits are usually enforced as per-minute budgets for requests and tokens. Exact limits can vary by plan or model.
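To make the per-minute budget idea concrete, here is a minimal client-side sketch that tracks requests and tokens over a sliding 60-second window. The class name `MinuteBudget`, its option names, and the example limits are hypothetical, and the server may count its window differently (for example, fixed rather than sliding), so treat this as a local approximation rather than a mirror of the real enforcement.

```javascript
// Hypothetical helper: tracks request and token usage in a sliding window.
class MinuteBudget {
  constructor({ maxRequests, maxTokens, windowMs = 60000 }) {
    this.maxRequests = maxRequests; // assumed per-minute request limit
    this.maxTokens = maxTokens;     // assumed per-minute token limit
    this.windowMs = windowMs;
    this.events = [];               // entries of the form { at, tokens }
  }

  // Returns true and records the usage if it fits, otherwise returns false.
  tryConsume(tokens, nowMs = Date.now()) {
    // Forget usage that has aged out of the window.
    this.events = this.events.filter((e) => nowMs - e.at < this.windowMs);
    const usedTokens = this.events.reduce((sum, e) => sum + e.tokens, 0);
    if (this.events.length + 1 > this.maxRequests) return false;
    if (usedTokens + tokens > this.maxTokens) return false;
    this.events.push({ at: nowMs, tokens });
    return true;
  }
}
```

A caller would check `budget.tryConsume(estimatedTokens)` before sending a request and delay the request when it returns `false`. Token counts on the client are estimates, so leave some headroom below the real limit.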
Headers to monitor
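Check response headers for pacing guidance. Retry-After on a 429 response states how long to wait before retrying. Other rate-limit headers may be provider-specific, so treat their names and formats as optional.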
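As a sketch of how header handling might look, the helpers below parse `Retry-After`, which HTTP allows to be either a number of seconds or an HTTP-date, and collect any headers whose name contains `ratelimit`. The function names are hypothetical, and the `ratelimit` naming pattern is an assumption about what a provider might send, not a documented contract.

```javascript
// Convert a Retry-After value to milliseconds.
// Returns null when the header is absent or cannot be parsed.
function parseRetryAfter(value, nowMs = Date.now()) {
  if (value === null || value === undefined) return null;
  const text = String(value).trim();
  if (text === "") return null;
  // Form 1: a non-negative integer number of seconds.
  if (/^\d+$/.test(text)) return Number(text) * 1000;
  // Form 2: an HTTP-date; wait until that moment (never a negative delay).
  const dateMs = Date.parse(text);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}

// Collect headers whose name contains "ratelimit" (assumed naming pattern).
// Accepts any iterable of [name, value] pairs, such as a fetch Headers object.
function pickRateLimitHeaders(headers) {
  const found = {};
  for (const [name, value] of headers) {
    const lower = name.toLowerCase();
    if (lower.includes("ratelimit")) found[lower] = value;
  }
  return found;
}
```

With a `fetch` response `res`, `parseRetryAfter(res.headers.get("Retry-After"))` yields a wait time in milliseconds or `null`, and `pickRateLimitHeaders(res.headers)` returns whatever rate-limit headers happen to be present for logging.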
Backoff and retry strategy
Use exponential backoff with jitter, and cap the maximum delay. Jitter spreads retries from many clients over time so they do not all hit the API again at the same moment.
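One common way to combine these ideas is the "full jitter" variant: pick a random delay between zero and a capped exponential ceiling. The sketch below assumes a 1-second base and a 30-second cap; the function name, option names, and defaults are illustrative choices, not values prescribed by the API.

```javascript
// Full-jitter backoff: a random delay in [0, min(capMs, baseMs * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

When the server sends a usable Retry-After value, prefer it over the computed delay and use this function only as the fallback.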
Concurrency control
Queues and concurrency limits keep your traffic within budget and improve success rates.
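A concurrency limit can be sketched as a small in-memory queue that runs at most a fixed number of tasks at once and starts the next one when a slot frees up. `createLimiter` and its `stats` helper are hypothetical names for illustration, and the right limit depends on your plan's budget.

```javascript
// Returns run(task): queues async tasks so at most maxConcurrent run at once.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    // Start the task in a microtask so a synchronous throw becomes a rejection.
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // a slot is free, so start the next queued task if any
      });
  };

  const run = (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });

  // Snapshot of the limiter state, useful for logging and tests.
  run.stats = () => ({ active, queued: waiting.length });
  return run;
}
```

For example, after `const limit = createLimiter(4)`, wrapping each call as `limit(() => chatWithRetry(body))` keeps at most four requests in flight while the rest wait in the queue.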