abliteration.ai

This site is crawlable without JavaScript, but the live console requires JavaScript.

Documentation | Definitions | Privacy Policy | Terms of Service

Reference

LLM API rate limits

Rate limits cap request and token throughput to keep the service reliable.

Handle 429 responses with backoff and spread retries across workers.

Definition of LLM API rate limits

A rate limit is a provider-enforced cap on requests or tokens within a time window.

Why LLM API rate limits matters

Protects overall reliability and availability.
Ensures fair use across teams and tenants.
Prevents sudden traffic spikes from overloading the API.

How it works

Track requests and token usage over time.
Honor Retry-After headers when you receive 429s.
Use queues and concurrency limits to smooth bursts.
Monitor headers and logs for remaining capacity when available.

Example response

HTTP/1.1 429 Too Many Requests
Retry-After: 2

FAQ

Frequently Asked Questions

Are rate limits per API key or per account?

Limits are typically enforced per account or API key. Check your dashboard for exact limits.

Should I retry immediately after a 429?

No. Wait for Retry-After or use exponential backoff with jitter before retrying.

How can I avoid hitting limits?

Batch low-priority work, reduce parallelism, and cache repeated responses.

Links to next steps