AI Gateway
Reviewed 2026-01-13
Token quotas for LLM APIs (per-user, per-project)
AI gateway token quotas for LLM APIs using per-user and per-project limits with audit-ready enforcement.
AI gateways are expected to enforce quotas. Policy Gateway adds per-user and per-project token limits to LLM traffic.
Attach policy_user and policy_project_id to each request to enforce budgets and keep audit trails clean.
Definition
Token quotas for LLM APIs are AI gateway controls that cap usage per user, per project, or per tenant to protect spend and prevent abuse.
Why it matters
- Stop runaway usage before it drives up cost.
- Isolate budgets by product, team, or customer.
- Keep per-user audit trails for compliance reviews.
How it works
- 01 Create a project and scoped key per app or tenant.
- 02 Attach policy_user and policy_project_id on every request.
- 03 Define user_quota and project_quota in policy JSON.
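Step 02 can be made harder to forget by routing every call through one helper that refuses to build an untagged request. The following is a minimal sketch, not an official client: the endpoint and the X-Policy-User and X-Policy-Project header names are taken from the cURL snippet on this page, the function name build_policy_request is hypothetical, and it is an assumption that those headers are what carry policy_user and policy_project_id.

```python
import json

# Endpoint and header names are copied from the cURL snippet on this page.
POLICY_URL = "https://api.abliteration.ai/policy/chat/completions"


def build_policy_request(policy_key, policy_user, policy_project, messages,
                         model="abliterated-model", policy_id="quota-control"):
    """Return (url, headers, body) for one tagged chat completion request.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    # Fail early so that no request leaves the app without quota tags.
    if not policy_user or not policy_project:
        raise ValueError("policy_user and policy_project are required on every request")

    headers = {
        "Authorization": f"Bearer {policy_key}",
        "Content-Type": "application/json",
        "X-Policy-User": policy_user,        # assumed to map to policy_user
        "X-Policy-Project": policy_project,  # assumed to map to policy_project_id
    }
    body = json.dumps({
        "model": model,
        "messages": messages,
        "policy_id": policy_id,
    })
    return POLICY_URL, headers, body
```

The returned tuple can be passed to any HTTP client, for example urllib.request.Request(url, data=body.encode(), headers=headers, method="POST") from the Python standard library.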
Runnable cURL snippet
curl https://api.abliteration.ai/policy/chat/completions \
  -H "Authorization: Bearer $POLICY_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Policy-User: user-8932" \
  -H "X-Policy-Project: pro-plan" \
  -d '{
    "model": "abliterated-model",
    "messages": [{"role":"user","content":"Summarize the latest invoice."}],
    "policy_id": "quota-control"
  }'
Example policy JSON
{
  "policy_id": "quota-control",
  "name": "Quota control",
  "owner": "Platform team",
  "description": "Per-user and per-project token caps.",
  "rules": {
    "allowlist": ["billing", "support", "account"],
    "denylist": ["credential theft"],
    "flagged_categories": ["self-harm/intent", "sexual/minors"],
    "response_pattern": "refuse",
    "rewrite_instead_of_refuse": false,
    "redact": true,
    "reason_codes": ["ALLOW", "REFUSE", "REDACT"]
  },
  "org_controls": {
    "project_keys": true,
    "user_quotas": true,
    "audit_logs": true,
    "data_classification": "restricted",
    "user_quota": { "requests": 60, "tokens": 5000, "window": "daily" },
    "project_quota": { "requests": 30000, "tokens": 3000000, "window": "monthly" }
  },
  "rollout": {
    "shadow": { "enabled": false, "sample_percent": 0, "targets": [] },
    "canary": { "enabled": false, "sample_percent": 0, "targets": [] },
    "rollback_on_spike": true
  },
  "refusal_replacement": { "mode": "refuse", "escalation_path": "policy-review@company.com" }
}
Before and after
Before (no quotas)
User 8932 consumes 50k tokens in one hour with no limits.
After (Policy Gateway quotas)
decision: refuse
reason_code: USER_QUOTA_EXCEEDED
policy_user: user-8932
policy_project_id: pro-plan
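One way to picture how the user_quota and project_quota values from the example policy could lead to a USER_QUOTA_EXCEEDED refusal is a fixed-window counter. The sketch below is not the Policy Gateway implementation, which this page does not describe. It assumes fixed (not sliding) windows, treats "monthly" as 30 days, takes the token count of each request as a known input, and keys user counters by policy_user alone. The "allow" decision value and the PROJECT_QUOTA_EXCEEDED reason code are hypothetical names chosen for illustration.

```python
import time
from dataclasses import dataclass, field

# Window lengths in seconds. Treating "monthly" as 30 days is an assumption.
WINDOW_SECONDS = {"daily": 86_400, "monthly": 30 * 86_400}


@dataclass
class Quota:
    requests: int
    tokens: int
    window: str  # "daily" or "monthly", as in the example policy


@dataclass
class Usage:
    window_index: int = -1
    requests: int = 0
    tokens: int = 0


@dataclass
class QuotaChecker:
    user_quota: Quota
    project_quota: Quota
    _usage: dict = field(default_factory=dict)

    def _current(self, scope, key, quota, now):
        """Return the counters for the window containing `now`, resetting on rollover."""
        index = int(now // WINDOW_SECONDS[quota.window])
        usage = self._usage.setdefault((scope, key), Usage())
        if usage.window_index != index:
            usage.window_index, usage.requests, usage.tokens = index, 0, 0
        return usage

    @staticmethod
    def _would_exceed(usage, quota, tokens):
        return usage.requests + 1 > quota.requests or usage.tokens + tokens > quota.tokens

    def check(self, policy_user, policy_project_id, tokens, now=None):
        """Decide one request; usage is recorded only when the request is allowed."""
        now = time.time() if now is None else now
        user = self._current("user", policy_user, self.user_quota, now)
        project = self._current("project", policy_project_id, self.project_quota, now)
        tags = {"policy_user": policy_user, "policy_project_id": policy_project_id}

        if self._would_exceed(user, self.user_quota, tokens):
            return {"decision": "refuse", "reason_code": "USER_QUOTA_EXCEEDED", **tags}
        if self._would_exceed(project, self.project_quota, tokens):
            # Hypothetical reason code; the page only shows USER_QUOTA_EXCEEDED.
            return {"decision": "refuse", "reason_code": "PROJECT_QUOTA_EXCEEDED", **tags}

        for usage in (user, project):
            usage.requests += 1
            usage.tokens += tokens
        return {"decision": "allow", "reason_code": "ALLOW", **tags}
```

With the example policy values, QuotaChecker(Quota(60, 5000, "daily"), Quota(30000, 3000000, "monthly")) allows five 1,000-token requests from user-8932 within one day and refuses the sixth, because it would push the user past the 5,000-token daily cap. The counters reset when the next daily window begins.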
Run the Policy Gateway simulator
Verify quota behavior and audit tags before enforcing limits.
Run a simulation
FAQ
Frequently asked questions.
Can I apply quotas without changing prompts?
Yes. Quotas use policy_user and policy_project_id metadata, not prompt content.
Do quotas work with project keys?
Yes. Project-scoped keys map requests to projects, so project quotas are enforced automatically.