AI Gateway
Reviewed 2026-01-13

Token quotas for LLM APIs (per-user, per-project)

Set token quotas for LLM APIs at the AI gateway, with per-user and per-project limits and audit-ready enforcement.

AI gateways are expected to enforce quotas. Policy Gateway adds per-user and per-project token limits to LLM traffic.

Attach policy_user and policy_project_id to enforce budgets and keep audit trails clean.

Definition

Token quotas for LLM APIs are AI gateway controls that cap usage per user, per project, or per tenant to protect spend and prevent abuse.

Why it matters
  • Stop runaway usage before it drives up cost.
  • Isolate budgets by product, team, or customer.
  • Keep per-user audit trails for compliance reviews.
How it works
  1. Create a project and a scoped key per app or tenant.
  2. Attach policy_user and policy_project_id on every request (sent as the X-Policy-User and X-Policy-Project headers in the cURL snippet).
  3. Define user_quota and project_quota in the policy JSON.
Runnable cURL snippet
curl https://api.abliteration.ai/policy/chat/completions \
  -H "Authorization: Bearer $POLICY_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Policy-User: user-8932" \
  -H "X-Policy-Project: pro-plan" \
  -d '{
    "model": "abliterated-model",
    "messages": [{"role":"user","content":"Summarize the latest invoice."}],
    "policy_id": "quota-control"
  }'
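For callers that do not use cURL, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper name `build_policy_request` and the placeholder key are assumptions for illustration, while the URL, headers, and body fields are copied from the cURL snippet above. The request is built but not sent.

```python
import json
import urllib.request

# Endpoint taken from the cURL snippet.
POLICY_URL = "https://api.abliteration.ai/policy/chat/completions"


def build_policy_request(api_key, user, project, prompt):
    """Build (but do not send) the POST request shown in the cURL snippet.

    build_policy_request is a hypothetical helper name, not part of any SDK.
    """
    body = {
        "model": "abliterated-model",
        "messages": [{"role": "user", "content": prompt}],
        "policy_id": "quota-control",
    }
    return urllib.request.Request(
        POLICY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-Policy-User": user,        # end user charged against user_quota
            "X-Policy-Project": project,  # project charged against project_quota
        },
        method="POST",
    )


# "example-key" is a placeholder; a real call needs a valid policy key.
req = build_policy_request(
    "example-key", "user-8932", "pro-plan", "Summarize the latest invoice."
)
```

Passing `req` to `urllib.request.urlopen` would send it, which requires a valid key and network access.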
Example policy JSON
{
  "policy_id": "quota-control",
  "name": "Quota control",
  "owner": "Platform team",
  "description": "Per-user and per-project token caps.",
  "rules": {
    "allowlist": ["billing", "support", "account"],
    "denylist": ["credential theft"],
    "flagged_categories": ["self-harm/intent", "sexual/minors"],
    "response_pattern": "refuse",
    "rewrite_instead_of_refuse": false,
    "redact": true,
    "reason_codes": ["ALLOW", "REFUSE", "REDACT"]
  },
  "org_controls": {
    "project_keys": true,
    "user_quotas": true,
    "audit_logs": true,
    "data_classification": "restricted",
    "user_quota": { "requests": 60, "tokens": 5000, "window": "daily" },
    "project_quota": { "requests": 30000, "tokens": 3000000, "window": "monthly" }
  },
  "rollout": {
    "shadow": { "enabled": false, "sample_percent": 0, "targets": [] },
    "canary": { "enabled": false, "sample_percent": 0, "targets": [] },
    "rollback_on_spike": true
  },
  "refusal_replacement": { "mode": "refuse", "escalation_path": "policy-review@company.com" }
}
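To make the user_quota and project_quota fields concrete, the sketch below shows one way a gateway could evaluate them. It is an illustration under stated assumptions, not Policy Gateway's actual implementation: it assumes each request's token cost is known up front, it omits the daily and monthly window resets, and the reason code `PROJECT_QUOTA_EXCEEDED` is hypothetical (only `USER_QUOTA_EXCEEDED` and `ALLOW` appear in this document).

```python
from dataclasses import dataclass


@dataclass
class Quota:
    requests: int
    tokens: int
    window: str  # "daily" or "monthly", as in the policy JSON


@dataclass
class Usage:
    requests: int = 0
    tokens: int = 0


def fits(quota, usage, tokens_needed):
    """True if one more request costing tokens_needed stays within the quota."""
    return (usage.requests + 1 <= quota.requests
            and usage.tokens + tokens_needed <= quota.tokens)


def decide(user_quota, user_usage, project_quota, project_usage, tokens_needed):
    """Return (decision, reason_code) and record usage when the request is allowed."""
    if not fits(user_quota, user_usage, tokens_needed):
        return "refuse", "USER_QUOTA_EXCEEDED"
    if not fits(project_quota, project_usage, tokens_needed):
        return "refuse", "PROJECT_QUOTA_EXCEEDED"  # hypothetical reason code
    for usage in (user_usage, project_usage):
        usage.requests += 1
        usage.tokens += tokens_needed
    return "allow", "ALLOW"


# Limits copied from the example policy JSON.
USER_QUOTA = Quota(requests=60, tokens=5000, window="daily")
PROJECT_QUOTA = Quota(requests=30000, tokens=3000000, window="monthly")

user_usage, project_usage = Usage(), Usage()
# Three requests of 2,000 tokens each from the same user on the same day.
results = [
    decide(USER_QUOTA, user_usage, PROJECT_QUOTA, project_usage, 2000)
    for _ in range(3)
]
```

With these limits the first two requests are allowed and the third is refused, because 6,000 tokens would exceed the 5,000-token daily user cap even though the project quota still has room.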
Before and after
Before (no quotas)
User 8932 consumes 50k tokens in one hour with no limits.
After (Policy Gateway quotas)
decision: refuse
reason_code: USER_QUOTA_EXCEEDED
policy_user: user-8932
policy_project_id: pro-plan
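A client or log pipeline can react to a refusal like the one above by reading its fields. The sketch below parses the `key: value` lines exactly as displayed here and flags a per-user quota refusal. The function names are hypothetical, and the real gateway may return these fields in a different shape (for example JSON), so treat the parsing step as an assumption.

```python
def parse_decision(text):
    """Parse 'key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_user_quota_refusal(fields):
    """True when the decision is a refusal caused by the per-user quota."""
    return (fields.get("decision") == "refuse"
            and fields.get("reason_code") == "USER_QUOTA_EXCEEDED")


# Sample text copied from the "After" output above.
sample = """decision: refuse
reason_code: USER_QUOTA_EXCEEDED
policy_user: user-8932
policy_project_id: pro-plan"""

fields = parse_decision(sample)
if is_user_quota_refusal(fields):
    # The audit tags identify who hit the limit and in which project.
    message = f"{fields['policy_user']} hit the user quota in {fields['policy_project_id']}"
```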

Run the Policy Gateway simulator

Verify quota behavior and audit tags before enforcing limits.

FAQ

Can I apply quotas without changing prompts?

Yes. Quotas use policy_user and policy_project_id metadata, not prompt content.

Do quotas work with project keys?

Yes. Project-scoped keys map requests to projects, so project quotas are enforced automatically.