AI Gateway

Token quotas for LLM APIs (per-user, per-project)

AI gateways are expected to enforce quotas. Policy Gateway adds per-user and per-project token limits to LLM traffic.

Attach policy_user and policy_project_id to enforce budgets and keep audit trails clean.

Definition of token quotas for LLM APIs (per-user, per-project)

Token quotas for LLM APIs are AI gateway controls that cap usage per user, per project, or per tenant to protect spend and prevent abuse.

Why token quotas for LLM APIs (per-user, per-project) matter

Stop runaway usage before it drives up costs.
Isolate budgets by product, team, or customer.
Keep per-user audit trails for compliance reviews.

How it works

Create a project and scoped key per app or tenant.
Attach policy_user and policy_project_id on every request (sent as the X-Policy-User and X-Policy-Project headers in the cURL snippet).
Define user_quota and project_quota in policy JSON.
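
The tagging step above can be sketched in Python. This is a minimal illustration, not an official client: the endpoint, header names, model name, and policy_id are copied from the cURL snippet on this page, while the helper name build_policy_request is hypothetical. The request is built but not sent, so the sketch runs without a network connection or a real key.

```python
import json
import urllib.request

# Endpoint and header names are taken from the cURL snippet on this page.
GATEWAY_URL = "https://api.abliteration.ai/policy/chat/completions"


def build_policy_request(api_key, user_id, project_id, prompt, policy_id="quota-control"):
    """Build (but do not send) a chat request tagged for quota accounting.

    build_policy_request is a hypothetical helper, not part of any SDK.
    """
    body = {
        "model": "abliterated-model",
        "messages": [{"role": "user", "content": prompt}],
        "policy_id": policy_id,
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # These two headers carry the user and project identity
            # that the quotas are counted against.
            "X-Policy-User": user_id,
            "X-Policy-Project": project_id,
        },
    )


req = build_policy_request(
    "example-key", "user-8932", "pro-plan", "Summarize the latest invoice."
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```

Keeping the tagging in one helper means every call site sends the same identity headers, so no request reaches the gateway untagged.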

Runnable cURL snippet

curl https://api.abliteration.ai/policy/chat/completions \
  -H "Authorization: Bearer $POLICY_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Policy-User: user-8932" \
  -H "X-Policy-Project: pro-plan" \
  -d '{
    "model": "abliterated-model",
    "messages": [{"role":"user","content":"Summarize the latest invoice."}],
    "policy_id": "quota-control"
  }'

Example policy JSON

{
  "policy_id": "quota-control",
  "name": "Quota control",
  "owner": "Platform team",
  "description": "Per-user and per-project token caps.",
  "rules": {
    "allowlist": ["billing", "support", "account"],
    "denylist": ["credential theft"],
    "flagged_categories": ["self-harm/intent", "sexual/minors"],
    "response_pattern": "refuse",
    "rewrite_instead_of_refuse": false,
    "redact": true,
    "reason_codes": ["ALLOW", "REFUSE", "REDACT"]
  },
  "org_controls": {
    "project_keys": true,
    "user_quotas": true,
    "audit_logs": true,
    "data_classification": "restricted",
    "user_quota": { "requests": 60, "tokens": 5000, "window": "daily" },
    "project_quota": { "requests": 30000, "tokens": 3000000, "window": "monthly" }
  },
  "rollout": {
    "shadow": { "enabled": false, "sample_percent": 0, "targets": [] },
    "canary": { "enabled": false, "sample_percent": 0, "targets": [] },
    "rollback_on_spike": true
  },
  "refusal_replacement": { "mode": "refuse", "escalation_path": "policy-review@company.com" }
}
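
To make the user_quota and project_quota fields concrete, here is a rough sketch of how limits like these could be evaluated. It is an assumption about the general technique, not a description of how Policy Gateway implements it: the Quota and Usage classes and the decide function are hypothetical, window resets (daily, monthly) are left out, and PROJECT_QUOTA_EXCEEDED is an invented reason code. Only USER_QUOTA_EXCEEDED appears on this page.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    requests: int
    tokens: int
    window: str  # "daily" or "monthly", as in the example policy


@dataclass
class Usage:
    requests: int = 0
    tokens: int = 0


def within_quota(quota, usage, requested_tokens):
    """True if one more request of requested_tokens fits under both caps."""
    return (
        usage.requests + 1 <= quota.requests
        and usage.tokens + requested_tokens <= quota.tokens
    )


def decide(user_quota, user_usage, project_quota, project_usage, requested_tokens):
    """Check the user cap, then the project cap; record usage only on allow."""
    if not within_quota(user_quota, user_usage, requested_tokens):
        return "USER_QUOTA_EXCEEDED"
    if not within_quota(project_quota, project_usage, requested_tokens):
        return "PROJECT_QUOTA_EXCEEDED"  # hypothetical reason code
    for usage in (user_usage, project_usage):
        usage.requests += 1
        usage.tokens += requested_tokens
    return "ALLOW"


# Limits copied from the example policy JSON.
user_quota = Quota(requests=60, tokens=5000, window="daily")
project_quota = Quota(requests=30000, tokens=3000000, window="monthly")
user_usage, project_usage = Usage(), Usage()

first = decide(user_quota, user_usage, project_quota, project_usage, 4000)
second = decide(user_quota, user_usage, project_quota, project_usage, 2000)
```

With the example limits, the first 4,000-token request fits under the 5,000-token daily user cap. The second would bring the user to 6,000 tokens, so it is refused and no usage is recorded for it.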

Before and after

Before (no quotas)

With no limits in place, user 8932 consumes 50k tokens in one hour, ten times the 5,000-token daily user cap in the example policy.

After (Policy Gateway quotas)

decision: refuse
reason_code: USER_QUOTA_EXCEEDED
policy_user: user-8932
policy_project_id: pro-plan
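
A client can use a refusal like this to decide whether to stop retrying. The sketch below assumes the decision arrives as plain key: value lines exactly as shown above. The gateway's actual response format is not specified on this page, and parse_decision and is_quota_refusal are hypothetical helper names.

```python
def parse_decision(text):
    """Parse 'key: value' lines into a dict, skipping lines without a colon."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


def is_quota_refusal(record):
    """True when the request was refused because a quota was exhausted."""
    return (
        record.get("decision") == "refuse"
        and record.get("reason_code", "").endswith("QUOTA_EXCEEDED")
    )


sample = """decision: refuse
reason_code: USER_QUOTA_EXCEEDED
policy_user: user-8932
policy_project_id: pro-plan"""

record = parse_decision(sample)
if is_quota_refusal(record):
    # Retrying immediately would not help; wait for the quota window to reset.
    print(f"Quota exhausted for {record['policy_user']} in {record['policy_project_id']}")
```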

Run the Policy Gateway simulator

Verify quota behavior and audit tags before enforcing limits.

FAQ

Can I apply quotas without changing prompts?

Yes. Quotas use policy_user and policy_project_id metadata, not prompt content.

Do quotas work with project keys?

Yes. Project-scoped keys map each request to its project, so project quotas are enforced automatically.