Billing, quotas and PAYG
Every generation request passes through quota, PAYG and rate-limit checks before model routing starts.
Request lifecycle
- Authenticate the API key
The request is attached to a customer account and API key.
- Resolve the public model
The public model ID is checked against enabled catalog entries and plan access.
- Decide billing
Subscription quota is reserved first. If no quota is available and PAYG is enabled, the estimated cost is reserved from the wallet balance instead.
- Call the model route
The request is routed only after billing and rate-limit checks pass.
- Settle usage
Final usage settles quota or wallet reservations. Failed requests release reservations.
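The billing, routing and settlement steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's implementation: the names `Account`, `Reservation`, `reserve`, `release`, `settle` and `handle_request` are hypothetical, as are the units (quota units, wallet cents). Authentication, model resolution and rate-limit checks are left out.

```python
from dataclasses import dataclass


@dataclass
class Account:
    quota_units: int      # remaining subscription quota (hypothetical unit)
    wallet_cents: int     # PAYG wallet credit in cents (hypothetical unit)
    payg_enabled: bool


@dataclass
class Reservation:
    source: str           # "quota" or "wallet"
    amount: int


class BillingRejected(Exception):
    """Raised when neither quota nor PAYG can cover the request."""


def reserve(account, quota_cost, estimated_cents):
    # Subscription quota is tried first.
    if account.quota_units >= quota_cost:
        account.quota_units -= quota_cost
        return Reservation("quota", quota_cost)
    # PAYG fallback: only when enabled and the wallet covers the estimate.
    if account.payg_enabled and account.wallet_cents >= estimated_cents:
        account.wallet_cents -= estimated_cents
        return Reservation("wallet", estimated_cents)
    raise BillingRejected("no quota available and PAYG cannot cover the request")


def release(account, reservation):
    # A failed request gives back exactly what was reserved.
    if reservation.source == "quota":
        account.quota_units += reservation.amount
    else:
        account.wallet_cents += reservation.amount


def settle(account, reservation, final_cents):
    # Quota reservations are kept as charged; wallet reservations are
    # adjusted from the estimate to the final usage cost.
    if reservation.source == "wallet":
        account.wallet_cents += reservation.amount - final_cents


def handle_request(account, quota_cost, estimated_cents, call_model):
    # Billing is decided before the model route is called.
    reservation = reserve(account, quota_cost, estimated_cents)
    try:
        final_cents = call_model()  # hypothetical: returns final cost in cents
    except Exception:
        release(account, reservation)
        raise
    settle(account, reservation, final_cents)
    return reservation.source
```

Reserving before routing and settling afterwards means a request can never start without cover, and a failed call leaves the account exactly as it was.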
Quota and PAYG rules
- Subscription plans use rolling 5h and 7d quota windows.
- Each model can consume a different request multiplier per API call.
- PAYG fallback requires wallet credit and PAYG to be enabled at the account level.
- Streaming PAYG requires final usage data to settle correctly.
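To illustrate how rolling windows and per-model multipliers could interact, here is a rough sketch. It assumes a request is admitted only if it fits in both the 5h and the 7d window, and that a multiplier is simply the number of quota units one call consumes. The class `RollingQuota`, its limits and its method names are hypothetical; the actual accounting may differ.

```python
from collections import deque

FIVE_HOURS = 5 * 3600          # window lengths in seconds
SEVEN_DAYS = 7 * 24 * 3600


class RollingQuota:
    def __init__(self, limit_5h, limit_7d):
        self.limit_5h = limit_5h
        self.limit_7d = limit_7d
        self.events = deque()  # (timestamp, units), oldest first

    def _used(self, now, window):
        # Sum the units consumed strictly inside the rolling window.
        return sum(units for ts, units in self.events if now - ts < window)

    def try_consume(self, now, multiplier):
        # Forget events that have left the longest window.
        while self.events and now - self.events[0][0] >= SEVEN_DAYS:
            self.events.popleft()
        # The call must fit in both windows to be admitted.
        if self._used(now, FIVE_HOURS) + multiplier > self.limit_5h:
            return False
        if self._used(now, SEVEN_DAYS) + multiplier > self.limit_7d:
            return False
        self.events.append((now, multiplier))
        return True


quota = RollingQuota(limit_5h=3, limit_7d=5)
print(quota.try_consume(now=0, multiplier=2))   # a model costing 2 units per call
print(quota.try_consume(now=10, multiplier=2))  # rejected: would exceed the 5h limit
```

Because the windows roll, capacity returns gradually as old calls age out rather than all at once at a fixed reset time.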
Rate limits
- Account-level and API-key-level limits are enforced per minute.
- A 429 response means the request was blocked before generation.
- Create separate keys per integration to make rate-limit diagnostics easier.
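The two-level check can be pictured with a small sketch. It assumes fixed one-minute buckets and hypothetical limits; the service may use a different algorithm (for example a sliding window), and `PerMinuteLimiter` is an illustrative name, not part of any API. A `False` result corresponds to the 429 response, returned before any generation happens.

```python
from collections import defaultdict


class PerMinuteLimiter:
    def __init__(self, account_limit, key_limit):
        self.account_limit = account_limit
        self.key_limit = key_limit
        # (scope, id, minute) -> requests admitted in that minute.
        # Old buckets are never pruned in this sketch.
        self.counts = defaultdict(int)

    def allow(self, account_id, key_id, now):
        minute = int(now // 60)
        account_slot = ("account", account_id, minute)
        key_slot = ("key", key_id, minute)
        # Both the account-level and the key-level limit must have room.
        if self.counts[account_slot] >= self.account_limit:
            return False
        if self.counts[key_slot] >= self.key_limit:
            return False
        self.counts[account_slot] += 1
        self.counts[key_slot] += 1
        return True


limiter = PerMinuteLimiter(account_limit=3, key_limit=2)
for key in ["k1", "k1", "k1", "k2"]:
    status = 200 if limiter.allow("acct", key, now=0) else 429
    print(key, status)
```

With one key per integration, a blocked request points directly at the integration that exhausted its key-level limit, while the account-level limit still caps the total.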