
Rate limits

Phantom rate-limits API calls per token, with two independent buckets so a burst of reads can't starve writes:

Bucket                             Burst (per minute)   Sustained
read (GET, HEAD)                   120                  ~10 / sec
write (POST, PATCH, PUT, DELETE)   30                   ~2 / sec

On top of that, a defence-in-depth per-IP cap of 30 unauthenticated requests / minute prevents token-probing from a fresh IP.
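
As a rough illustration of the bucket split in the table above, a client can work out in advance which bucket a call will draw from. The names `bucket_for_method` and `BUCKET_LIMITS` are hypothetical helpers for this sketch, not part of any Phantom SDK; the numbers are copied from the "Burst (per minute)" column.

```python
# Hypothetical client-side helper: which bucket will a call draw from?
READ_METHODS = frozenset({"GET", "HEAD"})
WRITE_METHODS = frozenset({"POST", "PATCH", "PUT", "DELETE"})

# Per-minute budgets, copied from the "Burst (per minute)" column.
BUCKET_LIMITS = {"read": 120, "write": 30}


def bucket_for_method(method: str) -> str:
    """Return "read" or "write" for an HTTP method, per the table above."""
    normalised = method.upper()
    if normalised in READ_METHODS:
        return "read"
    if normalised in WRITE_METHODS:
        return "write"
    # Methods not listed in the table are left undefined on purpose.
    raise ValueError(f"no bucket documented for method {method!r}")
```

For example, `BUCKET_LIMITS[bucket_for_method("PATCH")]` gives the per-minute budget a PATCH call counts against.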

How to read the headers

Every response carries:

RateLimit-Limit:     120      ← bucket size
RateLimit-Remaining: 117      ← what's left
RateLimit-Reset:     43       ← seconds until the bucket refills
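
One way to read these headers on the client side is sketched below. It assumes the headers arrive as a plain string-to-string mapping, which is what most HTTP libraries expose; `RateLimitState` and `parse_rate_limit` are illustrative names, not an official API.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitState:
    limit: int          # RateLimit-Limit: bucket size
    remaining: int      # RateLimit-Remaining: what's left
    reset_seconds: int  # RateLimit-Reset: seconds until the bucket refills


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    """Extract the three RateLimit-* headers from a response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitState(
        limit=int(lowered["ratelimit-limit"]),
        remaining=int(lowered["ratelimit-remaining"]),
        reset_seconds=int(lowered["ratelimit-reset"]),
    )
```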

When you hit the cap:

HTTP/1.1 429 Too Many Requests
Retry-After: 43
RateLimit-Limit: 120
RateLimit-Remaining: 0
RateLimit-Reset: 43

{
  "error": {
    "code":    "rate_limited",
    "message": "API rate limit exceeded. Try again in 43s.",
    "bucket":  "read"
  }
}

Retry-After and RateLimit-Reset are always in seconds, and always equal (they refer to the same value).
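
A minimal sketch of turning a 429 like the one above into "how long to wait, and for which bucket". It assumes you already have the status code, headers, and raw body as Python values; `parse_429` is a hypothetical helper name.

```python
import json
from typing import Mapping, Optional, Tuple


def parse_429(
    status: int, headers: Mapping[str, str], body: str
) -> Optional[Tuple[int, str]]:
    """Return (seconds_to_wait, bucket) for a 429, or None otherwise."""
    if status != 429:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    # Retry-After is in seconds and equals RateLimit-Reset on a 429.
    wait_seconds = int(lowered["retry-after"])
    error = json.loads(body).get("error", {})
    # Fall back to "unknown" rather than guessing if the field is absent.
    return wait_seconds, error.get("bucket", "unknown")
```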

Sensible client behaviour

  • Always read RateLimit-Remaining. Throttle yourself preemptively if it's below ~10 so you don't get caught in a 429 mid-task.
  • Honour Retry-After. Don't busy-loop — wait the exact number of seconds it gave you.
  • Use exponential backoff on repeated 429s if you have a retry budget. A repeat 429 usually means another caller on the same token is also hitting the limit, so back off.
  • Spread your work. A batch job that fires 100 writes in one second will trip the write limit; pace the writes so that each minute stays within the write budget.
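
The behaviours above can be sketched as two small delay calculations. This is only one possible policy: the threshold of 10 comes from the first bullet, while the function names, the even-spacing rule, and the 300-second cap are assumptions made for this example.

```python
def preemptive_pause(remaining: int, reset_seconds: int, threshold: int = 10) -> float:
    """Seconds to sleep before the next call, based on the last response."""
    if remaining >= threshold:
        return 0.0  # plenty of budget left, no need to slow down
    if remaining <= 0:
        return float(reset_seconds)  # nothing left: wait for the refill
    # Spread the few remaining calls evenly over the rest of the window.
    return reset_seconds / remaining


def backoff_delay(retry_after: int, attempt: int, cap: float = 300.0) -> float:
    """Delay after the (attempt + 1)-th consecutive 429.

    attempt=0 waits exactly Retry-After; each further 429 doubles the
    delay, up to `cap` seconds (the cap is an arbitrary choice here).
    """
    return min(cap, float(retry_after) * (2 ** attempt))
```

A caller would sleep for `preemptive_pause(...)` before each request, and for `backoff_delay(...)` after each 429, giving up once its own retry budget is spent.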

Bypassing the limit

You can't. If you need higher throughput, mint multiple tokens and have your callers round-robin between them, or contact us so we can talk about the use case. We won't bump the limit silently — the cap is part of how we keep one customer's traffic from affecting another's.
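
Round-robin between several tokens can be as simple as cycling through a list. This sketch assumes the tokens are already minted and held as strings; how a token is attached to a request is not covered here, and `token_rotation` is a hypothetical helper.

```python
from itertools import cycle
from typing import Iterator, Sequence


def token_rotation(tokens: Sequence[str]) -> Iterator[str]:
    """Yield tokens in round-robin order, forever."""
    if not tokens:
        raise ValueError("at least one token is required")
    return cycle(tokens)


# Placeholder strings stand in for real tokens.
rotation = token_rotation(["token-a", "token-b", "token-c"])
first_four = [next(rotation) for _ in range(4)]
# first_four is ["token-a", "token-b", "token-c", "token-a"]
```

Each token has its own read and write buckets, so the rotation spreads calls across independent budgets.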

What counts as a "request"

  • Every HTTP call to /api/v1/* consumes 1 from the appropriate bucket, including 4xx responses.
    • A 401 response consumes 1 from the unauthenticated per-IP bucket.
    • A 403 response from RequireApiScope has already consumed 1 from the read/write bucket: the token was valid, just not for this endpoint.
  • A successful Idempotent-Replay: true response still consumes 1 from the bucket — the replay logic runs after rate-limit accounting.
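
To estimate your own usage from a request log, the accounting rules above can be modelled roughly as follows. This is a client-side approximation, not the server's implementation; `LoggedCall`, `charged_bucket`, `tally`, and the `"unauthenticated-ip"` label are names invented for this sketch.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class LoggedCall(NamedTuple):
    method: str
    status: int
    idempotent_replay: bool = False  # recorded, but does not change the charge


def charged_bucket(call: LoggedCall) -> str:
    """Which bucket a /api/v1/* call was charged to, per the rules above."""
    if call.status == 401:
        return "unauthenticated-ip"  # the per-IP cap, not a token bucket
    # Every other response, including 403 and other 4xx, hits the token's bucket.
    return "read" if call.method.upper() in ("GET", "HEAD") else "write"


def tally(calls: Iterable[LoggedCall]) -> Counter:
    """Count charged requests per bucket."""
    return Counter(charged_bucket(call) for call in calls)
```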

Burst handling

The bucket refills on a fixed 60-second window aligned to the wall clock, so at the top of every minute (hh:00:00, hh:01:00, hh:02:00, and so on) every token's budget resets. That means you can briefly consume close to 2× the per-minute budget by sending requests just before and just after a window boundary, for example at second :59 of one minute and second :01 of the next. We accept that: it's negligible at the budgets above, and a fixed window is a much simpler mental model than a true token bucket.
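
The boundary effect is easy to reproduce with a toy fixed-window counter. This is a simplified model written for illustration, not Phantom's actual limiter; the class name is made up, and timestamps are plain seconds so the example is deterministic.

```python
class FixedWindowLimiter:
    """Toy fixed-window counter aligned to multiples of `window` seconds."""

    def __init__(self, limit: int, window: int = 60) -> None:
        self.limit = limit
        self.window = window
        self.current_window = None  # index of the window being counted
        self.used = 0

    def allow(self, now: float) -> bool:
        window_index = int(now // self.window)
        if window_index != self.current_window:
            # A new window has started: the whole budget resets at once.
            self.current_window = window_index
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False


limiter = FixedWindowLimiter(limit=30)  # the write budget from the table
accepted_before = sum(limiter.allow(59.0) for _ in range(40))  # second :59
accepted_after = sum(limiter.allow(61.0) for _ in range(40))   # second :01 of the next minute
# 30 + 30 = 60 requests accepted within about two seconds.
```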

Phantom is a product of Hydra Labs. The bot is run as a managed service; you do not need to host it yourself.