# Free Developer Tool · No Signup

API Rate Limit Calculator

Convert RPS ↔ RPM ↔ RPH ↔ RPD, model token-bucket bursts, plan exponential-backoff retries, and size concurrency with Little's Law. All math runs locally in your browser.

✓ 100% client-side ✓ 4 modes in one tool ✓ No data leaves your browser

What this tool helps you decide

API rate limits are easy to misjudge by an order of magnitude. A provider says "100 requests per minute" and an engineer mentally rounds that to "almost 2 per second — basically unlimited." A few weeks later, the system saturates at scale and on-call gets a page. The math isn't hard; it's just easy to skip. This calculator gives you four quick checks every API integration should pass before you ship it.

1. Rate unit conversion

If you ever need to translate between RPS, RPM, RPH, RPD, or per-month figures — either because the provider uses one unit and your dashboards use another, or because you're sizing a third-party budget — this is the boring math you don't want to redo every quarter. Just type the number.
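The conversion itself can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the unit keys and the `convert_rate` name are made up for the example, and the 30-day month is an assumption, since providers define "per month" differently.

```python
# Seconds in each rate window. The 30-day month is an assumption for this
# sketch; check how your provider defines a monthly quota.
SECONDS_PER_UNIT = {
    "rps": 1,
    "rpm": 60,
    "rph": 3_600,
    "rpd": 86_400,
    "rpmonth": 30 * 86_400,
}


def convert_rate(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a request rate by normalizing to requests per second first."""
    per_second = value / SECONDS_PER_UNIT[from_unit]
    return per_second * SECONDS_PER_UNIT[to_unit]


print(convert_rate(100, "rps", "rpm"))  # 6000.0
print(convert_rate(100, "rpm", "rps"))  # about 1.67, not "almost unlimited"
```

Normalizing through one base unit keeps the table small: adding a new window means adding one entry, not one formula per pair of units.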

2. Token bucket sizing

Token bucket is one of the most widely used algorithms for production rate limiting in 2026. You set a sustained rate and a burst capacity, and the limiter enforces both in the background. The tool models what happens when arrival traffic exceeds the sustained rate: how long you can burst before the bucket empties, and how many requests per second fall over the limit once it does.
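The burst math can be sketched as follows. This is a simplified model, not the tool's implementation: it assumes the bucket starts full, arrival and refill rates are constant, and each request costs one token. The `burst_profile` name is hypothetical.

```python
import math


def burst_profile(capacity: float, refill_rate: float, arrival_rate: float):
    """Return (seconds_until_empty, excess_requests_per_second).

    Assumes a full bucket at t=0, constant rates, and one token per request.
    """
    excess = arrival_rate - refill_rate
    if excess <= 0:
        # Traffic is at or below the sustained rate: the bucket never empties.
        return math.inf, 0.0
    # The bucket drains at the net rate (arrivals minus refill).
    return capacity / excess, excess


seconds, over_limit = burst_profile(capacity=100, refill_rate=10, arrival_rate=30)
print(seconds, over_limit)  # 5.0 20
```

In this example a 100-token bucket refilling at 10 tokens per second absorbs 30 RPS for 5 seconds. After that, only 10 requests per second get a token and the other 20 are rejected or queued.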

3. Exponential backoff retries

Most production APIs return 429 Too Many Requests or 503 Service Unavailable intermittently. The standard retry pattern is exponential backoff with jitter: delays that double on each attempt, with random variation added. The tool generates the exact delay sequence so you can plan worst-case total wait time and choose sensible max-retry counts. If you're hitting an API with a 30-second hard timeout, retrying 12 times with backoff exceeds your budget: with a 1-second base delay, the first five un-jittered waits alone add up to 1 + 2 + 4 + 8 + 16 = 31 seconds.
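A delay table like the one the tool produces can be approximated with the sketch below. It assumes capped exponential backoff without jitter, which is the upper bound when full jitter is used, and it ignores the time spent on the requests themselves. The function names and the 60-second cap are illustrative choices, not the tool's defaults.

```python
def backoff_schedule(base_delay: float, max_delay: float, retries: int):
    """Return [(attempt, delay, cumulative_wait)] for capped exponential backoff."""
    schedule = []
    total = 0.0
    for attempt in range(retries):
        delay = min(max_delay, base_delay * 2 ** attempt)
        total += delay
        schedule.append((attempt + 1, delay, total))
    return schedule


def max_retries_within(budget: float, base_delay: float, max_delay: float) -> int:
    """Largest retry count whose un-jittered cumulative wait fits in the budget."""
    if base_delay <= 0 or max_delay <= 0:
        raise ValueError("delays must be positive")
    total, retries = 0.0, 0
    while True:
        delay = min(max_delay, base_delay * 2 ** retries)
        if total + delay > budget:
            return retries
        total += delay
        retries += 1


for attempt, delay, cumulative in backoff_schedule(1, 60, 12):
    print(attempt, delay, cumulative)

print(max_retries_within(budget=30, base_delay=1, max_delay=60))  # 4
```

With a 1-second base and a 60-second cap, 12 retries wait 423 seconds in total, while a 30-second budget leaves room for only 4 retries (1 + 2 + 4 + 8 = 15 seconds; the fifth would push the total to 31).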

4. Concurrency planning (Little's Law)

Little's Law (concurrency = arrival rate × service time) gives the average number of requests in flight, which is the minimum number of simultaneous connections, threads, or worker slots your system needs to keep up. A common mistake is provisioning concurrency for mean latency, then seeing queues build up because p99 is 4× the mean. The tool shows you both numbers so you can pick the right headroom.
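As a rough sketch of that comparison, the function below applies Little's Law and rounds up to whole slots. The `required_concurrency` name and the `headroom` parameter are assumptions for illustration, and the example latencies (200 ms mean, 800 ms p99) are made-up numbers chosen to match the 4× case.

```python
import math


def required_concurrency(arrival_rate_rps: float, latency_seconds: float,
                         headroom: float = 0.0) -> int:
    """Little's Law (L = lambda * W), rounded up to whole slots.

    headroom is an optional fraction, e.g. 0.5 for 50% extra capacity.
    """
    slots = arrival_rate_rps * latency_seconds * (1 + headroom)
    # round() guards against floating-point fuzz before taking the ceiling.
    return math.ceil(round(slots, 9))


mean_slots = required_concurrency(100, 0.2)  # sized for 200 ms mean latency
p99_slots = required_concurrency(100, 0.8)   # sized for 800 ms p99 latency
print(mean_slots, p99_slots)  # 20 80
```

Sizing for the mean gives 20 slots, while a sustained spell at p99 latency would need 80. The right number usually sits between the two, depending on how long your slow periods last.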

For more developer tools

Browse the full JobsByCulture developer tools collection — cron expression builder, regex tester, JSON to TypeScript converter, semver calculator, and more. Or if you're hiring AI engineers who'll be working at this level of detail, check out live AI/ML engineer roles on the job board.

Frequently asked questions

How do I convert requests per second to requests per minute?
Multiply RPS by 60. 100 RPS equals 6,000 RPM, 360,000 RPH, and 8,640,000 RPD. Use the Convert Rates tab above to convert any unit instantly across seconds, minutes, hours, days, and months.
What is a token bucket rate limiter?
Token bucket is the most common API rate-limiting algorithm. A bucket holds a maximum number of tokens (the burst capacity); tokens refill at a steady rate (the sustained rate). Each request consumes one token. If the bucket is empty, requests are rejected or queued. This model allows short bursts above the sustained rate as long as average usage stays within budget.
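A minimal token bucket along these lines might look like the sketch below. It is single-threaded and rejects rather than queues, the `TokenBucket` class is hypothetical, and the injectable `clock` parameter exists only to make the example easy to test. Real limiters usually add locking and shared state.

```python
import time


class TokenBucket:
    """Minimal token bucket: rejects requests when no token is available."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity        # burst capacity (max tokens)
        self.refill_rate = refill_rate  # sustained rate (tokens per second)
        self.clock = clock
        self.tokens = capacity          # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        # Refill lazily based on elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1)
results = [bucket.allow() for _ in range(7)]
print(results)  # the first 5 calls pass; the rest are almost certainly rejected
```

Refilling lazily on each call avoids a background timer: the bucket only needs the time of the last request to know how many tokens have accrued.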
What is exponential backoff for API retries?
Exponential backoff doubles the wait time between retries: 1s, 2s, 4s, 8s, 16s. Production systems add jitter (random variation) to prevent a thundering herd when many clients retry simultaneously. A common formula is delay = min(maxDelay, baseDelay × 2^attempt) × random(0.5, 1.5), which spreads each delay ±50% around the computed value. Full jitter (used in this tool) instead picks a random delay between 0 and the computed delay.
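The two jitter styles described above can be compared side by side. This is an illustrative sketch: the function names, the 1-second base, and the 60-second cap are assumptions, and attempts are counted from 0.

```python
import random


def capped_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Un-jittered delay: min(maxDelay, baseDelay * 2^attempt), attempt starting at 0."""
    return min(max_delay, base_delay * 2 ** attempt)


def full_jitter(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0,
                rng=random) -> float:
    """Full jitter: uniform between 0 and the computed delay."""
    return rng.uniform(0, capped_delay(attempt, base_delay, max_delay))


def proportional_jitter(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0,
                        rng=random) -> float:
    """Computed delay scaled by a random factor in [0.5, 1.5]."""
    return capped_delay(attempt, base_delay, max_delay) * rng.uniform(0.5, 1.5)


for attempt in range(5):
    print(attempt, capped_delay(attempt),
          round(full_jitter(attempt), 2), round(proportional_jitter(attempt), 2))
```

Note that the ±50% variant applies its factor after the cap, so a single delay can run up to 1.5 × maxDelay, whereas full jitter never exceeds the computed delay.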
How do I calculate how many concurrent connections my API needs?
Use Little's Law: concurrency = arrival rate × average response time. If you receive 100 RPS and each request takes 200ms (0.2s), you need at least 100 × 0.2 = 20 concurrent slots to keep up. Add 30–50% headroom for p95/p99 latency spikes. The Concurrency tab above computes this for you.
What rate limits do major AI APIs use in 2026?
Rate limits vary by provider, tier, and model. Most LLM providers (OpenAI, Anthropic, Google) use a combination of RPM (requests per minute), ITPM (input tokens per minute), and OTPM (output tokens per minute), enforced via sliding-window or token-bucket algorithms with some burst capacity. Always check the provider's current published limits before designing retry logic — tiers and limits change frequently.

