A trading bot that crashes into Kalshi's rate limits is a bot that misses fills, corrupts its own state, and risks an API key suspension. The good news: rate-limit problems are almost entirely engineering problems, and every one of them has a clean solution. This guide walks through exactly how Kalshi enforces limits, the data structures that keep you inside them, and the code patterns you can drop into a Python bot today.
How Kalshi Rate Limits Work
Kalshi's REST API divides endpoints into two broad tiers:
- Market data endpoints (GET /markets, GET /markets/{id}/orderbook, GET /trades): higher limits, read-only, less sensitive to abuse.
- Order management endpoints (POST /portfolio/orders, DELETE /portfolio/orders/{id}, GET /portfolio): tighter limits because each call can trigger exchange-side state changes.
Limits are enforced at the API key level using a sliding-window or token-bucket mechanism server-side. That means all processes sharing one API key compete for the same quota — a detail that bites teams running multiple bots under a single account (more on that in the shared limiter section).
The exact numbers are documented in the Kalshi API reference and are subject to change — always treat the official docs as the source of truth. What doesn't change is the response contract: exceed the limit and you receive HTTP 429 Too Many Requests.
If you're still orienting yourself to the API surface, the Kalshi API tutorial covers authentication and endpoint structure before you worry about rate limits.
Reading the 429 Response
Before writing any throttle-protection logic, understand what Kalshi actually sends back. A 429 response looks like this:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 2
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1713820842

{"error": "rate_limit_exceeded", "message": "Too many requests. Retry after 2 seconds."}
The key headers:
| Header | Meaning |
|---|---|
| Retry-After | Seconds to wait before retrying. Read this; don't guess. |
| X-RateLimit-Limit | Total tokens/requests allowed in the current window. |
| X-RateLimit-Remaining | Tokens left this window. Watch this proactively. |
| X-RateLimit-Reset | Unix timestamp when the window resets. |
A naive bot ignores all of these and just retries in a tight loop, hammering the server and burning the entire remaining window. A well-built bot reads Retry-After, sleeps at least that long, then resumes. Ideally it never reaches 429 in the first place, because it tracks X-RateLimit-Remaining on every successful response.
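Reading the wait time out of the headers can be sketched as a small helper. This is a minimal example, assuming the header names shown in the sample response above; seconds_until_retry is an illustrative name, and the fallback order (Retry-After first, then X-RateLimit-Reset, then a default) is a design choice rather than documented behavior.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str],
                        now: Optional[float] = None,
                        default: float = 1.0) -> float:
    """Return how long to wait before retrying, based on response headers.

    Prefers Retry-After (seconds), falls back to X-RateLimit-Reset (Unix
    timestamp), and otherwise returns `default`. The header names are
    assumptions taken from the sample 429 response in this guide.
    """
    now = time.time() if now is None else now

    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall through to the reset header

    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            return max(0.0, float(reset) - now)
        except ValueError:
            pass

    return default
```

Passing now explicitly keeps the function easy to unit test; in production code it can be left out so the current time is used.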
Token Bucket Implementation
The token bucket algorithm is the right mental model for this problem. You start with a bucket of N tokens. Each request consumes one token. Tokens refill at a steady rate (e.g., 10 per second). If the bucket is empty, the request waits until a token is available.
Here is a minimal, thread-safe Python implementation you can embed directly in your bot:
import time
import threading


class TokenBucket:
    """Thread-safe token bucket rate limiter."""

    def __init__(self, rate: float, capacity: int):
        """
        rate: tokens added per second (e.g. 10.0)
        capacity: maximum tokens in the bucket
        """
        self.rate = rate
        self.capacity = capacity
        self._tokens = float(capacity)
        self._last_refill = time.monotonic()
        self._lock = threading.Lock()

    def _refill(self):
        # Caller must hold self._lock.
        now = time.monotonic()
        elapsed = now - self._last_refill
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last_refill = now

    def acquire(self, tokens: int = 1, timeout: float = 10.0) -> bool:
        """Block until tokens are available or timeout expires."""
        deadline = time.monotonic() + timeout
        while True:
            with self._lock:
                self._refill()
                if self._tokens >= tokens:
                    self._tokens -= tokens
                    return True
                # Time until the missing tokens have refilled
                wait = (tokens - self._tokens) / self.rate
            if time.monotonic() + wait > deadline:
                return False
            # Sleep outside the lock, in short slices, then re-check
            time.sleep(min(wait, 0.05))


# Usage (rates are placeholders; set them from the documented limits)
order_limiter = TokenBucket(rate=5.0, capacity=10)    # 5 order ops/sec
market_limiter = TokenBucket(rate=20.0, capacity=40)  # 20 market reads/sec


def place_order(payload):
    if not order_limiter.acquire():
        raise TimeoutError("Rate limiter timed out waiting for order slot")
    # kalshi_client is your authenticated HTTP client (not defined here)
    return kalshi_client.post("/portfolio/orders", json=payload)
Use separate buckets for order endpoints and data endpoints — they have different quotas, and a data-polling spike shouldn't block order submission.
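Choosing the right bucket per request can be as simple as classifying the call by method and path. The sketch below is one hedged way to do it; endpoint_tier is a hypothetical helper, and the rule it applies (any write, or anything under /portfolio, counts as an order operation) is an assumption based on the endpoints named in this guide, not on the official tier definitions.

```python
def endpoint_tier(method: str, path: str) -> str:
    """Classify a request as 'orders' or 'market_data'.

    Assumption: every non-GET call and every /portfolio path belongs to the
    tighter order-management tier. Adjust to match the official API reference.
    """
    if method.upper() != "GET" or path.startswith("/portfolio"):
        return "orders"
    return "market_data"


print(endpoint_tier("POST", "/portfolio/orders"))  # orders
print(endpoint_tier("GET", "/markets"))            # market_data
```

With the two buckets defined earlier, a dispatcher could keep a dict such as {"orders": order_limiter, "market_data": market_limiter} and call acquire() on the entry that endpoint_tier selects before sending the request.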
Exponential Backoff with Jitter
Even with a client-side bucket, you will eventually hit a 429 — maybe during a burst of signals at market open, or because a second process unexpectedly shared your key. Backoff handles that gracefully.
Plain exponential backoff (sleep 1s, 2s, 4s, 8s…) has a thundering-herd problem: if multiple bot instances all hit 429 at the same moment and all back off on the same schedule, they retry simultaneously and produce another burst. Adding jitter — a random fraction of the backoff interval — desynchronizes them.
import random
import time


def with_backoff(fn, max_retries: int = 6, base_delay: float = 1.0):
    """
    Call fn(); on HTTP 429, wait and retry up to max_retries times.
    Each wait is the server's Retry-After plus full jitter over an
    exponentially growing window capped at 60 seconds.
    Raises for the last response once retries are exhausted.
    """
    for attempt in range(max_retries + 1):  # first call + max_retries retries
        response = fn()
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break  # out of retries; don't sleep again
        retry_after = float(response.headers.get("Retry-After", base_delay))
        cap = min(retry_after * (2 ** attempt), 60.0)
        # Never retry sooner than the server asked; jitter is added on top.
        sleep_for = retry_after + random.uniform(0, cap)
        print(f"429 received. Sleeping {sleep_for:.2f}s (attempt {attempt + 1})")
        time.sleep(sleep_for)
    response.raise_for_status()  # propagate the final 429
With six retries, full jitter, and a 60-second cap, the jitter windows grow as 2, 4, 8, 16, 32, and 60 seconds when Retry-After is 2, so the worst-case cumulative wait is roughly two minutes: long enough for a transient rate-limit window to clear without hanging your bot indefinitely.
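The two-minute figure can be checked with a few lines of arithmetic. This sketch sums the largest possible jittered sleep for each retry, assuming the Retry-After: 2 value from the sample response; worst_case_jitter_total is an illustrative name, not part of any library.

```python
def worst_case_jitter_total(retry_after: float = 2.0,
                            retries: int = 6,
                            max_cap: float = 60.0) -> float:
    """Sum the upper bound of the jitter window for each retry."""
    return sum(min(retry_after * (2 ** attempt), max_cap)
               for attempt in range(retries))


# Windows: 2 + 4 + 8 + 16 + 32 + 60 (the sixth is capped)
print(worst_case_jitter_total())  # 122.0
```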
WebSocket vs. REST Polling
The single highest-leverage change most bots can make is replacing REST polling loops with a WebSocket subscription. If you're calling GET /markets/{id}/orderbook every 500ms to watch prices, that's 120 requests per minute just for one market. Watching ten markets? 1,200 requests per minute — a large fraction of your entire quota spent on data that could arrive via push.
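The polling cost scales linearly with both the polling frequency and the number of markets, which a quick calculation makes concrete. The function name below is illustrative.

```python
def polling_requests_per_minute(interval_ms: int, markets: int = 1) -> float:
    """REST requests per minute when polling each market every interval_ms."""
    return markets * 60_000 / interval_ms


print(polling_requests_per_minute(500))      # 120.0
print(polling_requests_per_minute(500, 10))  # 1200.0
```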
Kalshi's WebSocket API streams order book updates, trade confirmations, and fill events in real time. Your bot subscribes once and receives incremental diffs as they happen, with zero polling overhead.
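To make the idea of incremental diffs concrete, here is a minimal sketch of keeping a local order book current from pushed updates. The message shape (side, price, delta) and the apply_delta helper are hypothetical illustrations, not Kalshi's actual wire format; consult the WebSocket documentation for the real field names.

```python
from typing import Dict


def apply_delta(book: Dict[str, Dict[int, int]], delta: dict) -> None:
    """Apply one incremental update to a local order book, in place.

    `book` maps a side ("yes" / "no") to {price_cents: resting_quantity}.
    `delta` is a hypothetical message such as
    {"side": "yes", "price": 42, "delta": -5}; the real format may differ.
    """
    levels = book.setdefault(delta["side"], {})
    new_qty = levels.get(delta["price"], 0) + delta["delta"]
    if new_qty > 0:
        levels[delta["price"]] = new_qty
    else:
        levels.pop(delta["price"], None)  # level emptied; drop it


book = {"yes": {42: 10}, "no": {}}
apply_delta(book, {"side": "yes", "price": 42, "delta": -5})
apply_delta(book, {"side": "no", "price": 55, "delta": 20})
apply_delta(book, {"side": "yes", "price": 42, "delta": -5})
print(book)  # {'yes': {}, 'no': {55: 20}}
```

Once the book lives in memory, reading the best price costs no API calls at all, which is where the rate-limit savings come from.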
The tradeoff analysis and connection-management code are covered in depth in Kalshi WebSocket vs. REST API. The short version for rate-limit purposes: use WebSocket for anything you need continuously, use REST only for one-shot lookups or writes.
Switching from REST polling to WebSocket for market data can cut a bot's REST request volume sharply (80–90% is typical for polling-heavy bots), which largely removes the risk of hitting data-endpoint limits and frees the quota for order operations.
Batching and Caching Requests
When you do need REST reads, batch them. Kalshi's list endpoints support query parameters that let you retrieve multiple markets in a single round trip:
# Inefficient: one request per market
for market_id in market_ids:
    data = client.get(f"/markets/{market_id}")

# Efficient: one request for up to 200 markets
data = client.get("/markets", params={
    "series_ticker": "INXD",
    "status": "open",
    "limit": 200
})
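When a filter matches more markets than fit in one response, list endpoints are usually paginated. The sketch below walks the pages with a cursor; it assumes the response body carries a "markets" list and a "cursor" string that is empty or absent on the last page, which should be confirmed against the API reference. fetch_page is a hypothetical callable that performs one GET /markets request and returns the decoded JSON.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_markets(fetch_page: Callable[[Optional[str]], Dict],
                 max_pages: int = 50) -> Iterator[dict]:
    """Yield markets page by page until the server stops returning a cursor.

    Field names "markets" and "cursor" are assumptions, not confirmed API.
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard stop so a bad cursor cannot loop forever
        page = fetch_page(cursor)
        yield from page.get("markets", [])
        cursor = page.get("cursor") or None
        if cursor is None:
            break
```

Each page still costs one request, so it is worth pairing this with a rate limiter when the result set is large.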
For data that doesn't change frequently — market metadata, series definitions, fee schedules — implement a simple TTL cache:
import time
from functools import wraps

_cache = {}


def ttl_cache(ttl_seconds: float):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args):
            key = (fn.__name__, args)
            if key in _cache:
                value, expires_at = _cache[key]
                if time.monotonic() < expires_at:
                    return value
            value = fn(*args)
            _cache[key] = (value, time.monotonic() + ttl_seconds)
            return value
        return wrapper
    return decorator


@ttl_cache(ttl_seconds=60)
def get_market_metadata(market_id: str):
    return client.get(f"/markets/{market_id}").json()
Caching market metadata for 60 seconds can cut read requests by an order of magnitude in bots that re-fetch the same data on every loop iteration — a common pattern in early-stage bot code.
Multi-Bot Shared Rate Limiter
If you run several strategies simultaneously (say, a weather bot, a Fed rate bot, and an election bot) all under the same API key, their requests compete for the same quota. Two common solutions:
- Separate API keys per bot: the cleanest isolation. Each strategy gets its own credential and its own fresh quota. Requires Kalshi to support multiple API keys per account; check the current API docs.
- Shared in-process rate limiter: if all bots run in the same Python process (or connect to a shared Redis-backed limiter), a single TokenBucket instance can be shared across threads or async tasks.
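One caveat for async bots: the TokenBucket shown earlier waits with time.sleep, which would stall every task on an asyncio event loop. An awaitable variant along the following lines may fit better. AsyncTokenBucket is an illustrative name, and this is a sketch under the assumption that all tasks share one event loop; it is not a drop-in replacement.

```python
import asyncio
import time


class AsyncTokenBucket:
    """Token bucket for asyncio tasks that share one event loop."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self._tokens = float(capacity)
        self._last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self, tokens: int = 1) -> None:
        """Wait, without blocking the event loop, until tokens are available."""
        if tokens > self.capacity:
            raise ValueError("cannot acquire more tokens than the bucket holds")
        # Holding the lock while waiting serves callers in arrival order.
        async with self._lock:
            while True:
                now = time.monotonic()
                elapsed = now - self._last_refill
                self._tokens = min(self.capacity,
                                   self._tokens + elapsed * self.rate)
                self._last_refill = now
                if self._tokens >= tokens:
                    self._tokens -= tokens
                    return
                # Sleep just long enough for the deficit to refill.
                await asyncio.sleep((tokens - self._tokens) / self.rate)
```

Create the bucket inside a running coroutine (for example at the top of main()) and pass it to each strategy task, so every task awaits bucket.acquire() before issuing a request.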
For a Redis-backed distributed limiter (useful when bots run on separate machines), the sliding-window counter pattern works well:
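import time
import uuid

import redis

r = redis.Redis(host="localhost", port=6379)


def acquire_distributed(key: str, limit: int, window_secs: int) -> bool:
    now_ms = int(time.time() * 1000)
    window_start = now_ms - window_secs * 1000
    # Unique member so two requests in the same millisecond both count
    member = f"{now_ms}-{uuid.uuid4().hex}"

    pipe = r.pipeline()  # transactional by default, so the commands run atomically
    pipe.zremrangebyscore(key, 0, window_start)
    pipe.zadd(key, {member: now_ms})
    pipe.zcard(key)
    pipe.expire(key, window_secs + 1)
    _, _, count, _ = pipe.execute()

    if count > limit:
        r.zrem(key, member)  # a rejected attempt should not consume quota
        return False
    return True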
This uses a Redis sorted set where each element is a request timestamp. The count of elements in the sliding window is your current usage. If count > limit, the caller backs off before issuing the request.
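For unit tests or a single-process bot, the same sliding-window logic can be expressed without Redis. The sketch below keeps the timestamps in a deque instead of a sorted set; SlidingWindowLimiter is an illustrative name, and the optional now argument exists only so tests can supply their own clock.

```python
import time
from collections import deque
from typing import Deque, Optional


class SlidingWindowLimiter:
    """In-memory sliding window: at most `limit` requests per window."""

    def __init__(self, limit: int, window_secs: float):
        self.limit = limit
        self.window_secs = window_secs
        self._stamps: Deque[float] = deque()

    def acquire(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._stamps and self._stamps[0] <= now - self.window_secs:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False  # rejected requests are not recorded
        self._stamps.append(now)
        return True
```

This version is not thread-safe; wrap acquire in a lock if several threads share one instance.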
For more on deploying bots reliably across multiple machines, see the production deployment guide and the bot hosting guide.
Monitoring Your Rate-Limit Headroom
Proactive monitoring beats reactive backoff. Parse the rate-limit headers on every successful response and emit a metric:
import logging

logger = logging.getLogger("kalshi.rate_limit")


def parse_rate_limit_headers(response):
    remaining = response.headers.get("X-RateLimit-Remaining")
    limit = response.headers.get("X-RateLimit-Limit")
    if remaining is None or limit is None or int(limit) <= 0:
        return  # headers absent or unusable; nothing to report
    pct_used = (1 - int(remaining) / int(limit)) * 100
    if pct_used > 80:
        logger.warning(
            "Rate limit at %.0f%% capacity (remaining=%s, limit=%s)",
            pct_used, remaining, limit
        )
    # Emit to your metrics system (Prometheus, Datadog, etc.).
    # `metrics` is a placeholder for whatever client your stack provides.
    metrics.gauge("kalshi.rate_limit.used_pct", pct_used)
Alerting when you've consumed more than 80% of the window gives you time to throttle voluntarily before hitting 429. Connect this to the alerting setup described in bot monitoring and alerting to get a Slack or PagerDuty notification if the metric stays elevated.
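One way to throttle voluntarily is to spread the remaining quota evenly over the time left in the window. The helper below is a hypothetical sketch; it assumes X-RateLimit-Reset is a Unix timestamp, as in the sample response, and that the caller has already parsed the header values into numbers.

```python
def pacing_delay(remaining: int, reset_epoch: float, now: float) -> float:
    """Seconds to pause before the next request so that the remaining
    quota lasts until the window resets."""
    time_left = max(0.0, reset_epoch - now)
    if remaining <= 0:
        return time_left  # nothing left: wait for the window to reset
    return time_left / remaining


# 5 requests left and 10 seconds until reset: one request every 2 seconds
print(pacing_delay(remaining=5, reset_epoch=110.0, now=100.0))  # 2.0
```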
Also track 429 rate as its own metric: kalshi.rate_limit.throttled_count. A sudden spike in 429s on an otherwise stable bot often signals a new code path generating unexpected request bursts — catching it early prevents a spiral into key suspension.
Putting It All Together
Here is the minimal rate-limit stack every production Kalshi bot should have, in priority order:
- Switch to WebSocket for streaming data. This alone eliminates most rate-limit risk. See WebSocket vs. REST for implementation details.
- Add per-endpoint token buckets at or below the documented limit. Set order-endpoint buckets conservatively — leave 20% headroom.
- Wrap all REST calls in a backoff decorator that reads Retry-After and uses full jitter. Never retry in a tight loop.
- Batch list endpoint calls and cache static metadata with a TTL appropriate to how often that data changes.
- Emit rate-limit headroom metrics and alert before you hit the wall, not after.
- Isolate API keys per strategy if you run multiple bots, or use a shared limiter if they're co-located.
None of this is exotic engineering. Token buckets and exponential backoff are standard patterns from any distributed systems textbook. What makes them feel hard in trading contexts is the time pressure — a missed order during a fast-moving market feels costly. The answer is to build the rate-limit layer once, test it under synthetic load, and then forget about it while the bot handles real markets.
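Testing under synthetic load does not require the real API. The sketch below uses a virtual clock and a stand-in server to compare an unpaced client with one paced at 20% headroom. FakeServer is a hypothetical fixed-window limiter of 10 requests per second chosen for illustration; it does not model how Kalshi actually enforces limits.

```python
class FakeServer:
    """Fixed-window limiter standing in for the real API during tests."""

    def __init__(self, limit_per_sec: int):
        self.limit = limit_per_sec
        self.window = -1
        self.count = 0

    def request(self, now_ms: int) -> int:
        window = now_ms // 1000  # one-second windows
        if window != self.window:
            self.window, self.count = window, 0
        self.count += 1
        return 200 if self.count <= self.limit else 429


def run_load(server: FakeServer, interval_ms: int, total: int) -> int:
    """Send `total` requests spaced interval_ms apart on a virtual clock
    and return how many were throttled."""
    throttled = 0
    for i in range(total):
        if server.request(now_ms=i * interval_ms) == 429:
            throttled += 1
    return throttled


print(run_load(FakeServer(10), interval_ms=50, total=100))   # 50 (20 req/s, unpaced)
print(run_load(FakeServer(10), interval_ms=125, total=100))  # 0  (8 req/s, 20% headroom)
```

Because the clock is virtual, the test runs instantly and deterministically, which makes it suitable for a CI suite.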
Rate-limit handling is one layer of a production bot. The full architecture — including order management, risk controls, and position sizing — is covered in the complete guide to Kalshi trading bots. If you're earlier in the build process, the Python bot tutorial walks through the REST client setup that underpins everything described here.
Frequently Asked Questions
Quick answers to common questions about Handling Kalshi API Rate Limits Without Getting Your Bot Throttled.
What are Kalshi's current API rate limits?
Kalshi enforces per-endpoint rate limits on its REST API, typically in the range of 10–30 requests per second depending on the endpoint tier. Order submission endpoints are more tightly capped than read-only market data endpoints. Always check the official Kalshi API documentation for the latest numbers, as limits can change.
What HTTP status code does Kalshi return when you're throttled?
Kalshi returns HTTP 429 Too Many Requests when a client exceeds its rate limit. The response headers include a Retry-After value (in seconds) indicating how long you must wait before retrying. Your bot should read this header rather than using a hard-coded delay.
Does using the WebSocket API help avoid rate limits?
Yes — streaming market data over WebSocket removes the need for repeated REST polling and dramatically cuts your request volume. For any data you need continuously (order book updates, market prices), the WebSocket connection is both more efficient and far less likely to trigger throttling.
Can rate limits get my Kalshi API key suspended?
Persistent or egregious rate-limit violations can result in temporary or permanent API key suspension. Implementing exponential backoff with jitter and staying within the documented limits is the safest approach; a single 429 is a warning, not an immediate ban.
Does running multiple bots under the same API key share the rate limit?
Yes. Rate limits are enforced at the API key level, not the process or IP level. If you run several bot instances using the same credentials, their requests are pooled against the same quota. Use separate API keys per bot or implement a shared rate-limiter middleware for multi-bot setups.
What is a token bucket and why is it the right model for Kalshi rate limiting?
A token bucket is an algorithm that holds up to a fixed number of 'tokens'; each request consumes one token, and tokens refill at a steady rate. It permits short bursts up to the bucket's capacity while capping the sustained rate, which makes it a good client-side match for the sliding-window or token-bucket enforcement that API gateways, Kalshi's included, commonly use.
How do I batch Kalshi API calls to stay within rate limits?
Instead of issuing one REST call per market you care about, use the list endpoints (e.g., GET /markets) with filters to retrieve multiple markets in a single request. For order management, queue pending actions and flush them in a controlled loop rather than firing them immediately on each signal.
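The queue-and-flush approach for orders can be sketched as follows. OrderQueue and its send callback are hypothetical; in a real bot, send would be the function that submits a single order, and flush would be called on a fixed tick (for example once per second) sized to the order-endpoint limit.

```python
from collections import deque
from typing import Callable, Deque


class OrderQueue:
    """Buffer order actions and release at most `max_per_flush` per tick."""

    def __init__(self, send: Callable[[dict], None], max_per_flush: int):
        self._send = send  # hypothetical function that submits one order
        self._max = max_per_flush
        self._pending: Deque[dict] = deque()

    def submit(self, order: dict) -> None:
        """Queue an order instead of sending it immediately."""
        self._pending.append(order)

    def flush(self) -> int:
        """Send up to max_per_flush queued orders in FIFO order.

        Returns the number of orders sent on this tick.
        """
        sent = 0
        while self._pending and sent < self._max:
            self._send(self._pending.popleft())
            sent += 1
        return sent
```

Orders left in the queue simply wait for the next tick, so a burst of signals is smoothed out rather than rejected.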