API Rate Limiting: Or How I Learned to Stop Worrying and Love the 429 🚦
So you built an API. Congrats! 🎉 Then someone sent 50,000 requests in 10 seconds and your server burst into flames. Welcome to the club! 🔥
Let me guess - your first thought was "Who would DO that?" followed by "How do I stop them?" Well, friend, you need rate limiting. Think of it as a bouncer for your API - everyone gets a turn, but nobody gets to rush the door.
What Even Is Rate Limiting? 🤔
Rate limiting = Controlling how many requests a user can make in a given time period.
Think of it like a nightclub:
- No rate limiting: Everyone rushes in at once, fire hazard, chaos
- With rate limiting: "One at a time, folks! You get 100 requests per minute, you get 100 requests per minute..."
It's not being mean - it's being smart. Your server has limits. Physics is real. Infinity doesn't work in production.
The Wake-Up Call 📞
True story: I once deployed an API without rate limiting. Felt like a cowboy. 🤠
What happened: Someone's script had a bug and sent 100,000 requests in a loop. Our AWS bill looked like a phone number. Our database cried. Our monitoring system sent so many alerts it crashed itself.
The lesson: Always assume someone will hammer your API - on purpose OR by accident.
Why Your API NEEDS Rate Limiting 🛡️
1. Prevent Abuse
- Malicious actors trying to DDoS you
- Competitors scraping your data
- Script kiddies testing their new "hacking tool"
2. Save Money
- Cloud costs scale with requests
- Database queries cost money
- CPU time isn't free
3. Fair Resource Distribution
- One user shouldn't hog all your bandwidth
- Everyone deserves a chance to use your API
4. Prevent Accidental Self-DDoS
- Buggy client code in an infinite loop
- Retry logic gone wrong
- That one intern who tested in production (we've all been there)
Rate Limiting Strategies 🎯
1. Fixed Window (The Simple One)
// "You get 100 requests per hour"
const rateLimit = require('express-rate-limit');
const limiter = rateLimit({
windowMs: 60 * 60 * 1000, // 1 hour
max: 100, // 100 requests per window
message: 'Too many requests, please try again later.',
standardHeaders: true, // Return rate limit info in headers
legacyHeaders: false
});
app.use('/api/', limiter);
Pros: Dead simple, easy to understand.
Cons: Can be gamed at window boundaries - send 100 requests at the end of one window and 100 more at the start of the next, and 200 sail through back to back.
Real Talk: For most small-medium apps, this is totally fine!
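Curious why the boundary exploit works? Here's a minimal hand-rolled fixed-window counter - a sketch for illustration only (express-rate-limit handles the bookkeeping for you):

```js
// All requests in the same clock window share one counter, which is
// exactly why a burst straddling the boundary slips through.
const counts = new Map(); // note: a real version would evict old keys

function fixedWindowAllow(userId, max, windowMs) {
  const windowIndex = Math.floor(Date.now() / windowMs);
  const key = `${userId}:${windowIndex}`;
  const count = (counts.get(key) || 0) + 1;
  counts.set(key, count);
  return count <= max;
}

// 100 requests at 10:59:59 and 100 more at 11:00:01 ALL pass -
// they land in two different windows, 200 requests back to back.
```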
2. Sliding Window (The Smart One)
```js
// Uses Redis for distributed rate limiting
const Redis = require('ioredis');
const redis = new Redis();

async function slidingWindowRateLimit(userId, maxRequests, windowSeconds) {
  const now = Date.now();
  const windowStart = now - (windowSeconds * 1000);
  const key = `rate_limit:${userId}`;

  // Remove old entries outside the window
  await redis.zremrangebyscore(key, 0, windowStart);

  // Count requests in current window
  const requestCount = await redis.zcard(key);

  if (requestCount >= maxRequests) {
    return { allowed: false, remaining: 0 };
  }

  // Add current request (the random suffix keeps members unique
  // when two requests land in the same millisecond)
  await redis.zadd(key, now, `${now}-${Math.random()}`);
  await redis.expire(key, windowSeconds);

  return {
    allowed: true,
    remaining: maxRequests - requestCount - 1
  };
}
```
Pros: No burst exploits, smooth distribution.
Cons: Slightly more complex, needs Redis/database - and the check-then-add above isn't atomic, so two simultaneous requests can both squeak past the count (more on that below).
When to use: When you're serious about fair limits (APIs with pricing tiers, etc.)
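About that atomicity caveat: the count check and the write are separate round trips, so two simultaneous requests can both see 99 and both get in. If that matters to you, batching the commands in a MULTI/EXEC pipeline (or a Lua script) closes most of the gap. A hedged sketch, reusing the same redis client from above:

```js
async function slidingWindowAtomic(userId, maxRequests, windowSeconds) {
  const now = Date.now();
  const key = `rate_limit:${userId}`;
  const member = `${now}-${Math.random()}`;

  // One round trip: trim old entries, optimistically add, count, refresh TTL
  const results = await redis
    .multi()
    .zremrangebyscore(key, 0, now - windowSeconds * 1000)
    .zadd(key, now, member)
    .zcard(key)
    .expire(key, windowSeconds)
    .exec();

  const requestCount = results[2][1]; // each reply is [err, value]; [2] is zcard

  if (requestCount > maxRequests) {
    // Over the limit - undo the optimistic add
    await redis.zrem(key, member);
    return { allowed: false, remaining: 0 };
  }

  return { allowed: true, remaining: maxRequests - requestCount };
}
```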
3. Token Bucket (The Fancy One)
Think of it like a bucket that fills with tokens over time. Each request costs one token.
```js
// Note: this is a single in-memory bucket shared by every caller of
// this process - fine as a global throttle; see below for per-user buckets.
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity; // Max tokens
    this.tokens = capacity; // Current tokens
    this.refillRate = refillRate; // Tokens per second
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    const tokensToAdd = elapsed * this.refillRate;
    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefill = now;
  }

  consume(tokens = 1) {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return { allowed: true, remaining: Math.floor(this.tokens) };
    }
    return { allowed: false, remaining: 0 };
  }
}

// Usage
const bucket = new TokenBucket(100, 10); // 100 max, refill 10/sec

app.use((req, res, next) => {
  const result = bucket.consume(1);
  if (!result.allowed) {
    return res.status(429).json({
      error: 'Rate limit exceeded',
      // seconds until at least one token is available again
      retryAfter: Math.ceil((1 - bucket.tokens) / bucket.refillRate)
    });
  }
  next();
});
```
Pros: Allows bursts while maintaining an average rate, very flexible.
Cons: More complex to implement and explain to users.
When to use: High-traffic APIs where burst tolerance matters (search, video streaming, etc.)
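Note that the usage example above is one bucket for your whole process - a global throttle. If you want a bucket per user, the usual move is a keyed collection of buckets. A sketch (in-memory, so single-server only - with multiple servers you'd back this with Redis):

```js
const buckets = new Map(); // note: a real version would evict idle buckets

function getBucket(key) {
  if (!buckets.has(key)) {
    buckets.set(key, new TokenBucket(100, 10)); // same 100 max, 10/sec refill
  }
  return buckets.get(key);
}

app.use((req, res, next) => {
  const key = req.user ? `user:${req.user.id}` : `ip:${req.ip}`;
  const result = getBucket(key).consume(1);
  if (!result.allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  next();
});
```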
The Right Way to Return Rate Limit Info 📋
Always return these headers:
```js
res.set({
  'X-RateLimit-Limit': 100, // Max requests per window
  'X-RateLimit-Remaining': 75, // Requests left
  'X-RateLimit-Reset': 1643723400, // When limit resets (Unix timestamp)
  'Retry-After': 3600 // Seconds to wait (when returning a 429)
});
// (express-rate-limit's standardHeaders: true emits the draft-standard
// RateLimit-* names instead of the older X- prefixed ones shown here)
```
Why? Your API consumers can build smart retry logic!
```js
// Client-side example
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function smartFetch(url) {
  const response = await fetch(url);

  if (response.status === 429) {
    // Retry-After arrives as a string of seconds
    const retryAfter = Number(response.headers.get('Retry-After')) || 1;
    console.log(`Rate limited! Waiting ${retryAfter} seconds...`);
    await sleep(retryAfter * 1000);
    return smartFetch(url); // Retry
  }

  return response;
}
```
Pro Tip: GitHub, Twitter, Stripe - all the big APIs do this. Follow the leaders!
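One caution: the example above retries forever. A slightly more defensive version caps the retries and falls back to exponential backoff with jitter when the server doesn't send Retry-After - a sketch (it reuses the sleep helper above; tune the numbers to taste):

```js
async function smartFetchWithBackoff(url, maxRetries = 5) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url);
    if (response.status !== 429) return response;

    const retryAfter = Number(response.headers.get('Retry-After'));
    // Fall back to 1s, 2s, 4s... plus jitter so clients don't retry in lockstep
    const waitMs = retryAfter
      ? retryAfter * 1000
      : 2 ** attempt * 1000 + Math.random() * 250;
    await sleep(waitMs);
  }
  throw new Error(`Still rate limited after ${maxRetries} retries`);
}
```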
Common Mistakes (Don't Be That Developer) 🙈
Mistake #1: Rate Limiting by IP Only
```js
// DON'T DO THIS (only)
const limiter = rateLimit({
  keyGenerator: (req) => req.ip // Everyone behind NAT shares a limit!
});
```
The problem:
- Entire offices share one IP
- Mobile carriers use NAT
- VPNs/proxies make this useless
Better approach:
```js
// Combine IP + API key/user ID
const limiter = rateLimit({
  keyGenerator: (req) => {
    // Authenticated users: rate limit by user ID
    if (req.user) {
      return `user:${req.user.id}`;
    }
    // Anonymous: fall back to IP (but be generous)
    return `ip:${req.ip}`;
  }
});
```
Mistake #2: Not Handling Rate Limit Errors
```js
// BAD: Vague error with the wrong status code - the client can't tell
// a rate limit from a genuine server failure
if (rateLimitExceeded) {
  return res.status(500).json({ error: 'Internal server error' });
}

// GOOD: Clear, actionable error
if (rateLimitExceeded) {
  return res.status(429).json({
    error: 'Rate limit exceeded',
    message: 'You have exceeded the 100 requests per hour limit.',
    retryAfter: 3600,
    documentation: 'https://docs.yourapi.com/rate-limits'
  });
}
```
Translation: Help developers help themselves!
Mistake #3: Same Limit for Everything
```js
// NOT ALL ENDPOINTS ARE EQUAL!

// Heavy operation: strict limit
app.post('/api/video/encode',
  rateLimit({ windowMs: 60000, max: 5 }),
  encodeVideo
);

// Light read: generous limit
app.get('/api/user/profile',
  rateLimit({ windowMs: 60000, max: 1000 }),
  getProfile
);

// Public data: very generous
app.get('/api/public/stats',
  rateLimit({ windowMs: 60000, max: 10000 }),
  getStats
);
```
Golden Rule: Expensive operations get tight limits, cheap operations get loose limits.
Mistake #4: No Escape Hatch for Legitimate High Usage
```js
// Allow premium users higher limits
const getDynamicLimit = (req) => {
  if (req.user?.subscription === 'premium') {
    return 10000; // Premium: 10k/hour
  }
  if (req.user?.subscription === 'pro') {
    return 1000; // Pro: 1k/hour
  }
  return 100; // Free: 100/hour
};

const limiter = rateLimit({
  max: (req) => getDynamicLimit(req),
  windowMs: 60 * 60 * 1000
});
```
Why it matters: You want to be strict by default but flexible for paying customers!
Distributed Rate Limiting (When You Have Multiple Servers) 🌐
The problem: Each server has its own memory. User hits different servers, bypasses limit!
The solution: Shared state with Redis
```js
const RedisStore = require('rate-limit-redis');
const Redis = require('ioredis');

const limiter = rateLimit({
  // Note: this is the older rate-limit-redis API - v3+ changed it (see below)
  store: new RedisStore({
    client: new Redis({
      host: process.env.REDIS_HOST,
      port: process.env.REDIS_PORT
    }),
    prefix: 'rl:' // Rate limit prefix
  }),
  windowMs: 60 * 1000,
  max: 100
});
```
Translation: All servers check the same Redis instance. No cheating!
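Heads up: the snippet above uses the older rate-limit-redis options. In v3+ the store takes a sendCommand function instead of a client - roughly like this (double-check the package README for your exact version, since the export style has changed between majors):

```js
const { RedisStore } = require('rate-limit-redis'); // named export in recent versions
const Redis = require('ioredis');

const client = new Redis({
  host: process.env.REDIS_HOST,
  port: process.env.REDIS_PORT
});

const limiter = rateLimit({
  store: new RedisStore({
    // v3+ hands the store raw command dispatch instead of a client object
    sendCommand: (...args) => client.call(...args),
    prefix: 'rl:'
  }),
  windowMs: 60 * 1000,
  max: 100
});
```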
Advanced: Adaptive Rate Limiting 🧠
The idea: Adjust limits based on server health
```js
const os = require('os');

function getAdaptiveLimit() {
  // loadavg is system-wide, so divide by core count to get a 0-to-1-ish number
  const cpuLoad = os.loadavg()[0] / os.cpus().length; // 1-min average
  const memUsage = (1 - (os.freemem() / os.totalmem())) * 100;

  // Server under stress? Reduce limits!
  if (cpuLoad > 0.8 || memUsage > 85) {
    return 50; // Emergency mode
  }
  if (cpuLoad > 0.5 || memUsage > 70) {
    return 75; // Caution mode
  }
  return 100; // Normal mode
}

const limiter = rateLimit({
  max: (req) => getAdaptiveLimit(),
  windowMs: 60 * 1000
});
```
Real Talk: This is overkill for most apps. But cool to know about!
Testing Your Rate Limits 🧪
Don't wait for production to test!
```bash
# Simple test with curl (print only the status codes)
for i in {1..150}; do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/api/test
done

# Better test with Apache Bench
ab -n 1000 -c 10 http://localhost:3000/api/test

# Professional test with k6 (see the script sketch below)
k6 run --vus 10 --duration 30s rate-limit-test.js
```
What to check:
- Do you get 429 after hitting the limit? ✅
- Are the headers correct? ✅
- Does it reset properly? ✅
- Can legit users still get through? ✅
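Since that k6 command expects a script, here's a minimal sketch of what rate-limit-test.js could look like - the endpoint and checks are placeholders, so adapt them to your API:

```js
// rate-limit-test.js - minimal k6 sketch
import http from 'k6/http';
import { check } from 'k6';

export const options = { vus: 10, duration: '30s' };

export default function () {
  const res = http.get('http://localhost:3000/api/test');

  check(res, {
    'got 200 or 429, never a 500': (r) => r.status === 200 || r.status === 429,
    '429s carry Retry-After': (r) =>
      r.status !== 429 || r.headers['Retry-After'] !== undefined,
  });
}
```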
Your Rate Limiting Checklist ✅
Before you deploy:
- Rate limiting enabled on ALL public endpoints
- Different limits for different endpoint costs
- Using distributed state (Redis) if you have multiple servers
- Returning proper 429 status + headers
- Rate limit key includes user ID (not just IP)
- Higher limits for authenticated/premium users
- Clear error messages with retry info
- Monitoring/alerts for rate limit hits
- Documentation explaining your limits
- Tested under load!
Quick Wins (Implement These Today!) 🏃‍♂️
Option 1: Express (Node.js)

```bash
npm install express-rate-limit
```

Option 2: Laravel (PHP)

```php
Route::middleware('throttle:60,1')->group(function () {
    // 60 requests per minute
});
```

Option 3: Django (Python)

```bash
pip install django-ratelimit
```
Translation: Every major framework has this built-in or as a simple package. No excuses!
Real Talk 💬
Q: "What's a good default limit?"
A: Start with 100 requests per hour for anonymous, 1,000 for authenticated. Adjust based on actual usage patterns.
Q: "Should I rate limit internal APIs?"
A: YES! Buggy internal code can DDoS you just as well. Maybe be more generous, but still limit.
Q: "What about webhooks?"
A: Separate limits! Webhooks are often spikier. Consider using queues instead.
Q: "Won't this hurt user experience?"
A: Not if you implement it right! Most users never hit the limit. Those who do probably shouldn't be hitting it.
The Bottom Line
Rate limiting isn't optional anymore. It's like wearing a seatbelt - you don't notice it until you need it, and then you're REALLY glad it's there.
The essentials:
- Always rate limit public endpoints (start with express-rate-limit or your framework's built-in)
- Return clear error messages (429 status + Retry-After header)
- Different limits for different endpoints (expensive ops get tight limits)
- Use Redis for distributed systems (multiple servers need shared state)
- Monitor your limits (adjust based on real usage)
Think of it as proactive defense - stop problems before they start! Your future self (and your AWS bill) will thank you. 💸
Got rate limit war stories? Share them on LinkedIn - I collect them!
Want to see my projects? Check out my GitHub - all properly rate-limited, of course! 🚀
P.S. - If you're reading this and your API has no rate limiting, go fix that right now. I'll wait. 🚦✨