Rate Limiting in Express: Stop the Stampede Before It Tramples Your Server 🦬

Imagine you open a lemonade stand. Business is great. Then one guy shows up and orders 50,000 cups in 10 seconds. Your lemons are gone, your arms are broken, and the actual paying customers are standing in a pile of citrus chaos.

That's your API without rate limiting.

Rate limiting is one of those things developers skip because it "seems like a later problem" — until it isn't. One misguided bot, a viral Reddit post, or a simple retry loop gone wrong can take a perfectly healthy Express server to its knees. Let's talk about how to implement rate limiting properly, and why it should be a first-class citizen in every backend project.

Why Rate Limiting Matters (Beyond the Obvious)

Sure, rate limiting protects against DDoS attacks and credential stuffing. But the lesser-sung benefits are just as important:

Cost control — If your API calls a paid third-party service, unlimited requests = unlimited bills.
Fairness — One power user shouldn't degrade the experience for everyone else.
Abuse prevention — Scrapers, spammers, and bots love unprotected endpoints.
Graceful degradation — When traffic spikes, you want to shed load, not crash.

Think of rate limiting as the bouncer at the club. They don't hate you — they're just making sure the dance floor doesn't collapse.

The Basics: `express-rate-limit`

The easiest way to add rate limiting to an Express app is the battle-tested express-rate-limit package. It's middleware-based, highly configurable, and takes about five minutes to set up.

npm install express-rate-limit

Here's a simple global rate limiter:

import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100,                  // max 100 requests per window per IP
  standardHeaders: true,     // Return rate limit info in `RateLimit-*` headers
  legacyHeaders: false,      // Disable `X-RateLimit-*` headers
  message: {
    status: 429,
    error: 'Too many requests. Take a breath and try again shortly.',
  },
});

app.use(limiter);

That's it. Every IP is now capped at 100 requests per 15-minute window. Anyone who blows past that gets a 429 Too Many Requests with a polite-but-firm message.

The standardHeaders: true option is worth highlighting — it sends RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset headers back to the client, which well-behaved clients can use to back off gracefully. It's the difference between a bouncer who says "wait outside" and one who just silently stares until you leave confused.

Going Granular: Per-Route Limiting

Global limits are a great safety net, but you almost always want stricter limits on sensitive routes. Your /login endpoint should not be treated the same as /api/articles.

// Strict limiter for auth routes — brute force protection
const authLimiter = rateLimit({
  windowMs: 10 * 60 * 1000, // 10 minutes
  max: 10,                   // only 10 attempts per window
  message: {
    status: 429,
    error: 'Too many login attempts. Please wait before trying again.',
  },
});

// Generous limiter for read-heavy public API
const apiLimiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 60,             // 60 requests per minute (1 per second average)
});

app.post('/api/auth/login', authLimiter, loginHandler);
app.post('/api/auth/register', authLimiter, registerHandler);
app.use('/api/', apiLimiter);

This is the tiered bouncer system. The VIP room (auth) has a very strict guest list. The main floor (general API) is more relaxed, but still controlled.

One important note: if your Express app is behind a proxy (nginx, a load balancer, Cloudflare), you need to tell Express to trust the X-Forwarded-For header so rate limiting is keyed to the real client IP, not the proxy's IP:

app.set('trust proxy', 1); // trust first proxy

Without this, every single request looks like it's coming from the same IP — your proxy — and you'll rate-limit yourself almost immediately. Ask me how I know.

Leveling Up: Redis-Backed Rate Limiting

The default in-memory store works fine for a single server instance. But if you're running multiple Node processes or a horizontally scaled deployment, each instance has its own in-memory counter. A user can hammer 10 different instances and bypass your limits entirely.

The fix: a shared, external store — Redis is the standard choice.

npm install rate-limit-redis ioredis

import { RateLimitRedisStore } from 'rate-limit-redis';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  store: new RateLimitRedisStore({
    sendCommand: (...args) => redis.call(...args),
  }),
});

Now all your instances share a single source of truth. User hits instance A? That counts. They switch to instance B? Redis already knows. The bouncer at every door is reading from the same clipboard.

Common Pitfalls to Avoid

Don't just return 429 and ghost the user. Always include a Retry-After header or a message telling them when they can try again. standardHeaders: true handles this automatically.

Don't rate limit health check endpoints. Your load balancer pings /health every few seconds. A strict global limit will mark your service as unhealthy for no reason.

// Apply limiter everywhere EXCEPT health checks
app.use('/health', healthHandler); // no limiter
app.use(limiter);                  // limiter for everything else

Don't forget to test it. Spin up a quick script that fires 200 requests in a burst and verify you get 429s after the threshold. Rate limiting that silently fails is worse than having none.

Wrapping Up

Rate limiting isn't glamorous, but it's one of those foundations that quietly keeps everything else working. It's cheap to implement, pays off enormously when things go sideways, and signals to your users (and attackers) that you've actually thought about your API's behavior under pressure.

Here's the mental model to keep: your server is a shared resource. Rate limiting is how you protect that resource from any single actor — whether malicious, accidental, or just overly enthusiastic.

Start with a sensible global limit. Add stricter per-route limits on auth and sensitive endpoints. Move to Redis when you scale out. Then sleep soundly knowing that when the stampede comes, your server has a sturdy fence.

Got questions about rate limiting strategies or scaling patterns? Drop them in the comments or reach out — always happy to talk backend.

Rate Limiting in Express: Stop the Stampede Before It Tramples Your Server 🦬

Rate Limiting in Express: Stop the Stampede Before It Tramples Your Server 🦬

Why Rate Limiting Matters (Beyond the Obvious)

The Basics: express-rate-limit

Going Granular: Per-Route Limiting

Leveling Up: Redis-Backed Rate Limiting

Common Pitfalls to Avoid

Wrapping Up

The Basics: `express-rate-limit`