Load Balancing: Stop Overloading One Server While Others Sip Coffee ☕⚖️


Real talk: The first time our e-commerce backend got featured on Product Hunt, we had 5 EC2 instances ready to handle the traffic. I watched the dashboard with confidence. Then I saw it: Server 1 at 98% CPU, melting down. Servers 2-5? Chilling at 5% CPU, basically watching Netflix. 😱

Me: "Why is all traffic going to ONE server?!"

DevOps: "Did you... configure the load balancer?"

Me: "The what now?"

Welcome to the day I learned that buying more servers doesn't magically distribute traffic. You need a load balancer - the traffic cop that actually knows how to share the work!

What's a Load Balancer Anyway? 🤔

Think of a load balancer like a restaurant host distributing customers to servers:

Without load balancer (Chaos):

100 customers → Server 1 (drowning! 🔥)
0 customers   → Server 2 (bored)
0 customers   → Server 3 (taking a nap)
0 customers   → Server 4 (playing games)

With load balancer (Organized):

Load Balancer
├─ 25 customers → Server 1 (perfect! ✅)
├─ 25 customers → Server 2 (perfect! ✅)
├─ 25 customers → Server 3 (perfect! ✅)
└─ 25 customers → Server 4 (perfect! ✅)

Translation: Load balancer = Smart router that distributes requests across multiple servers!

The Production Disaster That Taught Me Load Balancing 💀

Black Friday 2020, 9 AM (here we go!):

When I first designed our e-commerce backend, I thought I was clever:

My naive architecture:

// DNS points directly to servers
api.myshop.com → 3.22.145.67 (Server 1)

// I added more servers but...
// All traffic still went to Server 1!

What I thought would happen:

1000 req/sec distributed evenly:
Server 1: 200 req/sec ✅
Server 2: 200 req/sec ✅
Server 3: 200 req/sec ✅
Server 4: 200 req/sec ✅
Server 5: 200 req/sec ✅

What ACTUALLY happened:

Server 1: 1000 req/sec 🔥🔥🔥
Server 2: 0 req/sec (literally idle)
Server 3: 0 req/sec (cache cold)
Server 4: 0 req/sec (wondering why it exists)
Server 5: 0 req/sec (crying in the corner)

The fallout:

  • Server 1 crashed after 15 minutes
  • Site went down completely
  • Lost $8,000 in revenue in 30 minutes
  • Customers flooding support
  • My stress level: 📈📈📈
  • My AWS bill: Still had to pay for 4 idle servers! 💸

The emergency fix:

# Panic-installed NGINX as a load balancer
apt-get install nginx -y

# Quick config (/etc/nginx/conf.d/loadbalancer.conf)
upstream backend {
    server 10.0.1.1:3000;
    server 10.0.1.2:3000;
    server 10.0.1.3:3000;
    server 10.0.1.4:3000;
    server 10.0.1.5:3000;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;   # without this line, the upstream does nothing!
    }
}

# Reload
nginx -s reload

# Traffic suddenly distributed! 🎉

Results after load balancer:

  • All servers at healthy 60% CPU
  • Site stable for the rest of Black Friday
  • Handled 10x the original traffic
  • My boss: "Why didn't we do this from the start?"
  • Me: "I learned a valuable lesson..." 😅

Load Balancing Algorithm #1: Round Robin (The Classic) 🔄

How it works: Send requests to servers one by one in sequence.

Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (back to start)
Request 5 → Server 2
...
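
Under the hood, round robin is just a counter. Here's a toy round-robin proxy in plain Node (a minimal sketch with made-up backend addresses - definitely not what NGINX does internally, but the same idea):

// toy-lb.js - forwards each incoming request to the next backend in turn
const http = require('http');

const backends = [
    { host: '10.0.1.1', port: 3000 },
    { host: '10.0.1.2', port: 3000 },
    { host: '10.0.1.3', port: 3000 },
];
let counter = 0;

http.createServer((clientReq, clientRes) => {
    // Pick the next backend in sequence, wrapping around at the end
    const backend = backends[counter++ % backends.length];

    const proxyReq = http.request({
        host: backend.host,
        port: backend.port,
        path: clientReq.url,
        method: clientReq.method,
        headers: clientReq.headers,
    }, (proxyRes) => {
        clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
        proxyRes.pipe(clientRes);
    });

    proxyReq.on('error', () => {
        clientRes.writeHead(502);
        clientRes.end('Bad gateway');
    });

    clientReq.pipe(proxyReq);
}).listen(8080);  // port 80 needs root; 8080 is fine for a local test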

NGINX config:

# nginx.conf (upstream and server blocks live inside the http {} context)
upstream backend {
    # Round robin is DEFAULT (no special config needed!)
    server app1.internal:3000;
    server app2.internal:3000;
    server app3.internal:3000;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Why I love round robin:

  • ✅ Simple - just works!
  • ✅ Fair distribution
  • ✅ No server gets ignored
  • ✅ Stateless (no tracking needed)

The catch:

  • ⚠️ Ignores server capacity (treats powerful and weak servers equally)
  • ⚠️ Ignores current load (sends requests even if server is struggling)
  • ⚠️ Session affinity issues (a user might hit a different server on each request)

When I use it: Simple stateless APIs where all servers are identical!

Load Balancing Algorithm #2: Least Connections (The Smart One) 🧠

How it works: Send requests to the server with fewest active connections.

Server 1: 50 connections
Server 2: 45 connections ← Send here! (least busy)
Server 3: 60 connections
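
The selection logic itself is tiny once the balancer tracks in-flight requests per server. A rough sketch in Node (server names and counts are made up):

// Pick whichever backend currently has the fewest active connections
const servers = [
    { name: 'app1', activeConnections: 50 },
    { name: 'app2', activeConnections: 45 },
    { name: 'app3', activeConnections: 60 },
];

function pickLeastConnections(servers) {
    // reduce() keeps whichever server has fewer in-flight requests
    return servers.reduce((least, s) =>
        s.activeConnections < least.activeConnections ? s : least
    );
}

console.log(pickLeastConnections(servers).name);  // "app2"

// The balancer increments activeConnections when it forwards a request
// and decrements it when the response finishes - that's the whole trick.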

NGINX config:

upstream backend {
    least_conn;  # Enable least connections algorithm

    server app1.internal:3000;
    server app2.internal:3000;
    server app3.internal:3000;
}

HAProxy config (my production setup):

# haproxy.cfg
backend app_servers
    balance leastconn  # Least connections

    server app1 10.0.1.1:3000 check
    server app2 10.0.1.2:3000 check
    server app3 10.0.1.3:3000 check

Why least connections is brilliant:

  • ✅ Adapts to varying request durations
  • ✅ Prevents overloading slow servers
  • ✅ Great for long-running requests (uploads, reports)
  • ✅ Naturally balances load

Real-world example from our API:

// Endpoint that takes 5 seconds
app.get('/api/generate-report', async (req, res) => {
    const report = await generateLargeReport(req.user.id);
    res.json(report);
    // Takes 5 seconds!
});

// Endpoint that takes 50ms
app.get('/api/user-profile', async (req, res) => {
    const user = await db.users.findById(req.user.id);
    res.json(user);
    // Takes 50ms
});

With round robin:

Round robin only counts requests - it has no idea how much work each one is. A server stuck chewing on slow report requests keeps receiving its "fair" share of new traffic anyway:

Server 1: 10 report requests piled up (50s of work!) and still getting more
Server 2: 10 profile requests (0.5s of work)
Server 3: 10 profile requests (0.5s of work)

Result: Server 1 is drowning, others are idle! 😱

With least connections:

Server 1: Lots of open connections (slow reports) → skipped for now
Server 2: Few open connections → gets the next request
Server 3: Few open connections → gets the next request

New work always flows to whichever server has the fewest in-flight requests.

Result: Much more balanced! ✅

When designing our e-commerce backend, least connections saved us when report generation requests started queuing up!

Load Balancing Algorithm #3: Weighted Round Robin (The Custom One) ⚖️

The problem: You have servers with different capacities!

Server 1: 16 GB RAM, 8 CPUs (beefy! 💪)
Server 2: 4 GB RAM, 2 CPUs (smol)
Server 3: 4 GB RAM, 2 CPUs (smol)

Round robin treats them equally:

Server 1: Gets 33% of traffic (underutilized!)
Server 2: Gets 33% of traffic (struggling!)
Server 3: Gets 33% of traffic (struggling!)

Weighted round robin - Give more work to beefy servers:

upstream backend {
    server app1.internal:3000 weight=4;  # 4x capacity
    server app2.internal:3000 weight=1;  # 1x capacity
    server app3.internal:3000 weight=1;  # 1x capacity
}

# Distribution:
# Server 1: 4/6 = 66% of traffic
# Server 2: 1/6 = 17% of traffic
# Server 3: 1/6 = 17% of traffic
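
One easy way to picture the weights: repeat each server in the rotation according to its weight. This is just a sketch of the idea - real balancers like NGINX interleave the weighted picks more smoothly, but the proportions come out the same:

// Expand the server list by weight, then round-robin over the expanded list
const upstream = [
    { host: 'app1.internal:3000', weight: 4 },
    { host: 'app2.internal:3000', weight: 1 },
    { host: 'app3.internal:3000', weight: 1 },
];

// app1 appears 4 times, app2 and app3 once each → 4/6, 1/6, 1/6 of traffic
const rotation = upstream.flatMap(s => Array(s.weight).fill(s.host));

let counter = 0;
function nextServer() {
    return rotation[counter++ % rotation.length];
}

for (let i = 0; i < 6; i++) console.log(nextServer());
// app1, app1, app1, app1, app2, app3 - then the cycle repeats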

Real example from our production setup:

# Production load balancer
upstream production_api {
    # r5.2xlarge instance (8 vCPU, 64GB RAM)
    server api1.prod.internal:3000 weight=8;

    # t3.large instance (2 vCPU, 8GB RAM)
    server api2.prod.internal:3000 weight=2;

    # t3.large instance (2 vCPU, 8GB RAM)
    server api3.prod.internal:3000 weight=2;
}

Results:

  • Beefy server: Gets 66% of traffic, runs at 70% CPU ✅
  • Small servers: Get 17% each, run at 65% CPU ✅
  • All servers optimally utilized!

As a Technical Lead, I've learned: Match weights to server capacity. Don't waste expensive hardware or overload cheap instances!

Load Balancing Algorithm #4: IP Hash (The Sticky One) 📌

The problem - Session affinity:

// User logs in
POST /login → Server 1
// Server 1 stores session in memory

// User makes next request
GET /dashboard → Server 2 (different server!)
// Server 2: "Who are you? No session found!" 🤷
// User: "I just logged in!" 😡

The solution - IP hash:

upstream backend {
    ip_hash;  # Same IP always goes to same server!

    server app1.internal:3000;
    server app2.internal:3000;
    server app3.internal:3000;
}

# User 1.2.3.4 → Always Server 2
# User 5.6.7.8 → Always Server 1
# User 9.10.11.12 → Always Server 3

How it works:

// Load balancer logic
function selectServer(clientIP, servers) {
    const hash = hashFunction(clientIP);
    const serverIndex = hash % servers.length;
    return servers[serverIndex];
}

// Example:
hash("192.168.1.1") % 3 = 2Server 3 (always!)
hash("192.168.1.2") % 3 = 0Server 1 (always!)
hash("192.168.1.3") % 3 = 1Server 2 (always!)

Why I used IP hash for our legacy app:

  • ✅ Sessions work without shared storage
  • ✅ User gets consistent experience
  • ✅ WebSocket connections stay on same server
  • ✅ No distributed session storage needed

The catch:

  • ⚠️ Uneven distribution if users behind NAT (corporate networks)
  • ⚠️ Server failure breaks sessions for those users
  • ⚠️ Can't easily scale (adding server changes hash distribution)

The modern alternative - Session cookies:

upstream backend {
    server app1.internal:3000;
    server app2.internal:3000;
    server app3.internal:3000;
}

# Cookie-based sticky sessions: the app sets a "route" cookie after login,
# and the map pins that user to the same backend on later requests
map $cookie_route $route_backend {
    default backend;                # no cookie yet → normal load balancing
    "app1"  app1.internal:3000;
    "app2"  app2.internal:3000;
    "app3"  app3.internal:3000;
}

# Then in the location block: proxy_pass http://$route_backend;
# (proxy_pass with a variable needs a resolver directive for hostnames;
#  NGINX Plus offers a built-in "sticky cookie" directive instead)

Or just use Redis for sessions (the right way!):

// All servers share the same session store
// (older connect-redis require() style shown; newer versions export the store class directly)
const session = require('express-session');
const RedisStore = require('connect-redis')(session);
const redis = require('redis');

app.use(session({
    store: new RedisStore({ client: redis.createClient() }),
    secret: 'your-secret-key',
    resave: false,             // don't rewrite unchanged sessions
    saveUninitialized: false   // don't store empty sessions
}));

// Now sessions work across ALL servers! 🎉

When architecting on AWS, I learned: Don't use IP hash. Use shared session storage (Redis/DynamoDB) and let the load balancer distribute freely!

Health Checks: The Load Balancer's Eyes 👀

The nightmare scenario:

Server 1: Healthy ✅
Server 2: CRASHED 💀
Server 3: Healthy ✅

Load balancer (without health checks):
Request 1 → Server 1 (works!)
Request 2 → Server 2 (500 error! 💥)
Request 3 → Server 3 (works!)
Request 4 → Server 2 (500 error! 💥)
Request 5 → Server 3 (works!)
Request 6 → Server 2 (500 error! 💥)

33% of requests fail! 😱

The solution - Health checks:

upstream backend {
    server app1.internal:3000 max_fails=3 fail_timeout=30s;
    server app2.internal:3000 max_fails=3 fail_timeout=30s;
    server app3.internal:3000 max_fails=3 fail_timeout=30s;
}

# If 3 requests to a server fail, NGINX stops sending it traffic for 30 seconds
# (these are passive checks - open-source NGINX reacts to real request failures;
#  active /health polling is what the HAProxy config below does)

HAProxy health check config (more advanced):

backend app_servers
    balance leastconn

    option httpchk GET /health
    http-check expect status 200

    server app1 10.0.1.1:3000 check inter 5s fall 3 rise 2
    server app2 10.0.1.2:3000 check inter 5s fall 3 rise 2
    server app3 10.0.1.3:3000 check inter 5s fall 3 rise 2

# inter 5s: Check every 5 seconds
# fall 3: Mark down after 3 failures
# rise 2: Mark up after 2 successes

My production health check endpoint:

// server.js
app.get('/health', async (req, res) => {
    try {
        // Check database
        await db.query('SELECT 1');

        // Check Redis
        await redis.ping();

        // Check memory usage
        const memUsage = process.memoryUsage();
        const memPercent = (memUsage.heapUsed / memUsage.heapTotal) * 100;

        if (memPercent > 90) {
            throw new Error('High memory usage');
        }

        res.status(200).json({
            status: 'healthy',
            timestamp: Date.now()
        });
    } catch (error) {
        res.status(503).json({
            status: 'unhealthy',
            error: error.message
        });
    }
});

What happens with health checks:

09:00:00 - Server 2 crashes (database connection lost)
09:00:05 - Health check fails (first failure)
09:00:10 - Health check fails (second failure)
09:00:15 - Health check fails (third failure)
09:00:15 - Load balancer removes Server 2 from rotation! 🚨
09:00:15 onwards - All traffic goes to Server 1 & 3
09:05:00 - Server 2 recovers
09:05:05 - Health check succeeds (first success)
09:05:10 - Health check succeeds (second success)
09:05:10 - Load balancer adds Server 2 back! ✅

Result: Failures were limited to the 15-second detection window
Without health checks: 33% of ALL requests would keep failing until a human noticed! 💀
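
The fall/rise behavior is easy to picture as a tiny state machine polling each server. Here's a simplified sketch of what the balancer does internally (Node 18+ for built-in fetch; not HAProxy's actual code):

// Simplified active health checker with fall/rise thresholds
const INTERVAL_MS = 5000;  // inter 5s
const FALL = 3;            // mark down after 3 consecutive failures
const RISE = 2;            // mark up after 2 consecutive successes

const server = { url: 'http://10.0.1.2:3000/health', healthy: true, fails: 0, passes: 0 };

async function checkOnce(server) {
    try {
        const res = await fetch(server.url, { signal: AbortSignal.timeout(2000) });
        if (!res.ok) throw new Error(`status ${res.status}`);
        server.fails = 0;
        server.passes += 1;
        if (!server.healthy && server.passes >= RISE) server.healthy = true;   // back in rotation
    } catch (err) {
        server.passes = 0;
        server.fails += 1;
        if (server.healthy && server.fails >= FALL) server.healthy = false;    // pulled from rotation
    }
}

setInterval(() => checkOnce(server), INTERVAL_MS);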

A scalability lesson that cost us: Always implement health checks! We once had a zombie server responding 500s for 2 hours before we noticed. Cost us thousands in lost conversions!

Load Balancer Technologies: What Should You Use? 🛠️

Option #1: NGINX (My Go-To for Most Projects)

Why I love NGINX:

  • ✅ Lightweight and fast
  • ✅ Simple configuration
  • ✅ Great for reverse proxy + load balancing
  • ✅ Battle-tested in production

Quick setup:

# /etc/nginx/nginx.conf
events {}   # required top-level block, even if empty

http {
    upstream backend {
        least_conn;

        server app1.local:3000 weight=2 max_fails=3 fail_timeout=30s;
        server app2.local:3000 weight=1 max_fails=3 fail_timeout=30s;
        server app3.local:3000 weight=1 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

            # Connection handling
            proxy_connect_timeout 60s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;
        }
    }
}

Docker Compose setup:

version: '3.8'

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - app1
      - app2
      - app3

  app1:
    image: myapp:latest
    environment:
      - PORT=3000

  app2:
    image: myapp:latest
    environment:
      - PORT=3000

  app3:
    image: myapp:latest
    environment:
      - PORT=3000

Option #2: HAProxy (The Advanced One)

When to use HAProxy:

  • ✅ Need advanced load balancing algorithms
  • ✅ TCP load balancing (not just HTTP)
  • ✅ Detailed statistics dashboard
  • ✅ More fine-grained control

Config example:

# haproxy.cfg
global
    maxconn 4096

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http_front
    bind *:80
    default_backend app_servers

backend app_servers
    balance leastconn
    option httpchk GET /health

    server app1 10.0.1.1:3000 check inter 5s fall 3 rise 2
    server app2 10.0.1.2:3000 check inter 5s fall 3 rise 2
    server app3 10.0.1.3:3000 check inter 5s fall 3 rise 2

# Statistics dashboard
listen stats
    bind *:8080
    stats enable
    stats uri /stats
    stats refresh 30s

Visit http://your-lb:8080/stats for a beautiful stats dashboard! 📊

Option #3: AWS Application Load Balancer (The Managed One)

When on AWS:

# Create target group
aws elbv2 create-target-group \
  --name my-app-targets \
  --protocol HTTP \
  --port 3000 \
  --vpc-id vpc-12345678 \
  --health-check-path /health

# Create load balancer
aws elbv2 create-load-balancer \
  --name my-app-lb \
  --subnets subnet-12345678 subnet-87654321 \
  --security-groups sg-12345678

# Register targets
aws elbv2 register-targets \
  --target-group-arn arn:aws:elasticloadbalancing:... \
  --targets Id=i-1234567890abcdef0 Id=i-0fedcba0987654321

Terraform for AWS ALB (what I actually use):

resource "aws_lb" "app" {
  name               = "my-app-lb"
  load_balancer_type = "application"
  subnets            = [aws_subnet.public_a.id, aws_subnet.public_b.id]
}

resource "aws_lb_target_group" "app" {
  name     = "app-targets"
  port     = 3000
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    path                = "/health"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}

resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

Why I use AWS ALB in production:

  • ✅ Fully managed (no server maintenance)
  • ✅ Auto-scaling integration
  • ✅ SSL termination
  • ✅ WAF integration
  • ✅ Native AWS service integration

The catch: $$$ - Roughly $20/month base plus usage-based (LCU) charges

Option #4: Kubernetes Service (The Cloud-Native One)

When you're on Kubernetes:

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  type: LoadBalancer  # Creates cloud load balancer
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 3000

---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3  # 3 instances
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 3000

        # Health checks
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 5

        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 3

Kubernetes automatically:

  • ✅ Load balances across pods
  • ✅ Removes unhealthy pods
  • ✅ Adds new pods to load balancer
  • ✅ Handles zero-downtime deployments

Magic! 🪄

Common Load Balancing Mistakes (I Made All of These) 🪤

Mistake #1: No Connection Draining

The problem:

Server 1 is processing 100 long-running requests
Load balancer decides to remove Server 1 (deploy/scale-down)
Load balancer IMMEDIATELY stops sending traffic
Server 1 IMMEDIATELY shuts down
100 requests: FAILED! 💥

The fix - Connection draining:

# NGINX (open source): take the server out of the upstream, then reload.
# Old worker processes finish their in-flight requests before exiting.
upstream backend {
    server app1.internal:3000;
    # server app2.internal:3000;   # commented out = no NEW traffic sent here
}

# nginx -s reload   → graceful: existing connections are allowed to complete
# (NGINX Plus adds a per-server "drain" mode for exactly this)

HAProxy:

backend app_servers
    timeout server 60s   # allow slow in-flight requests up to 60s to complete

# Put a server into "drain" mode before removing it - it finishes existing
# sessions but receives no new ones (requires a stats socket in the global
# section; adjust the socket path to yours):
# echo "set server app_servers/app1 state drain" | socat stdio /var/run/haproxy.sock

AWS ALB:

resource "aws_lb_target_group" "app" {
  deregistration_delay = 300  # In-flight requests get up to 5 minutes to finish before the target is removed
}
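
Connection draining is only half the story - the app itself should also shut down gracefully when it gets SIGTERM. A minimal Express sketch (assumes `server` is whatever app.listen() returned):

const express = require('express');
const app = express();
const server = app.listen(3000);

process.on('SIGTERM', () => {
    console.log('SIGTERM received, draining connections...');

    // Stop accepting new connections; the callback fires once in-flight requests finish
    server.close(() => {
        console.log('All connections drained, exiting cleanly');
        process.exit(0);
    });

    // Safety net: force-exit if requests outlive the load balancer's drain window
    setTimeout(() => process.exit(1), 30000).unref();
});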

Mistake #2: Not Monitoring Load Balancer Itself

The irony: Your load balancer becomes the single point of failure!

3 healthy app servers ✅
1 load balancer 💥 CRASHES
Result: Entire site down! 😱

The fix - Redundant load balancers:

        ┌─────────────┐
        │  DNS / CDN  │
        └──────┬──────┘
               │
       ┌───────┴───────┐
       │               │
    ┌──▼──┐        ┌──▼──┐
    │ LB1 │        │ LB2 │  (failover)
    └──┬──┘        └──┬──┘
       │               │
    ───┴───────────────┴───
    │  │  │  │  │  │  │  │
    Server Farm

AWS ALB automatically:

  • Runs in multiple availability zones
  • Auto-replaces unhealthy instances
  • No single point of failure!

Self-hosted NGINX:

  • Use Keepalived for high availability
  • Floating IP fails over between load balancers

Mistake #3: Sending Traffic to Unhealthy Servers

A deployment pattern that burned us:

# Bad deployment
1. Deploy new code to Server 1
2. Server 1 starts up (takes 30s to be ready)
3. Load balancer immediately sends traffic
4. Server 1 returns 500 errors
5. Users see errors! 😱

A partial fix - passive failure detection:

# NGINX pulls the server after a few failed requests,
# but some users still see errors before it reacts
upstream backend {
    server app1.internal:3000 max_fails=3 fail_timeout=30s;
}

Better - Proper health checks:

// Health check waits for full initialization
app.get('/health', async (req, res) => {
    if (!app.locals.initialized) {
        return res.status(503).json({ status: 'initializing' });
    }

    // Check dependencies
    const healthy = await checkDependencies();
    res.status(healthy ? 200 : 503).json({ status: healthy ? 'healthy' : 'unhealthy' });
});

// Initialization
async function startup() {
    await connectDatabase();
    await warmCache();
    await loadConfiguration();

    app.locals.initialized = true;
    console.log('Server ready for traffic!');
}

startup();

The Load Balancing Checklist ✅

Before going to production:

  • Load balancer configured with multiple backends
  • Health checks implemented (/health endpoint)
  • Connection draining enabled
  • Appropriate algorithm chosen (least_conn for most cases)
  • Monitoring set up (load balancer + backend metrics)
  • SSL termination configured (if using HTTPS)
  • Timeout values tuned (connect, read, send)
  • Backup load balancer (or managed service)
  • Tested failover (what happens if server crashes?)
  • Load tested under peak traffic

The Bottom Line 💡

Load balancers aren't optional for production - they're ESSENTIAL!

The essentials:

  1. Distribute traffic across multiple servers
  2. Health checks to detect failures
  3. Choose the right algorithm (least_conn for most cases)
  4. Connection draining for graceful shutdowns
  5. Monitor everything (load balancer is critical!)

The truth about load balancing:

It's not "buy more servers and they magically share traffic" - it's strategic distribution based on capacity, health, and connection count!

When designing our e-commerce backend, I learned: One load balancer is worth more than three servers. Without proper load balancing, adding servers doesn't improve performance - it just wastes money!

You don't need Kubernetes from day one - NGINX + Docker Compose is perfectly fine! Graduate to managed load balancers (AWS ALB, GCP Load Balancer) when you need it! 🚀

Your Action Plan 🎯

This week:

  1. Add health check endpoint to your app
  2. Set up basic NGINX load balancer locally
  3. Test with multiple app instances
  4. Verify traffic is distributed (see the quick script below)
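
A quick way to check step 4: hammer the load balancer in a loop and tally which backend answered. This assumes each app instance reports its identity somehow, e.g. an X-Served-By response header - adjust to whatever your app actually exposes (Node 18+ for built-in fetch):

// verify-distribution.js - count how many requests each backend served
const LB_URL = 'http://localhost:8080/';   // your load balancer
const REQUESTS = 100;

async function main() {
    const counts = {};
    for (let i = 0; i < REQUESTS; i++) {
        const res = await fetch(LB_URL);
        const backend = res.headers.get('x-served-by') || 'unknown';
        counts[backend] = (counts[backend] || 0) + 1;
    }
    console.log(counts);  // e.g. { app1: 34, app2: 33, app3: 33 }
}

main();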

This month:

  1. Deploy load balancer to production
  2. Configure health checks with proper intervals
  3. Set up monitoring and alerts
  4. Test server failure scenarios

This quarter:

  1. Implement connection draining
  2. Fine-tune algorithm based on traffic patterns
  3. Set up redundant load balancers
  4. Load test at 2x peak capacity

Real talk: The best load balancer is one that you'll actually maintain and monitor!


Building scalable architectures? Connect with me on LinkedIn and share your load balancing stories!

Want to see my production configs? Check out my GitHub - real NGINX and Terraform configs!

Now go forth and balance responsibly! ⚖️✨


P.S. If you have multiple servers but no load balancer, you're basically paying for servers to take naps. Fix it! 💤

P.P.S. I once forgot to enable health checks on a load balancer. Spent 3 hours debugging why 50% of requests were failing. The problem? Half my servers were down and the load balancer didn't know! Always. Enable. Health. Checks! 🚨