CDN & Caching: Stop Querying Your Database for Data That Never Changes 🚀💾

Real confession: The first time I deployed our e-commerce site to production, I was so proud. Clean code, normalized database, RESTful APIs - textbook perfect! Then I checked the analytics: Users in Australia were waiting 1.2 seconds just to see our homepage. The logo alone took 800ms to load. FROM VIRGINIA!

Me looking at CloudWatch: "Why is our database getting hammered with 50,000 queries/minute?"

My boss: "What are they querying?"

Me: checks logs "Product names... category lists... the site logo... things that haven't changed in months." 😱

Boss: "You're querying the database for a LOGO on EVERY page load?"

Me: "I... didn't think about caching..."

That day, I learned the most important lesson in web architecture: If data doesn't change often, stop fetching it like it does!

Welcome to caching and CDNs - the pattern that turns a slow, database-crushing site into a blazing-fast global powerhouse!

What's Caching & CDN Anyway? 🤔

Think of caching like your brain's memory vs going to the library:

Without caching (Every time you need info):

You: "What's the capital of France?"
Brain: "Hold on, let me drive to the library..."
*20 minutes later*
Brain: "It's Paris!"

You: "What's the capital of France?" (5 minutes later)
Brain: "Hold on, let me drive to the library AGAIN..."
*20 minutes later*
Brain: "Still Paris!"

With caching (Smart memory):

You: "What's the capital of France?"
Brain: "Let me check... not in memory, going to library..."
*20 minutes later*
Brain: "It's Paris! Saving that for next time..."

You: "What's the capital of France?" (5 minutes later)
Brain: "Paris!" *instant answer from memory*

You: "What's the capital of France?" (1 hour later)
Brain: "Paris!" *still instant*

Translation: Cache = Store frequently accessed data close to where it's needed so you don't have to fetch it repeatedly!

The Performance Nightmare That Forced Me to Cache 💀

When I architected our e-commerce platform at my previous company, we started with the "proper" approach:

Year 1 (Naive & Proud):

// routes/products.js
app.get('/api/products/:id', async (req, res) => {
  try {
    // Query database EVERY TIME
    const product = await db.query(
      'SELECT * FROM products WHERE id = ?',
      [req.params.id]
    );

    const category = await db.query(
      'SELECT * FROM categories WHERE id = ?',
      [product.category_id]
    );

    const reviews = await db.query(
      'SELECT * FROM reviews WHERE product_id = ?',
      [product.id]
    );

    res.json({ product, category, reviews });
    // Total: 3 database queries for data that changes maybe once a week!
    // Query time: 150ms (not terrible...)
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

What happened:

50 users browsing → 150 DB queries/second
500 users browsing → 1,500 DB queries/second
5,000 users (Black Friday) → 15,000 DB queries/second! 🔥
Database CPU: 95%
Query time: 150ms → 500ms → 2 seconds! 💀
Database connection pool exhausted
Site crashed at 11:47 PM on Black Friday
Lost $60,000 in sales in 20 minutes
Trending on Twitter: "Site crash on Black Friday"
My stress level: 📈📈📈📈📈

The wake-up call: We were treating the database like Google - asking it the same questions thousands of times per second for data that barely ever changed!

Caching Layer #1: Application-Level Cache (Redis) ⚡

The solution - Add Redis:

const Redis = require('redis');
const redis = Redis.createClient({ url: 'redis://localhost:6379' });

app.get('/api/products/:id', async (req, res) => {
  try {
    const productId = req.params.id;
    const cacheKey = `product:${productId}`;

    // Check cache first!
    const cached = await redis.get(cacheKey);

    if (cached) {
      console.log('Cache HIT! 🎯');
      return res.json(JSON.parse(cached));
      // Response time: 5ms! (30x faster!)
    }

    // Cache miss - query database
    console.log('Cache MISS - fetching from DB...');

    const product = await db.query(
      'SELECT * FROM products WHERE id = ?',
      [productId]
    );

    const category = await db.query(
      'SELECT * FROM categories WHERE id = ?',
      [product.category_id]
    );

    const reviews = await db.query(
      'SELECT * FROM reviews WHERE product_id = ?',
      [product.id]
    );

    const result = { product, category, reviews };

    // Store in cache for 1 hour
    await redis.setEx(cacheKey, 3600, JSON.stringify(result));

    res.json(result);

  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

Results after adding Redis cache:

// Before (no cache):
// 5,000 users × 3 queries each = 15,000 DB queries/second
// Database CPU: 95% (melting! 🔥)
// Response time: 2 seconds
// Database connection pool: EXHAUSTED 💀

// After (with Redis cache):
// Cache hit rate: 95%
// 5,000 users × 3 queries × 5% cache miss = 750 DB queries/second
// Database CPU: 12% (chillin'! ❄️)
// Response time: 5-10ms (200x faster!)
// Database connection pool: Plenty of room ✅

// Black Friday 2020:
// - Handled 10x the traffic with same database
// - Zero crashes
// - Site felt INSTANT
// - Boss: "What changed? Site is so fast now!"
// - Me: 😎

Caching Strategy #1: Cache-Aside (Lazy Loading) 🎯

The pattern I use most:

async function getProduct(productId) {
  const cacheKey = `product:${productId}`;

  // 1. Try cache first
  let product = await redis.get(cacheKey);

  if (product) {
    return JSON.parse(product); // Cache hit! 🎉
  }

  // 2. Cache miss - fetch from database
  product = await db.products.findById(productId);

  // 3. Store in cache for next time
  await redis.setEx(cacheKey, 3600, JSON.stringify(product));

  return product;
}

Why I love cache-aside:

✅ Only cache data that's actually requested
✅ Simple to implement
✅ Cache naturally warms up with traffic
✅ Easy to reason about

The catch:

⚠️ First request is always slow (cache miss)
⚠️ Cache can get stale if data updates

Caching Strategy #2: Write-Through Cache 📝

The concept: Update cache AND database together!

async function updateProduct(productId, updates) {
  const cacheKey = `product:${productId}`;

  // 1. Update database
  const product = await db.products.update(productId, updates);

  // 2. Update cache immediately
  await redis.setEx(cacheKey, 3600, JSON.stringify(product));

  // Cache is ALWAYS fresh! ✅
  return product;
}

async function createProduct(productData) {
  // 1. Create in database
  const product = await db.products.create(productData);

  // 2. Immediately cache it
  await redis.setEx(
    `product:${product.id}`,
    3600,
    JSON.stringify(product)
  );

  return product;
}

When designing our e-commerce backend, write-through caching meant product updates were INSTANTLY visible to users (no stale cache!)

Benefits:

✅ Cache always has latest data
✅ No stale cache issues
✅ Reads are always fast

Trade-offs:

⚠️ Writes are slightly slower (update 2 places)
⚠️ More complex error handling

Caching Strategy #3: Cache Invalidation (The Hard Problem) 💣

Phil Karlton's famous quote: "There are only two hard things in Computer Science: cache invalidation and naming things."

The nightmare scenario:

// Product price updated in database
await db.products.update(123, { price: 99.99 });

// But cache still has old price (149.99)!
// Users see wrong price for next hour! 😱
// Legal issues, angry customers, refunds...

Solution #1: Time-based expiration (TTL)

// Set TTL (Time To Live)
await redis.setEx('product:123', 300, JSON.stringify(product));
// Cache expires after 5 minutes

// Pro: Simple, automatic cleanup
// Con: Data can be stale for up to 5 minutes

Solution #2: Explicit invalidation

async function updateProduct(productId, updates) {
  // 1. Update database
  const product = await db.products.update(productId, updates);

  // 2. Delete from cache (next request will refetch)
  await redis.del(`product:${productId}`);

  // Or update cache directly (write-through)
  // await redis.setEx(`product:${productId}`, 3600, JSON.stringify(product));

  return product;
}

// Pro: Cache is always accurate
// Con: Must remember to invalidate everywhere data changes!

Solution #3: Event-based invalidation

const EventEmitter = require('events');
const events = new EventEmitter();

// When product updates, emit event
events.on('product.updated', async (productId) => {
  await redis.del(`product:${productId}`);
  console.log(`Invalidated cache for product ${productId}`);
});

// In your update function
async function updateProduct(productId, updates) {
  const product = await db.products.update(productId, updates);

  // Emit event (cache invalidation happens automatically)
  events.emit('product.updated', productId);

  return product;
}

// Pro: Centralized invalidation logic
// Con: Need event infrastructure

A caching lesson that cost us: We once forgot to invalidate cache when products went on sale. Users saw old prices for 1 hour. We honored the lower prices = $12,000 loss! Always invalidate! 💸

Caching Layer #2: CDN (Content Delivery Network) 🌍

The problem: Geography is SLOW!

User in Tokyo requests logo.png
→ Travels 6,000 miles to Virginia server
→ 800ms round trip! 🐌

User in Sydney requests logo.png
→ Travels 10,000 miles to Virginia server
→ 1,200ms round trip! 🐌🐌

// Same logo, queried millions of times, travels around the world!

The solution: CDN edge caching

User in Tokyo requests logo.png
→ CloudFront edge in Tokyo has it cached!
→ 20ms! ⚡

User in Sydney requests logo.png
→ CloudFront edge in Sydney has it cached!
→ 25ms! ⚡

// Logo cached in 200+ locations worldwide!

Setting up CloudFront CDN:

// AWS CloudFront configuration (Terraform)
resource "aws_cloudfront_distribution" "main" {
  enabled = true

  origin {
    domain_name = "api.myapp.com"
    origin_id   = "api-origin"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
    }
  }

  default_cache_behavior {
    target_origin_id       = "api-origin"
    viewer_protocol_policy = "redirect-to-https"

    allowed_methods = ["GET", "HEAD", "OPTIONS"]
    cached_methods  = ["GET", "HEAD"]

    # Cache configuration
    min_ttl     = 0
    default_ttl = 3600    # 1 hour
    max_ttl     = 86400   # 24 hours

    forwarded_values {
      query_string = false

      cookies {
        forward = "none"
      }
    }
  }

  # Cache behavior for static assets
  ordered_cache_behavior {
    path_pattern     = "/static/*"
    target_origin_id = "api-origin"

    min_ttl     = 31536000  # 1 year! (assets are versioned)
    default_ttl = 31536000
    max_ttl     = 31536000

    compress = true  # Gzip compression
  }

  # Geographic distribution
  price_class = "PriceClass_All"  # All edge locations

  # SSL certificate
  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate.cert.arn
    ssl_support_method  = "sni-only"
  }
}

What gets cached on CDN:

// ✅ Cache on CDN (long TTL):
- Static images (logo, icons)
- CSS files
- JavaScript bundles
- Fonts
- Product images
- Public API responses (product catalog)

// ❌ Don't cache on CDN:
- User-specific data (profile, cart)
- Authentication endpoints
- Admin panels
- Real-time data
- Checkout flow

Setting cache headers in your API:

// routes/products.js
app.get('/api/products', async (req, res) => {
  const products = await db.products.findAll();

  // Cache on CDN for 1 hour
  res.set('Cache-Control', 'public, max-age=3600');

  res.json(products);
});

app.get('/api/products/:id', async (req, res) => {
  const product = await getProductFromCache(req.params.id);

  // Cache on CDN for 5 minutes
  res.set('Cache-Control', 'public, max-age=300');

  res.json(product);
});

// User-specific data - NO CDN caching!
app.get('/api/user/cart', authenticateUser, async (req, res) => {
  const cart = await getUserCart(req.user.id);

  // Private, no CDN caching
  res.set('Cache-Control', 'private, no-cache, no-store, must-revalidate');

  res.json(cart);
});

Results after adding CloudFront CDN:

// Before (no CDN):
// User in Tokyo: 800ms to load logo
// User in Sydney: 1,200ms to load logo
// User in London: 600ms to load logo
// Database: Hit on EVERY asset request
// Bandwidth costs: $2,000/month

// After (with CloudFront):
// User in Tokyo: 20ms to load logo (40x faster!) 🚀
// User in Sydney: 25ms to load logo (48x faster!) 🚀
// User in London: 18ms to load logo (33x faster!) 🚀
// Database: Only hit on cache miss (1% of requests)
// Bandwidth costs: $400/month (saved $1,600/month!) 💰
// CloudFront costs: $300/month
// Net savings: $1,300/month + way better UX!

The Multi-Layer Caching Strategy 🎯

My production caching architecture:

┌─────────────────────────────────────────────┐
│         User's Browser Cache               │  ← Layer 1: Browser
│         (Cache-Control headers)             │     TTL: Hours/Days
└───────────────┬─────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────┐
│         CDN Edge Cache                      │  ← Layer 2: CDN
│         (CloudFront, Cloudflare)            │     TTL: Minutes/Hours
└───────────────┬─────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────┐
│         Application Cache                   │  ← Layer 3: Redis
│         (Redis, Memcached)                  │     TTL: Seconds/Minutes
└───────────────┬─────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────┐
│         Database Query Cache                │  ← Layer 4: DB
│         (PostgreSQL query cache)            │     TTL: Automatic
└───────────────┬─────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────┐
│         Database                            │  ← Source of Truth
│         (PostgreSQL, MySQL)                 │
└─────────────────────────────────────────────┘

Example request flow:

// User requests: GET /api/products/123

// 1. Browser cache check
if (browserHasCached('/api/products/123')) {
  // Return from browser cache - INSTANT! (0ms)
  return cachedData;
}

// 2. CDN edge cache check (e.g., Tokyo edge server)
if (cdnHasCached('/api/products/123')) {
  // Return from nearby CDN - SUPER FAST! (20ms)
  return cachedData;
}

// 3. Application cache check (Redis)
const cacheKey = 'product:123';
const cached = await redis.get(cacheKey);

if (cached) {
  // Return from Redis - FAST! (5ms)
  return JSON.parse(cached);
}

// 4. Database query (cache miss on all layers)
const product = await db.query('SELECT * FROM products WHERE id = 123');
// Database query: SLOW (100-500ms)

// 5. Store in Redis for next request
await redis.setEx(cacheKey, 300, JSON.stringify(product));

return product;

// Summary:
// - First request: 100-500ms (database)
// - Next request (same server): 5ms (Redis)
// - Next request (different server, same region): 20ms (CDN)
// - Next request (same user): 0ms (browser cache)
//
// Effective cache hit rate: 99.8%
// Database load reduced by 99.8%! 🎉

Cache Warming: Don't Wait for Users 🔥

The problem:

// Site just deployed, caches are EMPTY
// First 10,000 users all hit database
// Database melts! 🔥

The solution - Warm the cache:

// scripts/warm-cache.js
async function warmProductCache() {
  console.log('🔥 Warming product cache...');

  // Get top 1000 most popular products
  const products = await db.query(`
    SELECT id FROM products
    ORDER BY views DESC
    LIMIT 1000
  `);

  for (const product of products) {
    const cacheKey = `product:${product.id}`;

    // Fetch full product data
    const fullProduct = await db.products.findById(product.id);

    // Store in Redis
    await redis.setEx(cacheKey, 3600, JSON.stringify(fullProduct));

    console.log(`Cached product ${product.id}`);
  }

  console.log('✅ Cache warmed!');
}

// Run on deployment
warmProductCache();

// Or schedule to run periodically
setInterval(warmProductCache, 3600000); // Every hour

When architecting on AWS, I learned: Always warm your cache before traffic hits! We once deployed cold and the first 5 minutes looked like a DDoS attack on our database! 😱

Common Caching Mistakes (I Made All of These) 🪤

Mistake #1: Caching User-Specific Data Globally

// 💀 DISASTER CODE - DO NOT USE!
app.get('/api/user/profile', async (req, res) => {
  const cacheKey = 'user:profile'; // ← BUG! Same key for all users!

  const cached = await redis.get(cacheKey);
  if (cached) {
    return res.json(JSON.parse(cached)); // User A sees User B's data! 😱
  }

  const profile = await db.users.findById(req.user.id);
  await redis.setEx(cacheKey, 300, JSON.stringify(profile));

  res.json(profile);
});

// ✅ CORRECT CODE
app.get('/api/user/profile', async (req, res) => {
  const cacheKey = `user:${req.user.id}:profile`; // ← Include user ID!

  const cached = await redis.get(cacheKey);
  if (cached) {
    return res.json(JSON.parse(cached));
  }

  const profile = await db.users.findById(req.user.id);
  await redis.setEx(cacheKey, 300, JSON.stringify(profile));

  res.json(profile);
});

A production disaster I caused: Forgot to include user ID in cache key. User A logged in and saw User B's shopping cart! 200 angry support tickets. Spent 6 hours debugging. Always include unique identifiers! 🔑

Mistake #2: Caching Too Long

// BAD: Cache product prices for 24 hours
await redis.setEx('product:123:price', 86400, JSON.stringify({ price: 99.99 }));

// Problem: Price changes during sale, users see old price for 24 hours!
// Legal issues, angry customers, refunds...

// GOOD: Cache for reasonable duration
await redis.setEx('product:123:price', 300, JSON.stringify({ price: 99.99 }));
// 5 minutes - good balance between performance and freshness

Cache TTL guidelines I use:

const CACHE_TTL = {
  STATIC_ASSETS: 31536000,    // 1 year (versioned files)
  PRODUCT_CATALOG: 3600,       // 1 hour (changes occasionally)
  PRODUCT_DETAILS: 300,        // 5 minutes (prices may change)
  USER_PROFILE: 60,            // 1 minute (users update frequently)
  SHOPPING_CART: 30,           // 30 seconds (real-time updates needed)
  STOCK_LEVELS: 10,            // 10 seconds (critical accuracy)
  // Never cache: Auth tokens, checkout, payments
};

Mistake #3: Cache Stampede (Thundering Herd)

// PROBLEM: Cache expires, 10,000 concurrent requests all hit database!

// All 10,000 requests check cache simultaneously
const cached = await redis.get('popular-product');

if (!cached) {
  // All 10,000 requests query database AT THE SAME TIME! 💥
  const product = await db.products.findById(123);
  await redis.setEx('popular-product', 300, JSON.stringify(product));
}

Solution - Locking with Redis:

async function getProductWithLock(productId) {
  const cacheKey = `product:${productId}`;
  const lockKey = `lock:${cacheKey}`;

  // Try cache first
  let cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Try to acquire lock
  const lockAcquired = await redis.set(lockKey, '1', {
    NX: true,  // Only set if doesn't exist
    EX: 10     // Expire in 10 seconds
  });

  if (lockAcquired) {
    // We got the lock! Fetch from database
    const product = await db.products.findById(productId);
    await redis.setEx(cacheKey, 300, JSON.stringify(product));
    await redis.del(lockKey); // Release lock
    return product;
  } else {
    // Someone else is fetching - wait a bit and try cache again
    await new Promise(resolve => setTimeout(resolve, 100));

    cached = await redis.get(cacheKey);
    if (cached) return JSON.parse(cached);

    // Still no cache? Fetch from database
    return await db.products.findById(productId);
  }
}

// Now only ONE request hits the database, others wait for it! ✅

Mistake #4: Not Monitoring Cache Performance

// BAD: No visibility into cache effectiveness
// Are we getting cache hits? How often? No idea! 🤷

// GOOD: Monitor cache metrics
class MonitoredCache {
  constructor(redis) {
    this.redis = redis;
    this.hits = 0;
    this.misses = 0;
  }

  async get(key) {
    const value = await this.redis.get(key);

    if (value) {
      this.hits++;
      console.log(`Cache HIT for ${key} (hit rate: ${this.getHitRate()}%)`);
    } else {
      this.misses++;
      console.log(`Cache MISS for ${key} (hit rate: ${this.getHitRate()}%)`);
    }

    return value;
  }

  async set(key, value, ttl) {
    return this.redis.setEx(key, ttl, value);
  }

  getHitRate() {
    const total = this.hits + this.misses;
    if (total === 0) return 0;
    return ((this.hits / total) * 100).toFixed(2);
  }

  getStats() {
    return {
      hits: this.hits,
      misses: this.misses,
      hitRate: this.getHitRate() + '%',
      total: this.hits + this.misses
    };
  }
}

// Monitor in production
setInterval(() => {
  const stats = cache.getStats();
  console.log('📊 Cache stats:', stats);

  // Send to monitoring (CloudWatch, Datadog, etc.)
  metrics.gauge('cache.hit_rate', parseFloat(stats.hitRate));

  if (parseFloat(stats.hitRate) < 80) {
    console.warn('⚠️ Cache hit rate below 80%! Investigate!');
  }
}, 60000); // Log every minute

The Bottom Line 💡

Caching isn't optional for production - it's ESSENTIAL for scalability and global performance!

The essentials:

Multi-layer caching - Browser, CDN, application, database
Appropriate TTLs - Balance freshness vs performance
Cache invalidation strategy - Keep data accurate
CDN for static assets - Serve from edge locations globally
Monitor cache hit rates - Aim for 95%+ hit rate
Cache warming - Don't wait for users to populate cache

The truth about caching:

It's not "store everything forever" - it's strategic caching with smart invalidation! You're trading storage and complexity for massive performance gains and reduced database load!

When designing our e-commerce backend, I learned this: Caching turned a site that could barely handle 100 concurrent users into one serving 10,000+ users with the same database. But bad caching (stale data, wrong cache keys) caused more bugs than anything else! Cache wisely! 🎯

You don't need perfect caching from day one - start with Redis for your hottest queries and add CDN for static assets. Measure, optimize, iterate! 🚀

Your Caching Checklist ✅

Before going to production:

Redis cache for frequently accessed data
CDN for static assets (images, CSS, JS)
Appropriate Cache-Control headers
Cache invalidation on data updates
Reasonable TTLs based on data volatility
Cache monitoring (hit rate, miss rate)
Cache warming on deployment
Tested cache invalidation scenarios
Documented what's cached and why

Your Action Plan 🎯

This week:

Identify your top 5 most-queried database queries
Add Redis caching for those queries
Set up CloudFront/Cloudflare for static assets
Measure cache hit rate

This month:

Implement cache invalidation strategy
Add Cache-Control headers to all API responses
Set up cache monitoring dashboard
Optimize TTLs based on traffic patterns

This quarter:

Implement multi-layer caching strategy
Add cache warming on deployments
Optimize CDN configuration for global users
Reduce database load by 90%+

Resources Worth Your Time 📚

Tools I use daily:

Redis - In-memory cache (my go-to!)
CloudFront - AWS CDN
Cloudflare - CDN + security

Reading:

Real talk: The best caching strategy is the one that solves YOUR bottlenecks. Measure first, cache strategically!

Building fast, global applications? Connect with me on LinkedIn and share your caching war stories!

Want to see my caching implementations? Check out my GitHub - real production Redis and CDN configs!

Now go forth and cache responsibly! 🚀💾

P.S. If your database is getting hammered with queries for data that never changes, add caching TODAY! I once reduced database load from 15,000 queries/sec to 500 queries/sec with 2 hours of Redis implementation. Best ROI ever! 📊

P.P.S. I once cached user shopping carts globally (same cache key for all users). User A added an iPhone to their cart, User B saw it in THEIR cart and bought it! We shipped it to User B. Lost $1,200 + shipping. Always include unique identifiers in cache keys! ALWAYS! 😱💸