CDN & Caching: Stop Querying Your Database for Data That Never Changes šš¾
CDN & Caching: Stop Querying Your Database for Data That Never Changes šš¾
Real confession: The first time I deployed our e-commerce site to production, I was so proud. Clean code, normalized database, RESTful APIs - textbook perfect! Then I checked the analytics: Users in Australia were waiting 1.2 seconds just to see our homepage. The logo alone took 800ms to load. FROM VIRGINIA!
Me looking at CloudWatch: "Why is our database getting hammered with 50,000 queries/minute?"
My boss: "What are they querying?"
Me: checks logs "Product names... category lists... the site logo... things that haven't changed in months." š±
Boss: "You're querying the database for a LOGO on EVERY page load?"
Me: "I... didn't think about caching..."
That day, I learned the most important lesson in web architecture: If data doesn't change often, stop fetching it like it does!
Welcome to caching and CDNs - the pattern that turns a slow, database-crushing site into a blazing-fast global powerhouse!
What's Caching & CDN Anyway? š¤
Think of caching like your brain's memory vs going to the library:
Without caching (Every time you need info):
You: "What's the capital of France?"
Brain: "Hold on, let me drive to the library..."
*20 minutes later*
Brain: "It's Paris!"
You: "What's the capital of France?" (5 minutes later)
Brain: "Hold on, let me drive to the library AGAIN..."
*20 minutes later*
Brain: "Still Paris!"
With caching (Smart memory):
You: "What's the capital of France?"
Brain: "Let me check... not in memory, going to library..."
*20 minutes later*
Brain: "It's Paris! Saving that for next time..."
You: "What's the capital of France?" (5 minutes later)
Brain: "Paris!" *instant answer from memory*
You: "What's the capital of France?" (1 hour later)
Brain: "Paris!" *still instant*
Translation: Cache = Store frequently accessed data close to where it's needed so you don't have to fetch it repeatedly!
The Performance Nightmare That Forced Me to Cache š
When I architected our e-commerce platform at my previous company, we started with the "proper" approach:
Year 1 (Naive & Proud):
// routes/products.js
app.get('/api/products/:id', async (req, res) => {
try {
// Query database EVERY TIME
const product = await db.query(
'SELECT * FROM products WHERE id = ?',
[req.params.id]
);
const category = await db.query(
'SELECT * FROM categories WHERE id = ?',
[product.category_id]
);
const reviews = await db.query(
'SELECT * FROM reviews WHERE product_id = ?',
[product.id]
);
res.json({ product, category, reviews });
// Total: 3 database queries for data that changes maybe once a week!
// Query time: 150ms (not terrible...)
} catch (error) {
res.status(500).json({ error: error.message });
}
});
What happened:
- 50 users browsing ā 150 DB queries/second
- 500 users browsing ā 1,500 DB queries/second
- 5,000 users (Black Friday) ā 15,000 DB queries/second! š„
- Database CPU: 95%
- Query time: 150ms ā 500ms ā 2 seconds! š
- Database connection pool exhausted
- Site crashed at 11:47 PM on Black Friday
- Lost $60,000 in sales in 20 minutes
- Trending on Twitter: "Site crash on Black Friday"
- My stress level: ššššš
The wake-up call: We were treating the database like Google - asking it the same questions thousands of times per second for data that barely ever changed!
Caching Layer #1: Application-Level Cache (Redis) ā”
The solution - Add Redis:
const Redis = require('redis');
const redis = Redis.createClient({ url: 'redis://localhost:6379' });
app.get('/api/products/:id', async (req, res) => {
try {
const productId = req.params.id;
const cacheKey = `product:${productId}`;
// Check cache first!
const cached = await redis.get(cacheKey);
if (cached) {
console.log('Cache HIT! šÆ');
return res.json(JSON.parse(cached));
// Response time: 5ms! (30x faster!)
}
// Cache miss - query database
console.log('Cache MISS - fetching from DB...');
const product = await db.query(
'SELECT * FROM products WHERE id = ?',
[productId]
);
const category = await db.query(
'SELECT * FROM categories WHERE id = ?',
[product.category_id]
);
const reviews = await db.query(
'SELECT * FROM reviews WHERE product_id = ?',
[product.id]
);
const result = { product, category, reviews };
// Store in cache for 1 hour
await redis.setEx(cacheKey, 3600, JSON.stringify(result));
res.json(result);
} catch (error) {
res.status(500).json({ error: error.message });
}
});
Results after adding Redis cache:
// Before (no cache):
// 5,000 users Ć 3 queries each = 15,000 DB queries/second
// Database CPU: 95% (melting! š„)
// Response time: 2 seconds
// Database connection pool: EXHAUSTED š
// After (with Redis cache):
// Cache hit rate: 95%
// 5,000 users Ć 3 queries Ć 5% cache miss = 750 DB queries/second
// Database CPU: 12% (chillin'! āļø)
// Response time: 5-10ms (200x faster!)
// Database connection pool: Plenty of room ā
// Black Friday 2020:
// - Handled 10x the traffic with same database
// - Zero crashes
// - Site felt INSTANT
// - Boss: "What changed? Site is so fast now!"
// - Me: š
Caching Strategy #1: Cache-Aside (Lazy Loading) šÆ
The pattern I use most:
async function getProduct(productId) {
const cacheKey = `product:${productId}`;
// 1. Try cache first
let product = await redis.get(cacheKey);
if (product) {
return JSON.parse(product); // Cache hit! š
}
// 2. Cache miss - fetch from database
product = await db.products.findById(productId);
// 3. Store in cache for next time
await redis.setEx(cacheKey, 3600, JSON.stringify(product));
return product;
}
Why I love cache-aside:
- ā Only cache data that's actually requested
- ā Simple to implement
- ā Cache naturally warms up with traffic
- ā Easy to reason about
The catch:
- ā ļø First request is always slow (cache miss)
- ā ļø Cache can get stale if data updates
Caching Strategy #2: Write-Through Cache š
The concept: Update cache AND database together!
async function updateProduct(productId, updates) {
const cacheKey = `product:${productId}`;
// 1. Update database
const product = await db.products.update(productId, updates);
// 2. Update cache immediately
await redis.setEx(cacheKey, 3600, JSON.stringify(product));
// Cache is ALWAYS fresh! ā
return product;
}
async function createProduct(productData) {
// 1. Create in database
const product = await db.products.create(productData);
// 2. Immediately cache it
await redis.setEx(
`product:${product.id}`,
3600,
JSON.stringify(product)
);
return product;
}
When designing our e-commerce backend, write-through caching meant product updates were INSTANTLY visible to users (no stale cache!)
Benefits:
- ā Cache always has latest data
- ā No stale cache issues
- ā Reads are always fast
Trade-offs:
- ā ļø Writes are slightly slower (update 2 places)
- ā ļø More complex error handling
Caching Strategy #3: Cache Invalidation (The Hard Problem) š£
Phil Karlton's famous quote: "There are only two hard things in Computer Science: cache invalidation and naming things."
The nightmare scenario:
// Product price updated in database
await db.products.update(123, { price: 99.99 });
// But cache still has old price (149.99)!
// Users see wrong price for next hour! š±
// Legal issues, angry customers, refunds...
Solution #1: Time-based expiration (TTL)
// Set TTL (Time To Live)
await redis.setEx('product:123', 300, JSON.stringify(product));
// Cache expires after 5 minutes
// Pro: Simple, automatic cleanup
// Con: Data can be stale for up to 5 minutes
Solution #2: Explicit invalidation
async function updateProduct(productId, updates) {
// 1. Update database
const product = await db.products.update(productId, updates);
// 2. Delete from cache (next request will refetch)
await redis.del(`product:${productId}`);
// Or update cache directly (write-through)
// await redis.setEx(`product:${productId}`, 3600, JSON.stringify(product));
return product;
}
// Pro: Cache is always accurate
// Con: Must remember to invalidate everywhere data changes!
Solution #3: Event-based invalidation
const EventEmitter = require('events');
const events = new EventEmitter();
// When product updates, emit event
events.on('product.updated', async (productId) => {
await redis.del(`product:${productId}`);
console.log(`Invalidated cache for product ${productId}`);
});
// In your update function
async function updateProduct(productId, updates) {
const product = await db.products.update(productId, updates);
// Emit event (cache invalidation happens automatically)
events.emit('product.updated', productId);
return product;
}
// Pro: Centralized invalidation logic
// Con: Need event infrastructure
A caching lesson that cost us: We once forgot to invalidate cache when products went on sale. Users saw old prices for 1 hour. We honored the lower prices = $12,000 loss! Always invalidate! šø
Caching Layer #2: CDN (Content Delivery Network) š
The problem: Geography is SLOW!
User in Tokyo requests logo.png
ā Travels 6,000 miles to Virginia server
ā 800ms round trip! š
User in Sydney requests logo.png
ā Travels 10,000 miles to Virginia server
ā 1,200ms round trip! šš
// Same logo, queried millions of times, travels around the world!
The solution: CDN edge caching
User in Tokyo requests logo.png
ā CloudFront edge in Tokyo has it cached!
ā 20ms! ā”
User in Sydney requests logo.png
ā CloudFront edge in Sydney has it cached!
ā 25ms! ā”
// Logo cached in 200+ locations worldwide!
Setting up CloudFront CDN:
// AWS CloudFront configuration (Terraform)
resource "aws_cloudfront_distribution" "main" {
enabled = true
origin {
domain_name = "api.myapp.com"
origin_id = "api-origin"
custom_origin_config {
http_port = 80
https_port = 443
origin_protocol_policy = "https-only"
}
}
default_cache_behavior {
target_origin_id = "api-origin"
viewer_protocol_policy = "redirect-to-https"
allowed_methods = ["GET", "HEAD", "OPTIONS"]
cached_methods = ["GET", "HEAD"]
# Cache configuration
min_ttl = 0
default_ttl = 3600 # 1 hour
max_ttl = 86400 # 24 hours
forwarded_values {
query_string = false
cookies {
forward = "none"
}
}
}
# Cache behavior for static assets
ordered_cache_behavior {
path_pattern = "/static/*"
target_origin_id = "api-origin"
min_ttl = 31536000 # 1 year! (assets are versioned)
default_ttl = 31536000
max_ttl = 31536000
compress = true # Gzip compression
}
# Geographic distribution
price_class = "PriceClass_All" # All edge locations
# SSL certificate
viewer_certificate {
acm_certificate_arn = aws_acm_certificate.cert.arn
ssl_support_method = "sni-only"
}
}
What gets cached on CDN:
// ā
Cache on CDN (long TTL):
- Static images (logo, icons)
- CSS files
- JavaScript bundles
- Fonts
- Product images
- Public API responses (product catalog)
// ā Don't cache on CDN:
- User-specific data (profile, cart)
- Authentication endpoints
- Admin panels
- Real-time data
- Checkout flow
Setting cache headers in your API:
// routes/products.js
app.get('/api/products', async (req, res) => {
const products = await db.products.findAll();
// Cache on CDN for 1 hour
res.set('Cache-Control', 'public, max-age=3600');
res.json(products);
});
app.get('/api/products/:id', async (req, res) => {
const product = await getProductFromCache(req.params.id);
// Cache on CDN for 5 minutes
res.set('Cache-Control', 'public, max-age=300');
res.json(product);
});
// User-specific data - NO CDN caching!
app.get('/api/user/cart', authenticateUser, async (req, res) => {
const cart = await getUserCart(req.user.id);
// Private, no CDN caching
res.set('Cache-Control', 'private, no-cache, no-store, must-revalidate');
res.json(cart);
});
Results after adding CloudFront CDN:
// Before (no CDN):
// User in Tokyo: 800ms to load logo
// User in Sydney: 1,200ms to load logo
// User in London: 600ms to load logo
// Database: Hit on EVERY asset request
// Bandwidth costs: $2,000/month
// After (with CloudFront):
// User in Tokyo: 20ms to load logo (40x faster!) š
// User in Sydney: 25ms to load logo (48x faster!) š
// User in London: 18ms to load logo (33x faster!) š
// Database: Only hit on cache miss (1% of requests)
// Bandwidth costs: $400/month (saved $1,600/month!) š°
// CloudFront costs: $300/month
// Net savings: $1,300/month + way better UX!
The Multi-Layer Caching Strategy šÆ
My production caching architecture:
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā User's Browser Cache ā ā Layer 1: Browser
ā (Cache-Control headers) ā TTL: Hours/Days
āāāāāāāāāāāāāāāāā¬āāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā
ā¼
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā CDN Edge Cache ā ā Layer 2: CDN
ā (CloudFront, Cloudflare) ā TTL: Minutes/Hours
āāāāāāāāāāāāāāāāā¬āāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā
ā¼
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā Application Cache ā ā Layer 3: Redis
ā (Redis, Memcached) ā TTL: Seconds/Minutes
āāāāāāāāāāāāāāāāā¬āāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā
ā¼
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā Database Query Cache ā ā Layer 4: DB
ā (PostgreSQL query cache) ā TTL: Automatic
āāāāāāāāāāāāāāāāā¬āāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā
ā¼
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā Database ā ā Source of Truth
ā (PostgreSQL, MySQL) ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
Example request flow:
// User requests: GET /api/products/123
// 1. Browser cache check
if (browserHasCached('/api/products/123')) {
// Return from browser cache - INSTANT! (0ms)
return cachedData;
}
// 2. CDN edge cache check (e.g., Tokyo edge server)
if (cdnHasCached('/api/products/123')) {
// Return from nearby CDN - SUPER FAST! (20ms)
return cachedData;
}
// 3. Application cache check (Redis)
const cacheKey = 'product:123';
const cached = await redis.get(cacheKey);
if (cached) {
// Return from Redis - FAST! (5ms)
return JSON.parse(cached);
}
// 4. Database query (cache miss on all layers)
const product = await db.query('SELECT * FROM products WHERE id = 123');
// Database query: SLOW (100-500ms)
// 5. Store in Redis for next request
await redis.setEx(cacheKey, 300, JSON.stringify(product));
return product;
// Summary:
// - First request: 100-500ms (database)
// - Next request (same server): 5ms (Redis)
// - Next request (different server, same region): 20ms (CDN)
// - Next request (same user): 0ms (browser cache)
//
// Effective cache hit rate: 99.8%
// Database load reduced by 99.8%! š
Cache Warming: Don't Wait for Users š„
The problem:
// Site just deployed, caches are EMPTY
// First 10,000 users all hit database
// Database melts! š„
The solution - Warm the cache:
// scripts/warm-cache.js
async function warmProductCache() {
console.log('š„ Warming product cache...');
// Get top 1000 most popular products
const products = await db.query(`
SELECT id FROM products
ORDER BY views DESC
LIMIT 1000
`);
for (const product of products) {
const cacheKey = `product:${product.id}`;
// Fetch full product data
const fullProduct = await db.products.findById(product.id);
// Store in Redis
await redis.setEx(cacheKey, 3600, JSON.stringify(fullProduct));
console.log(`Cached product ${product.id}`);
}
console.log('ā
Cache warmed!');
}
// Run on deployment
warmProductCache();
// Or schedule to run periodically
setInterval(warmProductCache, 3600000); // Every hour
When architecting on AWS, I learned: Always warm your cache before traffic hits! We once deployed cold and the first 5 minutes looked like a DDoS attack on our database! š±
Common Caching Mistakes (I Made All of These) šŖ¤
Mistake #1: Caching User-Specific Data Globally
// š DISASTER CODE - DO NOT USE!
app.get('/api/user/profile', async (req, res) => {
const cacheKey = 'user:profile'; // ā BUG! Same key for all users!
const cached = await redis.get(cacheKey);
if (cached) {
return res.json(JSON.parse(cached)); // User A sees User B's data! š±
}
const profile = await db.users.findById(req.user.id);
await redis.setEx(cacheKey, 300, JSON.stringify(profile));
res.json(profile);
});
// ā
CORRECT CODE
app.get('/api/user/profile', async (req, res) => {
const cacheKey = `user:${req.user.id}:profile`; // ā Include user ID!
const cached = await redis.get(cacheKey);
if (cached) {
return res.json(JSON.parse(cached));
}
const profile = await db.users.findById(req.user.id);
await redis.setEx(cacheKey, 300, JSON.stringify(profile));
res.json(profile);
});
A production disaster I caused: Forgot to include user ID in cache key. User A logged in and saw User B's shopping cart! 200 angry support tickets. Spent 6 hours debugging. Always include unique identifiers! š
Mistake #2: Caching Too Long
// BAD: Cache product prices for 24 hours
await redis.setEx('product:123:price', 86400, JSON.stringify({ price: 99.99 }));
// Problem: Price changes during sale, users see old price for 24 hours!
// Legal issues, angry customers, refunds...
// GOOD: Cache for reasonable duration
await redis.setEx('product:123:price', 300, JSON.stringify({ price: 99.99 }));
// 5 minutes - good balance between performance and freshness
Cache TTL guidelines I use:
const CACHE_TTL = {
STATIC_ASSETS: 31536000, // 1 year (versioned files)
PRODUCT_CATALOG: 3600, // 1 hour (changes occasionally)
PRODUCT_DETAILS: 300, // 5 minutes (prices may change)
USER_PROFILE: 60, // 1 minute (users update frequently)
SHOPPING_CART: 30, // 30 seconds (real-time updates needed)
STOCK_LEVELS: 10, // 10 seconds (critical accuracy)
// Never cache: Auth tokens, checkout, payments
};
Mistake #3: Cache Stampede (Thundering Herd)
// PROBLEM: Cache expires, 10,000 concurrent requests all hit database!
// All 10,000 requests check cache simultaneously
const cached = await redis.get('popular-product');
if (!cached) {
// All 10,000 requests query database AT THE SAME TIME! š„
const product = await db.products.findById(123);
await redis.setEx('popular-product', 300, JSON.stringify(product));
}
Solution - Locking with Redis:
async function getProductWithLock(productId) {
const cacheKey = `product:${productId}`;
const lockKey = `lock:${cacheKey}`;
// Try cache first
let cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
// Try to acquire lock
const lockAcquired = await redis.set(lockKey, '1', {
NX: true, // Only set if doesn't exist
EX: 10 // Expire in 10 seconds
});
if (lockAcquired) {
// We got the lock! Fetch from database
const product = await db.products.findById(productId);
await redis.setEx(cacheKey, 300, JSON.stringify(product));
await redis.del(lockKey); // Release lock
return product;
} else {
// Someone else is fetching - wait a bit and try cache again
await new Promise(resolve => setTimeout(resolve, 100));
cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
// Still no cache? Fetch from database
return await db.products.findById(productId);
}
}
// Now only ONE request hits the database, others wait for it! ā
Mistake #4: Not Monitoring Cache Performance
// BAD: No visibility into cache effectiveness
// Are we getting cache hits? How often? No idea! š¤·
// GOOD: Monitor cache metrics
class MonitoredCache {
constructor(redis) {
this.redis = redis;
this.hits = 0;
this.misses = 0;
}
async get(key) {
const value = await this.redis.get(key);
if (value) {
this.hits++;
console.log(`Cache HIT for ${key} (hit rate: ${this.getHitRate()}%)`);
} else {
this.misses++;
console.log(`Cache MISS for ${key} (hit rate: ${this.getHitRate()}%)`);
}
return value;
}
async set(key, value, ttl) {
return this.redis.setEx(key, ttl, value);
}
getHitRate() {
const total = this.hits + this.misses;
if (total === 0) return 0;
return ((this.hits / total) * 100).toFixed(2);
}
getStats() {
return {
hits: this.hits,
misses: this.misses,
hitRate: this.getHitRate() + '%',
total: this.hits + this.misses
};
}
}
// Monitor in production
setInterval(() => {
const stats = cache.getStats();
console.log('š Cache stats:', stats);
// Send to monitoring (CloudWatch, Datadog, etc.)
metrics.gauge('cache.hit_rate', parseFloat(stats.hitRate));
if (parseFloat(stats.hitRate) < 80) {
console.warn('ā ļø Cache hit rate below 80%! Investigate!');
}
}, 60000); // Log every minute
The Bottom Line š”
Caching isn't optional for production - it's ESSENTIAL for scalability and global performance!
The essentials:
- Multi-layer caching - Browser, CDN, application, database
- Appropriate TTLs - Balance freshness vs performance
- Cache invalidation strategy - Keep data accurate
- CDN for static assets - Serve from edge locations globally
- Monitor cache hit rates - Aim for 95%+ hit rate
- Cache warming - Don't wait for users to populate cache
The truth about caching:
It's not "store everything forever" - it's strategic caching with smart invalidation! You're trading storage and complexity for massive performance gains and reduced database load!
When designing our e-commerce backend, I learned this: Caching turned a site that could barely handle 100 concurrent users into one serving 10,000+ users with the same database. But bad caching (stale data, wrong cache keys) caused more bugs than anything else! Cache wisely! šÆ
You don't need perfect caching from day one - start with Redis for your hottest queries and add CDN for static assets. Measure, optimize, iterate! š
Your Caching Checklist ā
Before going to production:
- Redis cache for frequently accessed data
- CDN for static assets (images, CSS, JS)
- Appropriate Cache-Control headers
- Cache invalidation on data updates
- Reasonable TTLs based on data volatility
- Cache monitoring (hit rate, miss rate)
- Cache warming on deployment
- Tested cache invalidation scenarios
- Documented what's cached and why
Your Action Plan šÆ
This week:
- Identify your top 5 most-queried database queries
- Add Redis caching for those queries
- Set up CloudFront/Cloudflare for static assets
- Measure cache hit rate
This month:
- Implement cache invalidation strategy
- Add Cache-Control headers to all API responses
- Set up cache monitoring dashboard
- Optimize TTLs based on traffic patterns
This quarter:
- Implement multi-layer caching strategy
- Add cache warming on deployments
- Optimize CDN configuration for global users
- Reduce database load by 90%+
Resources Worth Your Time š
Tools I use daily:
- Redis - In-memory cache (my go-to!)
- CloudFront - AWS CDN
- Cloudflare - CDN + security
Reading:
Real talk: The best caching strategy is the one that solves YOUR bottlenecks. Measure first, cache strategically!
Building fast, global applications? Connect with me on LinkedIn and share your caching war stories!
Want to see my caching implementations? Check out my GitHub - real production Redis and CDN configs!
Now go forth and cache responsibly! šš¾
P.S. If your database is getting hammered with queries for data that never changes, add caching TODAY! I once reduced database load from 15,000 queries/sec to 500 queries/sec with 2 hours of Redis implementation. Best ROI ever! š
P.P.S. I once cached user shopping carts globally (same cache key for all users). User A added an iPhone to their cart, User B saw it in THEIR cart and bought it! We shipped it to User B. Lost $1,200 + shipping. Always include unique identifiers in cache keys! ALWAYS! š±šø