0x55aa
← Back to Blog

Polyglot Persistence: Stop Using One Database for Everything πŸ—„οΈπŸ”€

β€’11 min read

Polyglot Persistence: Stop Using One Database for Everything πŸ—„οΈπŸ”€

Hot take: Using one database for everything is like using a hammer for every tool in your shed β€” eventually you're trying to screw in a lightbulb with it and wondering why the ceiling's collapsing.

I did this. For years. PostgreSQL was my one true love, my hammer, my answer to every data problem. Sessions? PostgreSQL. Product search? PostgreSQL. Real-time leaderboards? PostgreSQL. Binary file metadata AND the files themselves? You guessed it β€” PostgreSQL.

Our DBA's eye twitched every time he saw my schema. He was right to be worried.

The "One Database to Rule Them All" Problem πŸ’€

When designing our e-commerce backend, I built everything on PostgreSQL because it's solid, reliable, and I knew it well. Here's what it looked like after two years:

PostgreSQL doing ALL of this:
β”œβ”€β”€ users (500K rows) β€” βœ… Makes sense
β”œβ”€β”€ orders (2M rows) β€” βœ… Makes sense
β”œβ”€β”€ products (50K rows) β€” βœ… Makes sense
β”œβ”€β”€ sessions (10M rows, expires every 30min) β€” πŸ€” Maybe?
β”œβ”€β”€ product_search_index (full-text GIN indexes everywhere) β€” 😰
β”œβ”€β”€ leaderboard_scores (recalculated every 5min via cron) β€” 😬
β”œβ”€β”€ product_images (BYTEA columns, 2GB per row avg) β€” 😱
β”œβ”€β”€ activity_feed_events (50M rows, 95% never queried) β€” πŸ’€
└── user_recommendations (recalculated ML scores) β€” ☠️

The production symptoms:

  • Product search: 800ms average (should be <50ms)
  • Session lookup: 120ms (should be <5ms)
  • Database CPU: 87% during peak
  • My Slack at 3am: "ORDERS ARE DOWN" 😭

The database wasn't the problem. I was asking ONE tool to do jobs it was never designed for.

What is Polyglot Persistence? πŸ€”

Simple idea: use different databases for different data access patterns.

Instead of this:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         PostgreSQL              β”‚
β”‚  Users + Orders + Sessions +    β”‚
β”‚  Search + Files + Cache + Feed  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Do this:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  PostgreSQL  β”‚  β”‚    Redis    β”‚  β”‚Elasticsearch β”‚
β”‚  Users       β”‚  β”‚  Sessions   β”‚  β”‚ Product      β”‚
β”‚  Orders      β”‚  β”‚  Cache      β”‚  β”‚   Search     β”‚
β”‚  Products    β”‚  β”‚  Leaderboardβ”‚  β”‚              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚   AWS S3     β”‚  β”‚  DynamoDB    β”‚
       β”‚  Images      β”‚  β”‚  Activity    β”‚
       β”‚  Documents   β”‚  β”‚    Feed      β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Each database does ONE thing really well. You stop fighting the tool.

The Databases in My Production Stack πŸ› οΈ

PostgreSQL: The Transactional Workhorse πŸ’ͺ

Best for: Relational data, complex queries, strong consistency

-- This is what PostgreSQL is BUILT for
SELECT
  o.id, o.total, u.email,
  COUNT(oi.id) as item_count
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at > NOW() - INTERVAL '30 days'
  AND o.status = 'pending'
GROUP BY o.id, u.email
HAVING COUNT(oi.id) > 3;

What I keep in PostgreSQL:

  • Users, orders, products, payments
  • Anything requiring ACID transactions
  • Data with complex relationships
  • Anything you need to audit

As a Technical Lead, I've learned: When you need a JOIN across three tables with a transaction rollback, PostgreSQL. Don't argue. Don't look at DynamoDB. Just PostgreSQL.

Redis: The Speed Demon ⚑

Best for: Sessions, caching, leaderboards, pub/sub, rate limiting

Moving sessions from PostgreSQL to Redis was the single biggest win we got in 6 months:

// Before: PostgreSQL session lookup (120ms 😭)
const session = await db.query(
  'SELECT * FROM sessions WHERE token = $1 AND expires_at > NOW()',
  [req.headers.authorization]
);

// After: Redis session lookup (2ms πŸš€)
const session = await redis.get(`session:${req.headers.authorization}`);
const parsed = session ? JSON.parse(session) : null;

Real-time leaderboard that PostgreSQL was choking on:

// Redis Sorted Set β€” designed EXACTLY for this
async function updateScore(userId, score) {
  await redis.zadd('leaderboard:weekly', score, userId);
}

async function getTopPlayers(count = 10) {
  // This is O(log N + M) instead of O(N log N) SQL sort
  return await redis.zrevrangebyscore(
    'leaderboard:weekly', '+inf', '-inf',
    'WITHSCORES', 'LIMIT', 0, count
  );
}

async function getUserRank(userId) {
  return await redis.zrevrank('leaderboard:weekly', userId);
}

A scalability lesson that cost us: We tried building the leaderboard with PostgreSQL window functions. It worked fine at 1K users. At 500K users, the query took 18 seconds. Redis sorted sets give you this in 1ms regardless of size. Some data structures just belong in Redis.

What I put in Redis:

  • Sessions (TTL = 24h, auto-expires)
  • API response cache (TTL = 5min)
  • Rate limiting counters
  • Leaderboards / sorted rankings
  • Pub/sub for real-time notifications
  • Feature flag evaluations

Elasticsearch: The Search Specialist πŸ”

Best for: Full-text search, faceted filtering, fuzzy matching

Product search in PostgreSQL with ILIKE '%laptop%' is embarrassing in production:

-- PostgreSQL full-text search (800ms on 50K products 😱)
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description)
  @@ plainto_tsquery('wireless laptop keyboard')
ORDER BY ts_rank(...) DESC;
-- Also: can't do "Did you mean?" β€” can't rank by popularity AND relevance

After moving to Elasticsearch:

// Elasticsearch: 15ms on 50K products, with facets, typo correction, highlighting
const results = await esClient.search({
  index: 'products',
  body: {
    query: {
      multi_match: {
        query: 'wirelss laptop keyboard', // typo intentional
        fields: ['name^3', 'description', 'tags'],
        fuzziness: 'AUTO'  // "Did you mean: wireless?"
      }
    },
    aggs: {
      brands: { terms: { field: 'brand.keyword' } },
      price_ranges: {
        range: {
          field: 'price',
          ranges: [
            { to: 50 },
            { from: 50, to: 200 },
            { from: 200 }
          ]
        }
      }
    },
    highlight: {
      fields: { name: {}, description: {} }
    }
  }
});

When designing our e-commerce backend, search was our highest-traffic endpoint and the slowest. Moving to Elasticsearch dropped p99 search latency from 800ms to 20ms. Conversion rates on search results went up 31% because results were actually relevant.

What I use Elasticsearch for:

  • Product/content search
  • Log aggregation (ELK stack)
  • Autocomplete suggestions
  • Geospatial queries

AWS S3: The Infinite Shelf πŸ“¦

Best for: Files, images, documents, backups, anything binary

This is embarrassing to admit, but early on we stored product images as BYTEA in PostgreSQL:

-- DO NOT DO THIS. I repeat: DO NOT DO THIS.
CREATE TABLE product_images (
  id UUID PRIMARY KEY,
  product_id UUID REFERENCES products(id),
  image_data BYTEA NOT NULL,  -- ← 2MB per row, 50K products = 100GB in your DB!
  mime_type VARCHAR(50)
);

Our PostgreSQL backup was 140GB because of images. The database was slower than a sleepy sloth.

// The right way: PostgreSQL stores the REFERENCE, S3 stores the FILE
const uploadImage = async (productId, imageBuffer, mimeType) => {
  const key = `products/${productId}/${uuid()}.jpg`;

  await s3.putObject({
    Bucket: 'my-product-images',
    Key: key,
    Body: imageBuffer,
    ContentType: mimeType,
    CacheControl: 'max-age=31536000'  // 1 year cache
  }).promise();

  // Store ONLY the reference in PostgreSQL
  await db.query(
    'INSERT INTO product_images (product_id, s3_key) VALUES ($1, $2)',
    [productId, key]
  );

  return `https://cdn.mystore.com/${key}`;
};

Database backup dropped from 140GB β†’ 2GB. My DevOps guy almost cried tears of joy.

DynamoDB: The Serverless Scale Monster πŸš€

Best for: High-throughput key-value lookups, time-series data, event logs

Our activity feed was killing us. 50 million events in PostgreSQL, most never queried:

// DynamoDB: designed for time-series at scale
const ActivityFeed = {
  async logEvent(userId, eventType, metadata) {
    await dynamodb.put({
      TableName: 'activity-feed',
      Item: {
        PK: `USER#${userId}`,
        SK: `EVENT#${Date.now()}#${uuid()}`,
        eventType,
        metadata,
        ttl: Math.floor(Date.now() / 1000) + (90 * 24 * 60 * 60) // 90 day auto-expire!
      }
    }).promise();
  },

  async getUserFeed(userId, limit = 20) {
    const result = await dynamodb.query({
      TableName: 'activity-feed',
      KeyConditionExpression: 'PK = :pk AND begins_with(SK, :sk)',
      ExpressionAttributeValues: {
        ':pk': `USER#${userId}`,
        ':sk': 'EVENT#'
      },
      ScanIndexForward: false,  // newest first
      Limit: limit
    }).promise();

    return result.Items;
  }
};

50M rows in PostgreSQL = 4 seconds per user feed. 50M items in DynamoDB = 8ms per user feed, auto-scales to 1M reads/sec.

The Integration Pattern: How It All Fits Together πŸ”§

The tricky part of polyglot persistence is keeping multiple databases in sync. Here's how I handle it:

// When a product is created, write to multiple stores
async function createProduct(productData) {
  // 1. Source of truth: PostgreSQL (transactional)
  const product = await db.transaction(async (trx) => {
    const [created] = await trx('products').insert(productData).returning('*');
    return created;
  });

  // 2. Search index: Elasticsearch (eventually consistent, async)
  await esClient.index({
    index: 'products',
    id: product.id,
    body: {
      name: product.name,
      description: product.description,
      price: product.price,
      brand: product.brand,
      tags: product.tags
    }
  });

  // 3. Cache invalidation: Redis
  await redis.del(`product:${product.id}`);
  await redis.del('products:featured');  // Invalidate listing caches

  return product;
}

// Cache-aside pattern for product reads
async function getProduct(productId) {
  // Check cache first
  const cached = await redis.get(`product:${productId}`);
  if (cached) return JSON.parse(cached);

  // Cache miss: hit PostgreSQL
  const product = await db('products').where({ id: productId }).first();
  if (!product) return null;

  // Populate cache (5 min TTL)
  await redis.setex(`product:${productId}`, 300, JSON.stringify(product));
  return product;
}

The pattern: PostgreSQL is the source of truth. Everything else is a derived view optimized for specific access patterns.

Common Polyglot Mistakes I Made πŸͺ€

Mistake #1: Out-of-Sync Data

// BAD: No handling for partial failure
async function updateProduct(id, data) {
  await db('products').where({ id }).update(data);     // βœ… Succeeds
  await esClient.update({ index: 'products', id, ... }); // πŸ’₯ Elasticsearch is down!
  // Now your database has new name but search has old name. Users are confused!
}

// GOOD: Use the outbox pattern or event-driven sync
async function updateProduct(id, data) {
  await db.transaction(async (trx) => {
    await trx('products').where({ id }).update(data);

    // Write sync event to outbox (same transaction!)
    await trx('outbox').insert({
      aggregate_type: 'product',
      aggregate_id: id,
      event_type: 'product.updated',
      payload: JSON.stringify(data)
    });
  });

  // Background worker reads outbox and syncs Elasticsearch
  // Even if ES is down, it'll sync when it recovers
}

Mistake #2: Treating Every Database as Primary

// BAD: "Let's just query Elasticsearch for the checkout!"
const product = await esClient.get({ index: 'products', id: productId });
const price = product._source.price;  // This price might be 5 minutes stale!
// Customer gets charged wrong amount. Refunds are painful. 😭

// GOOD: PostgreSQL for anything financial or authoritative
const product = await db('products').where({ id: productId }).first();
const price = product.price;  // This is the real price, always.

The rule: Search/cache are for FINDING and DISPLAYING. PostgreSQL is for TRANSACTING.

Mistake #3: Not Planning for Consistency Windows

// User creates a product, immediately searches for it β€” it's NOT there yet!
const newProduct = await createProduct(productData);

// Elasticsearch indexing is async - can take 1-2 seconds
const searchResults = await searchProducts(productData.name);
// searchResults will NOT include newProduct yet!

// Solution: After creating, tell the user it'll appear in search "shortly"
// Or: Do a direct DB lookup for the newly-created item instead of search
return {
  product: newProduct,
  message: "Product created! It'll appear in search results within 30 seconds."
};

When Does This Make Sense? πŸ“

Don't add polyglot complexity to small apps:

Startup / MVP:
- 1 database (PostgreSQL or MySQL)
- No Redis, no Elasticsearch
- Simple is maintainable
- You can always add later!

Growing app (>100K users, noticeable latency):
- Add Redis for sessions and caching
- Still one relational DB

Scaling app (>1M users, search is slow, files are bloating):
- Add Elasticsearch for search
- Add S3 for file storage
- Redis already in place

High-traffic platform:
- Full polyglot: PostgreSQL + Redis + Elasticsearch + S3 + DynamoDB
- Now you need solid data sync strategy!

When I refactored our e-commerce backend from single PostgreSQL to polyglot:

  • Sessions: 120ms β†’ 2ms
  • Product search: 800ms β†’ 18ms
  • Database backup size: 140GB β†’ 2GB
  • Database CPU at peak: 87% β†’ 31%
  • My 3am alerts: Went from weekly to... rare

The TL;DR πŸ’‘

Polyglot persistence isn't about using every database in existence β€” it's about matching the tool to the job:

Data Type Best Tool Why
Transactions, relationships PostgreSQL / MySQL ACID, joins, constraints
Sessions, cache, counters Redis Sub-millisecond, TTL, atomic ops
Full-text, fuzzy search Elasticsearch Ranking, facets, relevance
Files, images, documents S3 Cheap, infinite, CDN-ready
High-throughput key-value DynamoDB Auto-scaling, no ops

Start simple. One database is fine until it's not. Add specialized stores only when you have a REAL performance problem to solve, not because it sounds cool in architecture diagrams.

And for the love of all that is holy β€” don't store 2GB images in PostgreSQL BYTEA columns. I've been there. It haunts me.


Want to talk architecture decisions? Find me on LinkedIn β€” always up for debating PostgreSQL vs "just use Redis for everything".

See it in action? My GitHub has examples of multi-database setups from real projects.

Use the right tool. Your DBA (and your 3am alerts) will thank you. πŸ—„οΈπŸ”€


P.S. If you're still storing sessions in your primary relational database, please stop reading this and go add Redis right now. I'll wait. πŸ™

P.P.S. "But PostgreSQL can do full-text search!" Yes, and a Swiss Army knife has scissors. That doesn't mean you should cut fabric with it.