0x55aa
Back to Blog

Connection Pool Tuning: Stop Letting Your Database Breathe Through a Straw 🏊

|
6 min read

Picture your database as an Olympic swimming pool with 200 lanes. You've got a line of users out the door, all wanting to swim. Now imagine your application is the one handing out lane assignments β€” and it only ever hands out three at a time, making everyone else stand in line getting progressively more furious.

That's what an under-tuned connection pool looks like. And I've seen it take down perfectly healthy services at Cubet more times than I'd like to admit.

What Is a Connection Pool, Really?

Creating a database connection is expensive. We're talking TCP handshakes, TLS negotiation, authentication, and session setup β€” easily 50–200ms per cold connection. If every query created a fresh connection and tore it down afterward, your "fast" API endpoint would spend more time saying hello to Postgres than actually fetching data.

A connection pool pre-creates a bunch of connections and keeps them warm. Queries borrow a connection, do their thing, and return it to the pool. Fast, efficient, sensible.

The problem? Most apps ship with default pool sizes and never revisit them. And defaults are… not tuned for your production traffic at 3am when a marketing campaign just dropped.

The Three Numbers That Matter

Every pool implementation gives you some variant of these knobs:

  • min / min_size β€” connections to keep alive even at idle
  • max / pool_size β€” hard ceiling on simultaneous connections
  • acquireTimeout / connectionTimeout β€” how long a query waits for a free connection before throwing an error

Here's a typical default you'll find in a fresh pg or knex setup:

// knex config β€” what most tutorials show you
const db = knex({
  client: 'postgresql',
  connection: process.env.DATABASE_URL,
  pool: {
    min: 2,
    max: 10,  // πŸ‘ˆ 10 connections total. For everything. Forever.
  }
});

Ten connections. For your entire Node.js process. When Black Friday traffic hits and you have 200 concurrent requests all waiting on that pool, 190 of them are standing in line while your acquireTimeout clock ticks down. Then: TimeoutError: Knex: Timeout acquiring a connection. The pool is probably full.

Congrats, you have a traffic jam at the database door.

Finding Your Real Ceiling

Before cranking max to 1000 and calling it a day β€” your database has a limit too. Postgres, for instance, defaults to max_connections = 100. If your app opens 200 connections, Postgres laughs and starts rejecting them.

The formula I use is roughly:

max_pool_per_process = floor((db_max_connections - reserved_connections) / num_app_instances)

So if Postgres allows 100 connections, you reserve ~10 for admin/monitoring, and you're running 4 Node.js pods:

max = floor((100 - 10) / 4) = 22 connections per pod

That's your ceiling. But don't just set it and forget it β€” the right number is usually lower, because gorging on connections doesn't mean you're using them efficiently.

What "Good" Looks Like in Practice

Here's a config I've used for a moderately loaded API (a few hundred req/s at peak) backed by Postgres:

const db = knex({
  client: 'postgresql',
  connection: process.env.DATABASE_URL,
  pool: {
    min: 5,           // keep 5 warm at all times
    max: 25,          // don't exceed this
    acquireTimeoutMillis: 3000,   // fail fast β€” don't let requests queue forever
    createTimeoutMillis: 3000,
    idleTimeoutMillis: 30000,     // reclaim idle connections after 30s
    reapIntervalMillis: 1000,
    propagateCreateError: false,  // don't crash the pool on one bad connect
  }
});

Key insight: acquireTimeoutMillis: 3000 is a feature, not a bug. If your pool is full, you want requests to fail fast with a meaningful error rather than stack up indefinitely, exhaust memory, and drag your entire service down with them.

A slow timeout is just a delayed crash with more collateral damage.

The PgBouncer Wildcard

If you're running many app instances (containers, lambdas, anything that scales horizontally) and they're all holding database connections, you'll blow past Postgres's max_connections ceiling before you know it.

Enter PgBouncer β€” a lightweight connection pooler that sits between your app and Postgres. Your app thinks it's talking to Postgres directly, but PgBouncer is actually multiplexing hundreds of application connections down to a small number of real Postgres connections.

At Cubet, we dropped our Postgres connection count from ~350 (with 15 containers Γ— 25 pool size = way too many) to under 40 by routing through PgBouncer in transaction pooling mode. Response time at p99 went from "I need a coffee" to "barely noticeable."

The tradeoff: transaction mode doesn't support prepared statements or SET commands that persist across queries β€” check your ORM's docs before enabling it blindly.

Monitoring: The Part Everyone Skips

Tuning blind is guesswork. At minimum, track:

  • Pool utilization β€” what percentage of max connections are in use right now
  • Queue depth β€” how many requests are waiting for a connection
  • Acquire time β€” average time a query spends waiting for a connection (should be < 10ms in steady state)

In Node.js with knex, you can hook into pool events:

db.client.pool.on('acquireRequest', () => metrics.increment('db.pool.acquire_request'));
db.client.pool.on('acquireSuccess', (eventId, resource) => {
  metrics.gauge('db.pool.size', db.client.pool.numUsed());
});
db.client.pool.on('acquireFail', () => {
  metrics.increment('db.pool.acquire_fail');
  logger.warn('Connection pool exhausted');
});

Wire these into Datadog, Prometheus, or whatever you use. When acquire_fail starts spiking, you'll know before your users do.

Quick Checklist Before Your Next Load Test

  • Check your DB's max_connections and budget accordingly
  • Set acquireTimeout to something sane (2–5s, not 30s)
  • Use idleTimeout to reclaim unused connections
  • If scaling horizontally, put PgBouncer or RDS Proxy in front
  • Add pool metrics to your observability stack
  • Load test with realistic concurrency β€” not just ab -n 100 -c 1

TL;DR

Connection pools are the unsung performance lever most backend engineers never touch after the initial setup. The defaults are conservative at best, dangerously mistuned at worst. Know your database's connection ceiling, budget across your instances, fail fast on timeouts, and actually observe what your pool is doing.

Your database has plenty of capacity. Stop making it breathe through a straw.


What's your go-to pool size for Postgres-backed Node apps? Curious how it varies across team setups β€” drop a comment or find me on X/Twitter.

Thanks for reading!

Back to all posts