Graceful Shutdown in Node.js: Don't Just Kill Your Server 🎯

Imagine you're a waiter mid-service — you've got three tables waiting for food, two bills to process, and a coffee machine actively brewing. Then your manager walks up and just... cuts the power. Chaos. Cold pasta. Angry customers. That's exactly what happens when you SIGKILL your Node.js server without a graceful shutdown strategy.

Most tutorials teach you how to start a server. Almost none teach you how to stop one properly. But in production — especially in containers, Kubernetes, or CI/CD pipelines — how your server dies matters just as much as how it lives.

What Even Is Graceful Shutdown?

When a process receives a shutdown signal (like SIGTERM from kill or Kubernetes scaling down a pod), it has two choices:

  1. Die immediately — drop all in-flight requests, close DB connections abruptly, corrupt any queued work. Fast, but brutal.
  2. Die gracefully — stop accepting new requests, finish the ones already in progress, clean up resources, then exit. Slower, but professional.

Option 2 is what separates production-ready code from "works on my machine" code.

The Signal Lifecycle

Node.js (and every Unix process) can listen for OS signals. The important ones:

  • SIGTERM — "Please shut down." This is the polite one. Kubernetes sends this before killing a pod. Docker sends this on docker stop. This is your graceful shutdown trigger.
  • SIGINT — Ctrl+C in your terminal. Same idea.
  • SIGKILL — "Die NOW." Unblockable. Can't listen for it. This is the nuclear option.

Your goal: handle SIGTERM so gracefully that SIGKILL is never needed.
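
You can try this yourself. A quick sketch (the exact error text may vary across Node versions and platforms):

// The polite signals can be handled:
process.on('SIGTERM', () => console.log('Got SIGTERM, time to clean up'));

// SIGKILL is different: Node refuses to attach a listener at all
try {
  process.on('SIGKILL', () => {});
} catch (err) {
  console.error('Cannot handle SIGKILL:', err.message);
}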

The Basic Pattern

Here's the core structure of a graceful Express shutdown:

import express from 'express';
import { createServer } from 'http';

const app = express();
const server = createServer(app);

app.get('/health', (req, res) => res.json({ status: 'ok' }));

app.get('/slow', async (req, res) => {
  // Simulates a slow DB query or external API call
  await new Promise(resolve => setTimeout(resolve, 3000));
  res.json({ data: 'finally done' });
});

server.listen(3000, () => console.log('Server running on :3000'));

// --- Graceful Shutdown Logic ---
let isShuttingDown = false;

function shutdown(signal) {
  if (isShuttingDown) return; // ignore repeated signals
  console.log(`Received ${signal}. Starting graceful shutdown...`);
  isShuttingDown = true;

  // Stop accepting new connections
  server.close(() => {
    console.log('All connections closed. Exiting.');
    process.exit(0);
  });

  // Forceful exit after timeout (safety net)
  setTimeout(() => {
    console.error('Shutdown timeout. Forcing exit.');
    process.exit(1);
  }, 10_000); // 10 seconds max
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));

server.close() stops accepting new connections but waits for existing ones to finish. That's the magic. The timeout is your safety net — if something is truly stuck, you don't want the process hanging forever.
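
One refinement worth knowing: a pending setTimeout keeps the event loop alive. Both code paths above call process.exit(), so it's harmless here, but if you ever restructure the exit logic, unref() the timer so the safety net can't itself keep the process running. You'd swap the timeout inside shutdown() for something like:

const killTimer = setTimeout(() => {
  console.error('Shutdown timeout. Forcing exit.');
  process.exit(1);
}, 10_000);
killTimer.unref(); // don't let the safety net hold the process open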

Rejecting New Requests During Shutdown

There's a subtle problem: server.close() stops new TCP connections, but HTTP keep-alive connections are reused. A client with an open keep-alive connection can still send new requests during your "shutdown window."

Fix this with middleware that rejects incoming requests once shutdown starts:

// Middleware: reject requests during shutdown
// (register this before your route handlers so it runs first)
app.use((req, res, next) => {
  if (isShuttingDown) {
    res.setHeader('Connection', 'close');
    return res.status(503).json({
      error: 'Server is shutting down. Please retry.'
    });
  }
  next();
});

// Better server.close() with keep-alive drain
function shutdown(signal) {
  if (isShuttingDown) return; // ignore repeated signals
  console.log(`Received ${signal}. Graceful shutdown initiated.`);
  isShuttingDown = true;

  server.close(async () => {
    try {
      // Clean up resources: DB pools, message queues, caches
      // (db and redisClient stand in for your own client instances)
      await db.pool.end();
      await redisClient.quit();
      console.log('Resources released. Goodbye!');
      process.exit(0);
    } catch (err) {
      console.error('Error during cleanup:', err);
      process.exit(1);
    }
  });

  setTimeout(() => {
    console.error('Forced shutdown after timeout.');
    process.exit(1);
  }, 10_000);
}

Now during shutdown: new requests get a 503 Service Unavailable, in-flight requests complete normally, and database connections close cleanly after the last query finishes.
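
If you're on Node 18.2 or newer, http.Server also exposes closeIdleConnections() and closeAllConnections(), which pair nicely with the middleware above. A sketch of how they slot into shutdown():

server.close(() => { /* cleanup as above */ });
server.closeIdleConnections(); // tear down idle keep-alive sockets right away

setTimeout(() => {
  server.closeAllConnections(); // last resort: drops in-flight sockets too
  process.exit(1);
}, 10_000);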

The Kubernetes Reality Check

In Kubernetes, the shutdown sequence looks like this:

  1. Pod receives SIGTERM
  2. Pod is removed from the Service's endpoint list (takes a few seconds to propagate)
  3. After terminationGracePeriodSeconds (default: 30s), SIGKILL is sent

There's a race condition between steps 1 and 2. Your pod might get SIGTERM while the load balancer still routes traffic to it for a few more seconds. The fix? Add a brief delay before fully stopping:

function shutdown(signal) {
  console.log(`${signal} received. Waiting for load balancer to drain...`);
  isShuttingDown = true;

  // Give the LB 5 seconds to stop routing new traffic here
  setTimeout(() => {
    server.close(async () => {
      await cleanup(); // the DB/Redis teardown from the previous section
      process.exit(0);
    });

    // Hard timeout: 5s delay + 25s here stays inside the default 30s grace period
    setTimeout(() => process.exit(1), 25_000);
  }, 5_000);
}

That 5-second delay is counter-intuitive but battle-tested. It gives the upstream load balancer time to update its routing table before you stop accepting connections.
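
A complementary trick, assuming your Deployment defines a readinessProbe pointed at /health: fail that route once shutdown starts, so Kubernetes marks the pod unready and pulls it from the endpoint list sooner:

// Rework the /health route from earlier to fail during shutdown
app.get('/health', (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'shutting down' });
  }
  res.json({ status: 'ok' });
});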

Common Mistakes to Avoid

Don't exit in uncaught exception handlers without cleanup:

// BAD — abrupt exit, no cleanup
process.on('uncaughtException', (err) => {
  console.error(err);
  process.exit(1); // DB connections left dangling
});

// BETTER — log, then attempt a graceful shutdown (best effort, since app state may be corrupt)
process.on('uncaughtException', (err) => {
  console.error('Uncaught exception:', err);
  shutdown('uncaughtException');
});

Don't forget unhandled promise rejections — in Node.js 15+, they crash the process. Handle them explicitly and trigger your shutdown flow.
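
Something like this routes them through the same flow (reusing the shutdown() function from above):

process.on('unhandledRejection', (reason) => {
  console.error('Unhandled rejection:', reason);
  shutdown('unhandledRejection');
});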

Don't set your timeout too low. If a DB query legitimately takes 8 seconds and your timeout is 5 seconds, you'll force-exit mid-query. Match the timeout to your slowest expected operation plus a buffer.
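
One way to keep that tunable is to read the grace period from the environment (the variable name here is just an illustration):

// Grace period per environment, defaulting to 10 seconds
const SHUTDOWN_TIMEOUT_MS = Number(process.env.SHUTDOWN_TIMEOUT_MS ?? 10_000);

setTimeout(() => {
  console.error(`Forced exit after ${SHUTDOWN_TIMEOUT_MS}ms.`);
  process.exit(1);
}, SHUTDOWN_TIMEOUT_MS);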

Why This Matters More Than You Think

Graceful shutdown isn't a "nice to have" — it's what makes zero-downtime deployments possible. Without it, every deploy you roll out will drop some percentage of in-flight requests. That might be 0.1% of traffic, which sounds fine until you realize that's thousands of failed requests per deployment in a high-traffic app.

In serverless or container environments, processes start and stop constantly. Your shutdown story is as important as your startup story.

The Takeaway

Three things your production Node.js app needs for graceful shutdown:

  1. Listen for SIGTERM and SIGINT — these are your graceful shutdown triggers.
  2. Use server.close() + middleware to stop accepting new work while finishing existing work.
  3. Always have a hard timeout — 10-30 seconds — so a stuck request doesn't prevent the process from ever exiting.

Your server worked hard serving all those requests. The least you can do is give it a proper send-off.


What's the messiest production incident you've had from a bad shutdown story? Drop it in the comments — bonus points if it happened during a Friday deploy. 👇

Thanks for reading!
