🛑 Node.js Graceful Shutdown: Don't Just Kill It
Picture this: your CI pipeline just pushed a new version of your app. Kubernetes sends a SIGTERM to the old pod. Your app — like a toddler who doesn't understand "we're leaving now" — immediately drops everything and shuts down. Three users get 503 errors. One of them was mid-checkout. You get a Slack message at 2am.
This is the ungraceful shutdown experience. And it's shockingly common.
Today we fix that.
What Even Is a Graceful Shutdown?
A graceful shutdown is the difference between a surgeon finishing the stitch before leaving the OR, and just... walking out mid-operation because their shift technically ended.
When your server receives a shutdown signal (SIGTERM, SIGINT), the graceful approach is:
- Stop accepting new connections — no new patients in the waiting room
- Finish in-flight requests — complete what's already started
- Close external connections — database, cache, message queues
- Exit cleanly with code 0, like a professional
Sounds obvious. Most apps don't do it.
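To make "finish in-flight requests" a bit more concrete, here is a minimal sketch of an Express-style middleware that counts requests that have started but not yet finished. The name createInFlightTracker is made up for illustration, and it assumes the response object emits the standard 'finish' and 'close' events, as Node's http.ServerResponse does.

```javascript
// Hypothetical helper: counts requests that have started but not yet ended.
function createInFlightTracker() {
  let inFlight = 0;

  function middleware(req, res, next) {
    inFlight += 1;
    let settled = false;
    // 'finish' fires when the response has been handed off for sending;
    // 'close' also fires if the client disconnects early.
    // Guard so each request is only subtracted once.
    const settle = () => {
      if (!settled) {
        settled = true;
        inFlight -= 1;
      }
    };
    res.once('finish', settle);
    res.once('close', settle);
    next();
  }

  return { middleware, count: () => inFlight };
}
```

Wired in with app.use(tracker.middleware), tracker.count() gives you a number worth logging when a shutdown signal arrives, so you can see how much work was still pending.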
The Naive App (aka The Problem)
Here's what most Express apps look like when they die:
const express = require('express');
const app = express();

app.get('/checkout', async (req, res) => {
  await processPayment(req.body); // 🔥 This might get cut off
  await sendConfirmationEmail(); // 🔥 So might this
  res.json({ success: true });
});

app.listen(3000, () => console.log('Server running on port 3000'));
// No shutdown handling. Just vibes.
When SIGTERM hits, Node exits. If processPayment was halfway done, it's just... abandoned. The charge might go through. The email might not. The customer is confused. Your refund queue grows.
The Graceful Version
Here's how to actually handle this:
const express = require('express');
const app = express();

// Assumption: `db` (e.g. a Postgres pool) and `redisClient` are created elsewhere
// in your app, as are processPayment() and sendConfirmationEmail().

// ---- The important bit ----
let isShuttingDown = false;

// Reject new requests during shutdown.
// Registered before the routes: middleware added after a route never runs for it.
app.use((req, res, next) => {
  if (isShuttingDown) {
    res.setHeader('Connection', 'close');
    return res.status(503).json({ error: 'Server is shutting down' });
  }
  next();
});

app.get('/checkout', async (req, res) => {
  await processPayment(req.body);
  await sendConfirmationEmail();
  res.json({ success: true });
});

const server = app.listen(3000, () => {
  console.log('Server running on port 3000');
});

function gracefulShutdown(signal) {
  if (isShuttingDown) return; // a repeated signal should not restart the sequence
  console.log(`\nReceived ${signal}. Starting graceful shutdown...`);
  isShuttingDown = true;

  // Stop accepting new connections; the callback runs once existing ones have ended
  server.close(async () => {
    console.log('HTTP server closed. Cleaning up...');
    try {
      await db.end(); // close DB pool
      await redisClient.quit(); // close Redis
      console.log('All connections closed. Exiting.');
      process.exit(0);
    } catch (err) {
      console.error('Error during cleanup:', err);
      process.exit(1);
    }
  });

  // Force-exit if cleanup takes too long (safety net).
  // unref() keeps this timer from holding the process open on its own.
  setTimeout(() => {
    console.error('Graceful shutdown timed out. Force exiting.');
    process.exit(1);
  }, 10_000).unref(); // 10 seconds
}

process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
The key moves here:
- isShuttingDown flag: a bouncer at the door who stops letting new people in
- server.close(): waits for active connections to finish before calling the callback
- Timeout fallback: if something is truly stuck, we still exit rather than hanging forever
- Cleanup order matters: close your app server before your database, not the other way around
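One way to keep the cleanup order explicit is to run the closers one at a time from a list. The sketch below is an illustration, not a library API: runCleanupInOrder and the { name, close } shape are made-up names. It also keeps going when one step fails, so a broken database close does not stop the Redis close from being attempted.

```javascript
// Hypothetical helper: runs async cleanup steps one at a time, in the order given.
// A failing step is recorded, and the remaining steps still run.
async function runCleanupInOrder(steps) {
  const failures = [];
  for (const { name, close } of steps) {
    try {
      await close();
    } catch (err) {
      failures.push({ name, error: err });
    }
  }
  return failures;
}

// Possible usage inside the server.close() callback, assuming `db` and
// `redisClient` exist in your app:
//
//   const failures = await runCleanupInOrder([
//     { name: 'postgres', close: () => db.end() },
//     { name: 'redis', close: () => redisClient.quit() },
//   ]);
//   process.exit(failures.length === 0 ? 0 : 1);
```

Returning the failures instead of throwing lets the caller decide on the exit code after every step has had its turn.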
The Kubernetes Angle
In Kubernetes, the lifecycle looks like this:
- Pod gets SIGTERM
- Kubernetes waits terminationGracePeriodSeconds (default: 30s)
- If the pod is still alive after 30s, it gets SIGKILL (the nuclear option)
This means your app has up to 30 seconds to finish in-flight work. Use it wisely. Your timeout in the shutdown handler should be slightly less than terminationGracePeriodSeconds so Node exits cleanly before Kubernetes kills it forcefully.
# In your Kubernetes Deployment (the field lives on the pod template's spec)
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30
And in your Node.js app:
setTimeout(() => {
  process.exit(1);
}, 25_000); // 5 seconds less than K8s grace period
That 5-second buffer is your insurance policy.
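If you would rather not hard-code 25_000, the delay can be derived from the grace period. This is a rough sketch: forceExitDelayMs is a made-up helper, and GRACE_PERIOD_SECONDS is an assumed environment variable that you would have to set yourself in the Deployment, since Kubernetes does not inject it for you.

```javascript
// Hypothetical helper: force-exit delay = grace period minus a safety buffer.
function forceExitDelayMs(gracePeriodSeconds = 30, bufferSeconds = 5) {
  const delayMs = (gracePeriodSeconds - bufferSeconds) * 1000;
  // Keep at least one second, even if the buffer is larger than the grace period.
  return Math.max(delayMs, 1000);
}

// GRACE_PERIOD_SECONDS is an assumption; fall back to the Kubernetes default of 30.
const graceSeconds = Number(process.env.GRACE_PERIOD_SECONDS) || 30;
const delayMs = forceExitDelayMs(graceSeconds);
// Pass delayMs to the setTimeout() safety net in your shutdown handler.
```

Keeping the two numbers tied together means a change to terminationGracePeriodSeconds only needs one matching change on the Node side.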
Common Pitfalls
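Forgetting keep-alive connections: server.close() stops accepting new connections, then waits for existing ones to end on their own. Before Node 19 that includes idle keep-alive connections, and on any version long-polling clients or SSE streams can hold the process open. You may need to track open sockets and destroy them manually, call server.closeIdleConnections() or server.closeAllConnections() (available since Node 18.2), or use a library like http-terminator which handles this for you.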
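Tracking sockets by hand could look roughly like the sketch below. trackSockets is a made-up name. The only things it relies on are that the server emits 'connection' with each new socket and that a socket emits 'close' when it is gone, which is how Node's http.Server and net.Socket behave.

```javascript
// Hypothetical helper: remembers every open socket so they can be destroyed later.
function trackSockets(server) {
  const sockets = new Set();

  server.on('connection', (socket) => {
    sockets.add(socket);
    // Forget the socket once it has closed, however that happened.
    socket.once('close', () => sockets.delete(socket));
  });

  return {
    count: () => sockets.size,
    // Copy the set first, because destroy() triggers 'close', which mutates it.
    destroyAll() {
      for (const socket of [...sockets]) {
        socket.destroy();
      }
    },
  };
}
```

A reasonable place to call destroyAll() is shortly before the force-exit timeout fires, so well-behaved requests still get the full window to finish. On Node 18.2 and later, server.closeAllConnections() covers the same ground without the bookkeeping.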
Not closing your DB pool — If you skip db.end(), the process might hang or leave dangling connections on your database server. Postgres especially gets cranky about this over time.
Logging after process.exit() — process.exit() is synchronous and immediate. Any async logging (like shipping to Datadog) after that line will be silently dropped. Flush your logs before calling exit.
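A small wrapper can make "flush, then exit" hard to get wrong. This is a sketch under assumptions: flushThenExit is a made-up name, and flushLogs stands in for whatever flush method your logging library actually exposes (check its docs; none is assumed here). The timeout is there so a stuck log shipper cannot block the exit forever.

```javascript
// Hypothetical helper: wait for logs to flush (up to a limit), then exit.
async function flushThenExit(code, flushLogs, options = {}) {
  const { timeoutMs = 2000, exit = (c) => process.exit(c) } = options;

  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(resolve, timeoutMs);
  });

  try {
    // Whichever settles first wins: the flush or the timeout.
    await Promise.race([flushLogs(), timeout]);
  } catch (err) {
    console.error('Log flush failed:', err);
  } finally {
    clearTimeout(timer);
  }

  exit(code);
}

// Possible usage, where logger.flush() is a placeholder for your logger's API:
//   await flushThenExit(0, () => logger.flush());
```

The exit option exists only so the function can be exercised in tests without killing the test process.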
Why This Actually Matters
Graceful shutdown isn't just about being polite to your users (though it is that). It's about:
- Zero-downtime deploys — rolling updates only work if the old pod finishes its work before dying
- Data integrity — half-written database transactions are a nightmare to debug
- Cost — fewer failed transactions means fewer support tickets, refunds, and incidents
- Sleep quality — yours, specifically, at 2am
The code to do this right is maybe 30 lines. The cost of not doing it is... higher.
Wrapping Up
Your app receives SIGTERM dozens of times a week in a normal production environment. Every deploy, every scale-down event, every node rotation sends one. Most apps silently drop requests every single time and nobody notices until something important breaks.
Add graceful shutdown. It takes 20 minutes, and it's the kind of boring infrastructure work that makes you a hero when it matters.
Now go update your server.js — your future on-call self will thank you.
Is your production app handling shutdowns correctly? Drop your setup in the comments or ping me on GitHub — always happy to compare war stories.