Node.js Streams: Stop Loading Everything Into Memory
Imagine you ordered a pizza. Now imagine your pizza delivery person couldn't hand it to you until they baked every pizza in the entire restaurant first. That's what most Node.js code does with data: load it all into memory before touching a single byte.
Streams say: why wait? Just start eating the first slice while the rest is still cooking.
What Even Is a Stream?
A Node.js stream is an abstraction for working with data piece by piece instead of all at once. There are four types:
- Readable: data flows out (like reading a file)
- Writable: data flows in (like writing to a file)
- Duplex: both directions (like a TCP socket)
- Transform: reads, transforms, and writes (like gzip compression)
The killer feature? You process data as it arrives instead of waiting for the whole thing. This means lower memory usage, faster time-to-first-byte, and servers that don't keel over when someone uploads a 2GB CSV.
The Problem With "Just readFileSync It"
Here's the classic mistake:
const express = require('express');
const fs = require('fs');
const app = express();
// The memory-hog approach
app.get('/download-report', (req, res) => {
  const fileContents = fs.readFileSync('./big-report.csv'); // loads ENTIRE file into RAM
  res.send(fileContents);
});
For a 500MB CSV, this allocates 500MB of RAM per request. Five concurrent users? That's 2.5GB gone. Your server starts sweating, the GC starts crying, and eventually Node.js throws a JavaScript heap out of memory error at 3 AM while you're asleep.
The Stream Fix: Data Flows Like Water
const express = require('express');
const fs = require('fs');
const app = express();
// The stream approach
app.get('/download-report', (req, res) => {
  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Disposition', 'attachment; filename="report.csv"');
  const fileStream = fs.createReadStream('./big-report.csv');
  fileStream.pipe(res); // chunks flow directly to the HTTP response
  fileStream.on('error', (err) => {
    console.error('Stream error:', err);
    if (!res.headersSent) {
      res.status(500).end('Something went wrong');
    } else {
      res.destroy(); // headers already sent — all we can do is drop the connection
    }
  });
});
app.listen(3000);
Same result for the user. Fraction of the memory. The file chunks flow straight from disk → stream buffer → HTTP response without ever fully materializing in RAM. Beautiful.
Pipeline: The Grown-Up Way to Chain Streams
Piping streams manually is fine until one stream errors and the others keep running, leaking memory like a broken faucet. Enter stream.pipeline:
const { pipeline } = require('stream/promises');
const fs = require('fs');
const zlib = require('zlib');
const express = require('express');
const app = express();
// Stream a file AND gzip it on the fly
app.get('/download-compressed', async (req, res) => {
  res.setHeader('Content-Type', 'application/gzip');
  res.setHeader('Content-Disposition', 'attachment; filename="report.csv.gz"');
  try {
    await pipeline(
      fs.createReadStream('./big-report.csv'), // read from disk
      zlib.createGzip(),                       // compress on the fly
      res                                      // send to client
    );
  } catch (err) {
    // pipeline automatically cleans up all streams on error
    if (!res.headersSent) {
      res.status(500).end('Download failed');
    }
  }
});
This compresses a massive CSV while streaming it: no temp files, no full copy of the file in memory. The data flows through each stage like an assembly line. The client starts receiving compressed bytes almost immediately, not after the whole file has been compressed.
stream.pipeline also automatically destroys all streams in the chain if any one of them errors. No leaks. No zombie streams haunting your event loop.
Real-World Use Cases
Uploading files to S3 without touching disk: Multer can stream directly into an S3 upload stream. The file goes: user's browser → your Express server's stream buffer → S3. Your server is just a pipe, barely touching the data.
Processing large database exports:
ORM returning 100k rows? Most ORMs support query streaming. Instead of .findAll() (which buffers all rows), use a cursor-based stream and process row-by-row.
Log parsing:
Got a 1GB log file to analyze? readline + a createReadStream lets you process it line-by-line in ~20MB of RAM instead of 1GB.
Proxying API responses: When your backend calls an upstream API and returns the result, don't buffer it β pipe the upstream response directly to your client response.
The Mental Model That Makes It Click
Think of streams like a garden hose, not a bucket.
- Bucket approach: Fill the whole bucket (RAM), then carry it to the destination.
- Hose approach: Water flows continuously from source to destination. The hose just guides it.
Your server should be a hose, not a bucket. The moment you find yourself doing const everything = await readEverything() before processing, ask: could this be a stream?
When NOT to Use Streams
Streams aren't always the answer. If the data is small (< 1MB), the added complexity isn't worth it. If you genuinely need the full dataset in memory to process it (like sorting), streaming doesn't help. And if you're building a simple CRUD API that just returns a JSON object, res.json() is perfectly fine.
Use streams when:
- Files are large or unbounded in size
- You can process data incrementally
- You're proxying or transforming data between two I/O sources
- Memory efficiency is critical
Wrap Up
Node.js streams are one of those features that feel intimidating until the day they save your server from melting under a large file request. The mental shift is simple: favor flowing data over buffered data wherever the operation allows it.
Start small: swap one readFileSync for createReadStream and pipe it. Feel the difference. Then you'll start seeing streams everywhere.
Got a gnarly stream use case or a question? Drop it in the comments, or just pipe it my way.