Node.js Streams: Stop Loading Everything Into Memory
Real confession: The first file upload feature I built at Acodez brought down our entire Node.js server. The file? A 500MB CSV. My approach? fs.readFile() to load the ENTIRE file into memory. Result? "JavaScript heap out of memory" error. Production down. Boss not impressed!
When I was building Node.js APIs, I thought "just read the file and process it" was fine. Coming from Laravel, where PHP-FPM's process-per-request model lets you buffer things without much thought, Node.js taught me a painful lesson: one massive memory spike can kill your ENTIRE single-threaded server!
Let me save you from the 2 AM debugging sessions I had!
What Even Are Streams?
Streams = Processing data in chunks instead of loading everything at once.
Think of it like drinking water:
- Without streams (readFile): Dump the entire gallon into your mouth at once → you choke!
- With streams: Sip through a straw → comfortable, manageable, civilized!
The magic: Process 10GB files with only ~64KB in memory at a time. Your server stays alive!
The reality: Streams have a learning curve. But once you get it? Never going back!
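Want proof? Here's a minimal sketch you can run yourself ('huge.log' is just a placeholder — point it at any large file you have). It reads the whole file but never holds more than one chunk in memory:

```js
const fs = require('fs');

let bytes = 0;

fs.createReadStream('huge.log') // placeholder path - use any big file
  .on('data', (chunk) => {
    bytes += chunk.length; // only this ~64KB chunk is in memory
  })
  .on('end', () => {
    const heapMB = process.memoryUsage().heapUsed / 1024 / 1024;
    console.log(`Read ${bytes} bytes, heap used: ${heapMB.toFixed(1)}MB`);
  })
  .on('error', console.error);
```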
The Memory Disaster (How I Learned)
My "brilliant" CSV processor at Acodez:
```js
// DON'T DO THIS!
const fs = require('fs/promises');

app.post('/api/upload', async (req, res) => {
  const file = req.file; // multer uploaded file

  // Load ENTIRE file into memory!
  const data = await fs.readFile(file.path, 'utf8');

  // Parse ENTIRE CSV into memory!
  const rows = data.split('\n').map(row => row.split(','));

  // Process ALL rows at once!
  const results = rows.map(row => processRow(row));

  res.json({ processed: results.length });
});
```
What happened in production:
```
# Small file (10MB): Works fine! ✅
# Medium file (100MB): Slow but works ⚠️
# Large file (500MB): Server memory spikes to 2GB
# Another request comes in...
# CRASH: "JavaScript heap out of memory"
# All users disconnected
# Production down
# PagerDuty alert
# Boss: "We need to talk..."
```
Why it crashed:
- File loaded: 500MB in memory
- Split into rows: Another 500MB (now 1GB)
- Map operations: Creating new arrays (1.5GB)
- V8 heap limit exceeded (about 1.4GB by default on older 64-bit Node versions)
- BOOM!
Coming from Laravel: In PHP-FPM, one request's memory doesn't affect others (separate processes). In Node.js? One giant file can crash EVERYTHING! Single threaded = shared fate!
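Curious what ceiling your own process is running with? V8 will tell you (a quick sketch using the built-in v8 module):

```js
const v8 = require('v8');

// heap_size_limit is the point where "heap out of memory" hits
const limitMB = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
console.log(`V8 heap limit: ${limitMB.toFixed(0)}MB`);

// You can raise it with: node --max-old-space-size=4096 app.js
// ...but that only delays the crash. Streams fix the root cause.
```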
The Stream Solution
The same feature, done right:
```js
const fs = require('fs');
const readline = require('readline');

app.post('/api/upload', async (req, res) => {
  const file = req.file; // multer uploaded file
  let processed = 0;

  // Create a read stream (only ~64KB in memory at a time!)
  const fileStream = fs.createReadStream(file.path);

  // Process line by line
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });

  for await (const line of rl) {
    const row = line.split(',');
    await processRow(row); // process one row at a time
    processed++;
  }

  res.json({ processed });
});
```
Memory usage:
- Before (readFile): 500MB file → 2GB memory spike
- After (streams): 500MB file → ~10MB memory (constant!)
- Difference: Server stays alive!
Real impact: Went from crashing on 500MB files to handling 10GB files without breaking a sweat!
Types of Streams in Node.js
Node.js has 4 types of streams:
1. Readable Streams (Read data FROM)
```js
const fs = require('fs');

// Reading a file
const readStream = fs.createReadStream('bigfile.txt', {
  highWaterMark: 64 * 1024 // 64KB chunks (the fs default; generic Readables default to 16KB)
});

readStream.on('data', (chunk) => {
  console.log(`Received ${chunk.length} bytes`);
  // Process the chunk here (only this chunk is in memory!)
});

readStream.on('end', () => {
  console.log('File completely read!');
});

readStream.on('error', (error) => {
  console.error('Error reading file:', error);
});
```
Use cases:
- Reading large files
- HTTP request bodies
- Database query results
- WebSocket messages
2. Writable Streams (Write data TO)
```js
const writeStream = fs.createWriteStream('output.txt');

// Write data chunk by chunk
writeStream.write('First chunk\n');
writeStream.write('Second chunk\n');
writeStream.write('Third chunk\n');

// Must call end() when done!
writeStream.end('Final chunk\n');

writeStream.on('finish', () => {
  console.log('All data written!');
});

writeStream.on('error', (error) => {
  console.error('Error writing file:', error);
});
```
Use cases:
- Writing large files
- HTTP responses
- Database inserts
- Log files
3. Duplex Streams (Both read AND write)
```js
const net = require('net');

// A TCP socket is a duplex stream
const socket = net.connect({ port: 8080 });

// You can read from it...
socket.on('data', (data) => {
  console.log('Received:', data.toString());
});

// ...AND write to it
socket.write('Hello server!');
```
Use cases:
- TCP sockets
- TLS connections
- Any channel you read from and write to (Transform streams, next, are a special kind of duplex)
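To complete the picture, here's the other end of that connection — a tiny echo server (a sketch; port 8080 matches the client above) where the same socket object is both read from and written to:

```js
const net = require('net');

// Each incoming connection hands you a duplex socket
const server = net.createServer((socket) => {
  socket.on('data', (data) => {
    socket.write(`Echo: ${data}`); // write back on the same stream we read from
  });
});

server.listen(8080);
```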
4. Transform Streams (Modify data as it flows)
```js
const fs = require('fs');
const { Transform } = require('stream');

// Custom transform: uppercase everything flowing through
const uppercaseTransform = new Transform({
  transform(chunk, encoding, callback) {
    const uppercased = chunk.toString().toUpperCase();
    this.push(uppercased);
    callback();
  }
});

// Use it in a pipeline
fs.createReadStream('input.txt')
  .pipe(uppercaseTransform)
  .pipe(fs.createWriteStream('output.txt'));
```
Use cases:
- Data compression
- Encryption/decryption
- Data validation
- Format conversion
Real-World Stream Patterns I Use
Pattern #1: File Upload & Processing
```js
const multer = require('multer');
const csv = require('csv-parser');

const upload = multer({ dest: 'uploads/' });

app.post('/api/import-users', upload.single('file'), async (req, res) => {
  const errors = [];
  let imported = 0;
  let batch = [];

  try {
    const parser = fs.createReadStream(req.file.path).pipe(csv());

    // for await pauses the parser between rows - no unbounded buffering
    for await (const row of parser) {
      // Validate and transform each row
      if (isValidUser(row)) {
        batch.push(row);
      } else {
        errors.push(`Invalid row: ${JSON.stringify(row)}`);
      }

      // Insert in batches so memory stays flat on huge files
      if (batch.length >= 1000) {
        await User.bulkCreate(batch);
        imported += batch.length;
        batch = [];
      }
    }

    // Flush the final partial batch
    if (batch.length > 0) {
      await User.bulkCreate(batch);
      imported += batch.length;
    }

    res.json({ imported, errors: errors.length });
  } catch (error) {
    res.status(500).json({ error: error.message });
  } finally {
    // Clean up the uploaded file
    fs.unlink(req.file.path, () => {});
  }
});
```
Why it works:
- Processes the CSV row by row
- Memory stays flat regardless of file size (rows are flushed every 1,000)
- Validates as it goes
- Handles errors gracefully
Pattern #2: Streaming HTTP Responses
```js
// DON'T DO THIS (loads the entire result into memory!)
app.get('/api/export', async (req, res) => {
  const users = await User.findAll(); // could be 100k users!
  const csv = convertToCSV(users);    // all in memory!
  res.send(csv);                      // could be 500MB!
});
```

```js
// DO THIS (streams the response!)
app.get('/api/export', async (req, res) => {
  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Disposition', 'attachment; filename=users.csv');

  // Stream the header row
  res.write('id,name,email\n');

  // Stream users in batches
  const batchSize = 1000;
  let offset = 0;

  while (true) {
    const users = await User.findAll({
      limit: batchSize,
      offset: offset,
      raw: true
    });

    if (users.length === 0) break;

    for (const user of users) {
      res.write(`${user.id},${user.name},${user.email}\n`);
    }

    offset += batchSize;
  }

  res.end();
});
```
Real impact: Exported 500k users from the database without killing the server! Memory stayed under 50MB!
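One caveat on that loop: it ignores the return value of res.write(), so a slow client can make the response buffer grow. A hedged tweak for the inner loop (same users batch as above), using Node's built-in events.once:

```js
const { once } = require('events');

// Replace the inner for-loop above with a drain-aware version
for (const user of users) {
  const ok = res.write(`${user.id},${user.name},${user.email}\n`);
  if (!ok) {
    await once(res, 'drain'); // wait until the client catches up
  }
}
```

More on backpressure in Pattern #4 below.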
Pattern #3: Piping Streams (The Elegant Way)
```js
const zlib = require('zlib');

// Bad way (manual event handling)
app.get('/download', (req, res) => {
  const readStream = fs.createReadStream('bigfile.txt');
  const gzip = zlib.createGzip();

  readStream.on('data', (chunk) => {
    gzip.write(chunk);
  });
  readStream.on('end', () => {
    gzip.end();
  });

  gzip.on('data', (chunk) => {
    res.write(chunk);
  });
  gzip.on('end', () => {
    res.end();
  });
});
```

```js
// Good way (pipe does it all!)
app.get('/download', (req, res) => {
  res.setHeader('Content-Encoding', 'gzip');

  fs.createReadStream('bigfile.txt')
    .pipe(zlib.createGzip())
    .pipe(res);
  // One chain! Handles backpressure automatically!
});
```
Why .pipe() is magic:
- Handles backpressure (pauses a fast producer for a slow consumer)
- Pauses/resumes automatically
- Much cleaner code
One thing .pipe() does NOT do: propagate errors between streams - which is exactly why pipeline() exists.
A pattern I use for production:
```js
const { pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

app.get('/process-and-download', async (req, res) => {
  try {
    res.setHeader('Content-Type', 'application/json');
    res.setHeader('Content-Encoding', 'gzip'); // we're gzipping the body below

    await pipelineAsync(
      fs.createReadStream('input.json'),
      transformData(),  // your custom transform
      validateData(),   // another transform
      zlib.createGzip(),
      res
    );
  } catch (error) {
    console.error('Pipeline error:', error);
    if (!res.headersSent) {
      res.status(500).json({ error: 'Processing failed' });
    }
  }
});
```
Why pipeline() is better:
- Properly cleans up streams on error
- Calls callback when done OR error
- Better error handling than .pipe()
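Side note: on Node 15+ you can skip promisify entirely - a promise-based pipeline ships in the built-in stream/promises module:

```js
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream/promises'); // built in since Node 15

// Same behavior as promisify(pipeline): resolves on success, rejects on error
async function compressFile(src, dest) {
  await pipeline(
    fs.createReadStream(src),
    zlib.createGzip(),
    fs.createWriteStream(dest)
  );
}
```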
Pattern #4: Backpressure Handling
The problem: Producer faster than consumer = memory leak!
```js
// BAD: Ignores backpressure
const writeStream = fs.createWriteStream('output.txt');

for (let i = 0; i < 1000000; i++) {
  // This writes FASTER than the disk can handle!
  writeStream.write(`Line ${i}\n`);
  // Result: the internal buffer grows to gigabytes!
}
```
GOOD: Respects backpressure:
```js
const writeStream = fs.createWriteStream('output.txt');

async function writeMillionLines() {
  for (let i = 0; i < 1000000; i++) {
    const canContinue = writeStream.write(`Line ${i}\n`);

    if (!canContinue) {
      // Buffer is full! Wait for the 'drain' event
      await new Promise(resolve => {
        writeStream.once('drain', resolve);
      });
    }
  }
  writeStream.end();
}

writeMillionLines();
```
Translation: If write() returns false, WAIT for the drain event before writing more! This prevents memory from exploding!
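An alternative sketch (assuming Node 12+ for Readable.from): wrap the producer in a generator and let pipeline() do the drain-waiting for you:

```js
const fs = require('fs');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

function* generateLines() {
  for (let i = 0; i < 1000000; i++) {
    yield `Line ${i}\n`;
  }
}

async function writeMillionLines() {
  // pipeline pauses the generator whenever the write buffer fills
  await pipeline(
    Readable.from(generateLines()),
    fs.createWriteStream('output.txt')
  );
}
```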
Common Stream Mistakes (I Made All of These)
Mistake #1: Forgetting to Handle Errors
```js
// BAD: Errors crash your server!
fs.createReadStream('file.txt')
  .pipe(res);
// File doesn't exist? Unhandled 'error' event! Server crash!
```
Fix:
```js
const stream = fs.createReadStream('file.txt');

stream.on('error', (error) => {
  console.error('Read error:', error);
  if (!res.headersSent) {
    res.status(500).json({ error: 'Failed to read file' });
  }
});

stream.pipe(res);
```
Mistake #2: Not Cleaning Up Streams
```js
// BAD: Memory leak! The stream never closes!
app.get('/download', (req, res) => {
  const stream = fs.createReadStream('bigfile.txt');
  stream.pipe(res);
  // User closes the browser mid-download...
  // The stream keeps reading! Memory leak!
});
```
Fix:
```js
app.get('/download', (req, res) => {
  const stream = fs.createReadStream('bigfile.txt');

  // Clean up if the client disconnects
  req.on('close', () => {
    stream.destroy();
  });

  stream.on('error', (error) => {
    stream.destroy();
    if (!res.headersSent) {
      res.status(500).end();
    }
  });

  stream.pipe(res);
});
```
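Or let pipeline() handle both cases for you - it destroys every stream in the chain on error or when the client disconnects mid-download:

```js
const { pipeline } = require('stream');

app.get('/download', (req, res) => {
  pipeline(fs.createReadStream('bigfile.txt'), res, (err) => {
    // err fires on read errors AND on premature close (client gone)
    if (err) console.error('Download failed:', err.message);
  });
});
```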
Mistake #3: Using Async/Await Wrong with Streams
```js
// BAD: This doesn't wait for the stream to finish!
async function processFile() {
  const stream = fs.createReadStream('file.txt');
  stream.pipe(someTransform).pipe(fs.createWriteStream('output.txt'));
  return; // returns immediately! The stream is still processing!
}
```
Fix:
```js
const { pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

async function processFile() {
  await pipelineAsync(
    fs.createReadStream('file.txt'),
    someTransform,
    fs.createWriteStream('output.txt')
  );
  // Now it actually waits for completion!
}
```
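Related tip: if you just need to await a single stream you didn't build with pipeline(), stream/promises also exposes finished():

```js
const fs = require('fs');
const { finished } = require('stream/promises');

async function readWholeFile() {
  const stream = fs.createReadStream('file.txt');

  stream.on('data', (chunk) => {
    // process each chunk here
  });

  await finished(stream); // resolves on 'end', rejects on 'error'
}
```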
When to Use Streams vs. Buffers
Use streams when:
- ✅ Processing large files (>1MB)
- ✅ Real-time data (video, audio)
- ✅ Network requests/responses
- ✅ Unknown data size
- ✅ Memory-constrained environments
Use buffers (readFile) when:
- ✅ Small files (<1MB)
- ✅ You need the entire file to process it (like JSON parsing)
- ✅ Simplicity matters more than memory
- ✅ Data transformations need full context
Real talk: If file size is predictable and small, readFile() is fine! Don't over-engineer. But for uploads, exports, or logs? Streams all the way!
Quick Wins (Do These Today!)
- Find all readFile() calls in upload handlers → replace with streams
- Add stream error handlers → prevent crashes
- Use pipeline() → better error handling than .pipe()
- Export endpoints → stream responses for large datasets
- Monitor memory usage → catch memory leaks early
Your Stream Checklist ✅
Before you deploy:
- Large file operations use streams (not readFile)
- All streams have error handlers
- Client disconnect cleanup (req.on('close'))
- Using pipeline() for multiple streams
- Respecting backpressure (checking write() return value)
- No unhandled stream errors
- Memory monitoring in production
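For that last checklist item, even a dead-simple logger catches most stream leaks (the 30-second interval here is arbitrary - wire this into whatever monitoring you already have):

```js
// A steadily climbing heap during uploads/exports usually
// means something isn't actually streaming
setInterval(() => {
  const { heapUsed, rss } = process.memoryUsage();
  console.log(
    `heap: ${(heapUsed / 1048576).toFixed(1)}MB, rss: ${(rss / 1048576).toFixed(1)}MB`
  );
}, 30000);
```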
The Bottom Line
Streams are Node.js's secret weapon for handling large data efficiently. They're the difference between "works on my machine" and "scales in production"!
The essentials:
- Use streams for large data (files, exports, uploads)
- Always handle errors (or your server will crash)
- Use pipeline() (better than .pipe() for error handling)
- Respect backpressure (wait for drain event)
- Clean up on disconnect (prevent memory leaks)
When I was building Node.js APIs at Acodez, streams saved our server from constant OOM crashes. Coming from Laravel where PHP handles memory per-request, Node.js taught me: In a single-threaded world, memory efficiency isn't optional - it's survival!
Think of streams as sipping through a straw instead of chugging from a fire hose. Your server will thank you, your users will thank you, and your boss will stop asking why the server keeps crashing!
Got stream horror stories? Share them on LinkedIn - OOM crashes make the best war stories!
Want to see my Node.js projects? Check out my GitHub - all properly streamed, I promise!
P.S. - If you're using readFile() for user uploads, go fix that RIGHT NOW. Your server's life depends on it!