🌊 Node.js Streams: Stop Loading the Whole File Into Memory

Here's a fun scenario: your boss says "add a CSV export." You write a quick endpoint, test it with 50 rows, it works great. You deploy. Three weeks later, a user downloads 2 million rows and your Node.js process eats 4GB of RAM, the pod crashes, and you're getting paged at 3am.

Sound familiar? The culprit: loading the entire file into memory before sending it.

The fix: Node.js Streams — one of the most powerful and most ignored features in the runtime.

What Even Is a Stream?

Think of it like a garden hose vs a water balloon. A water balloon holds all the water at once — great for summer fun, terrible for server memory. A garden hose delivers water continuously in a flow you can control.

Streams in Node.js work the same way. Instead of reading an entire 2GB file into a Buffer and then processing it, you process it chunk by chunk as it flows through your pipeline.

There are four types of streams in Node.js:

Readable — data flows out (like fs.createReadStream)
Writable — data flows in (like fs.createWriteStream)
Duplex — both directions (like a TCP socket)
Transform — reads, modifies, and writes (like a gzip compressor)

The "Before" Code (Don't Do This)

const express = require('express');
const fs = require('fs');
const app = express();

// ❌ The RAM Destroyer 3000
app.get('/download', (req, res) => {
  // Reads the ENTIRE file into memory before sending anything
  const data = fs.readFileSync('./huge-export.csv');
  res.setHeader('Content-Type', 'text/csv');
  res.send(data);
});

This works until it doesn't. With a 500MB file, you just allocated 500MB of RAM for one request. With 10 concurrent users doing the same thing? Congrats, you've invented an accidental DoS attack on yourself.

The "After" Code (Streams to the Rescue)

const express = require('express');
const fs = require('fs');
const { Transform } = require('stream');
const app = express();

// ✅ The Memory-Efficient Version
app.get('/download', (req, res) => {
  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Disposition', 'attachment; filename="export.csv"');

  const fileStream = fs.createReadStream('./huge-export.csv');

  // Pipe the file stream directly into the response
  // Node handles backpressure automatically — it won't read
  // faster than the client can receive
  fileStream.pipe(res);

  fileStream.on('error', (err) => {
    console.error('Stream error:', err);
    res.status(500).end('Something went wrong');
  });
});

app.listen(3000, () => console.log('Streaming on port 3000'));

The magic word here is pipe. It wires a Readable stream into a Writable stream and handles backpressure automatically — meaning if the client is downloading slowly, Node won't keep reading ahead and buffering gigabytes of data. It waits. It's polite like that.

Real-World Power Move: Transform Streams

Here's where it gets fun. What if you need to process data while streaming it? Enter Transform streams — the middleware of the stream world.

const { Transform, pipeline } = require('stream');
const { promisify } = require('util');
const pipelineAsync = promisify(pipeline);

// A Transform stream that converts JSON lines to CSV on the fly
class JSONToCSVTransform extends Transform {
  constructor(options) {
    super({ ...options, objectMode: false });
    this._headerWritten = false;
  }

  _transform(chunk, encoding, callback) {
    try {
      const lines = chunk.toString().trim().split('\n');

      for (const line of lines) {
        if (!line) continue;
        const record = JSON.parse(line);

        // Write header row once
        if (!this._headerWritten) {
          this.push(Object.keys(record).join(',') + '\n');
          this._headerWritten = true;
        }

        // Write data row
        this.push(Object.values(record).join(',') + '\n');
      }

      callback();
    } catch (err) {
      callback(err);
    }
  }
}

// Use pipeline() instead of pipe() — it handles cleanup on errors
app.get('/export', async (req, res) => {
  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Disposition', 'attachment; filename="data.csv"');

  try {
    await pipelineAsync(
      fs.createReadStream('./data.ndjson'),
      new JSONToCSVTransform(),
      res
    );
  } catch (err) {
    console.error('Pipeline failed:', err);
    // Response may be partially sent, so we can't always set status codes here
  }
});

The pipeline() utility (use it over raw pipe()) properly handles errors and cleans up all streams in the chain — no memory leaks, no hanging file descriptors.

The Backpressure Problem (And Why You Don't Have to Worry Much)

Backpressure is what happens when your Readable produces data faster than your Writable can consume it. Without handling it, you'd buffer everything in memory — defeating the whole point.

The good news: pipe() and pipeline() handle backpressure for you. When the destination says "slow down," the source pauses automatically. It's like cruise control for data flow.

If you're ever writing a custom stream and calling push() in a loop, check the return value — if it's false, stop pushing and wait for the 'drain' event before continuing. But for most Express use cases, pipeline() covers you.

Quick Wins: When to Reach for Streams

File downloads — always stream, never load entirely
File uploads — pipe req (a Readable!) directly to storage
Database cursor results — stream rows instead of SELECT * into memory
Real-time data — SSE (Server-Sent Events) are just Writable streams to res
Log processing — read multi-GB logs line by line with readline.createInterface

const readline = require('readline');

// Process a massive log file line by line
const rl = readline.createInterface({
  input: fs.createReadStream('./server.log'),
  crlfDelay: Infinity
});

let errorCount = 0;
rl.on('line', (line) => {
  if (line.includes('ERROR')) errorCount++;
});

rl.on('close', () => {
  console.log(`Found ${errorCount} errors`);
});

Processes a 10GB log file with barely any memory usage. Feels like cheating.

The Takeaway

Streams are one of those Node.js features that feel intimidating until you actually use them — then they feel like superpowers. The core mental model is simple: data flows through a pipeline, chunk by chunk, without ever needing to fit entirely in memory.

Your first step: go find the place in your codebase where you're doing fs.readFileSync or JSON.parse on a file of unknown size. Replace it with a stream. Your RAM budget will thank you, your ops team will thank you, and future-you getting paged at 3am will definitely thank you.

Got a gnarly stream use case you've solved? Or a backpressure horror story? Drop it in the comments — war stories make the best learning material. 🚀

🌊 Node.js Streams: Stop Loading the Whole File Into Memory (Your RAM Will Thank You)