
🌊 Node.js Streams: Stop Loading Your Entire Database Into RAM

6 min read

Picture this: your boss wants a report. Your app fetches 800,000 rows from the database, shoves them all into a JSON array, serializes the whole thing, and sends it to the client. Works great in staging with 200 rows. On prod? Your server becomes a very expensive paperweight.

This is the moment Node.js Streams walk in, crack their knuckles, and say, "I got you."

What Even Is a Stream?

A stream is like a water pipe — data flows through it chunk by chunk, instead of filling up a giant tank before anything moves. You don't wait for the entire river to arrive before you start drinking. You stick a cup under the tap.

Node.js has had streams almost since the beginning, and they power everything under the hood: HTTP requests and responses, file reads, process.stdin. You're already using them. You're just probably not thinking in them.

There are four types:

  • Readable — data flows out (e.g., reading a file)
  • Writable — data flows in (e.g., writing a file)
  • Duplex — flows both ways (e.g., a TCP socket)
  • Transform — reads in, transforms, writes out (e.g., gzip compression)

The magic trick is .pipe(), which connects them together like garden hoses.

The Classic Problem: Big File Downloads

Let's say you have an endpoint that serves a large CSV export. The naive approach:

// ❌ The "why is the server on fire" approach
app.get('/export/users', async (req, res) => {
  const users = await db.query('SELECT * FROM users'); // loads ALL rows
  const csv = convertToCsv(users.rows);                // entire CSV in memory
  res.send(csv);                                        // then sends it
});

If you have 500,000 users and each row is ~500 bytes, you just allocated 250MB for a single request. Five concurrent requests? You've kissed a gigabyte goodbye. Your ops team is already composing a strongly-worded Slack message.

Now the streams version:

// ✅ The "ops team buys you coffee" approach
const QueryStream = require('pg-query-stream'); // streams Postgres rows one at a time
const { stringify } = require('csv-stringify');

app.get('/export/users', (req, res) => {
  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Disposition', 'attachment; filename="users.csv"');

  // db.query(new QueryStream(...)) returns a Readable of row objects
  const dbStream = db.query(new QueryStream('SELECT * FROM users'));
  const csvTransform = stringify({ header: true });

  dbStream
    .pipe(csvTransform)  // transform rows → CSV chunks
    .pipe(res);          // stream directly to HTTP response

  dbStream.on('error', (err) => {
    console.error('Stream error:', err);
    res.destroy(err);
  });
});

Row comes out of the database → gets converted to CSV → immediately sent to the client. The max memory usage is roughly the size of one chunk (usually a few KB), not the entire dataset. Your server barely notices. Your RAM breathes a sigh of relief.

Streams as a Middleware Pipeline

Here's where it gets really fun. Streams compose beautifully. Need to gzip compress that CSV before sending? Add one pipe:

const zlib = require('zlib');

app.get('/export/users/compressed', (req, res) => {
  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Encoding', 'gzip');
  res.setHeader('Content-Disposition', 'attachment; filename="users.csv.gz"');

  const dbStream = db.query(new QueryStream('SELECT * FROM users')); // pg-query-stream
  const csvTransform = stringify({ header: true });
  const gzip = zlib.createGzip();

  dbStream
    .pipe(csvTransform)  // rows → CSV
    .pipe(gzip)          // CSV → compressed CSV
    .pipe(res);          // compressed CSV → client

  // Handle backpressure automatically — if client is slow,
  // the whole pipeline pauses. Beautiful.
});

Three .pipe() calls. Database → CSV → compressed → response. The data flows through each stage without any stage needing to hold the whole thing in memory simultaneously. And critically, backpressure is handled automatically — if the client is downloading slowly, the pipeline slows down. Your database isn't hammering ahead producing data that has nowhere to go.

This is the thing people miss about streams: it's not just about memory. It's about flow control. The water doesn't overflow the bucket because the pipe adjusts to the bucket's fill rate.

When to Reach for Streams

Not everything needs a stream. If you're fetching a single user record, don't overcomplicate it. Streams shine when:

  • Large data exports — CSVs, JSONs, reports (anything over a few MB)
  • File uploads/downloads — piping directly to/from S3 or disk
  • Log processing — reading multi-GB log files line by line
  • Real-time data — pushing events to clients via Server-Sent Events
  • ETL pipelines — transforming one data format to another on the fly

A rough rule of thumb: if the data could plausibly exceed the available RAM on a single server, use a stream. If it won't, the simpler approach is fine.

The One Gotcha: Error Handling

Classic streams have a slightly annoying quirk — errors don't propagate through .pipe() by default. If your database stream errors out mid-export, the response might just hang.

The modern fix is pipeline() — a callback version lives in stream, and a promise version in stream/promises — which propagates errors across every stage and tears down all the streams on failure:

const { pipeline } = require('stream/promises');

app.get('/export/users', async (req, res) => {
  res.setHeader('Content-Type', 'text/csv');

  try {
    await pipeline(
      db.query(new QueryStream('SELECT * FROM users')),
      stringify({ header: true }),
      res
    );
  } catch (err) {
    console.error('Pipeline failed:', err);
    if (!res.headersSent) res.status(500).send('Export failed');
  }
});

pipeline() ensures all streams are properly destroyed if anything goes wrong. No hanging connections, no memory leaks. Clean.

The Mental Shift

The biggest hurdle with streams isn't the API — it's the mindset. We're trained to think in complete units: get the thing, process the thing, return the thing. Streams ask you to think in flow: data arrives, pass it along, let it go.

Once it clicks, you'll start seeing streaming opportunities everywhere. That endpoint loading 10,000 products to find the out-of-stock ones? That's a stream with a filter transform. That log aggregator reading six files? That's a merge stream. That webhook that re-encodes uploaded images? That's a transform stream to an S3 write stream.

Your servers will be happier. Your users will get their downloads faster. And your ops team might — might — stop giving you side-eye in the standup.


Try it this week: Find one endpoint in your app that loads "a bunch of data" and sends it back. Benchmark its memory usage under load. Then refactor it to stream. The difference will make you feel like you discovered a cheat code.
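For a quick-and-dirty benchmark, logging process.memoryUsage() while you hammer the endpoint is enough to see the before/after difference:

```javascript
// Log heap and RSS once a second while a load test runs against the endpoint
const timer = setInterval(() => {
  const { heapUsed, rss } = process.memoryUsage();
  console.log(`heap: ${(heapUsed / 1e6).toFixed(1)}MB  rss: ${(rss / 1e6).toFixed(1)}MB`);
}, 1000);
timer.unref(); // don't keep the process alive just for the logger
```

Run it before and after the refactor — the buffered version's heap climbs with every concurrent request; the streamed version barely moves.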

Want to go deeper? Check out the official Node.js Streams documentation and the stream-json package for streaming JSON processing — it's a whole other level of fun.

What's the biggest dataset you've ever accidentally loaded into memory? Drop it in the comments — I guarantee someone has a more embarrassing story than you. 😄

Thanks for reading!
