Let me paint you a picture. It's 3 AM. Your on-call alert fires. The deployment is crawling. You SSH in, check the image size, and see it: 1.4 gigabytes. For a Node.js API that serves JSON.
You've shipped your entire development environment — compilers, test frameworks, source maps, node_modules with all its chaotic energy — straight into production. Somewhere, a penguin weeps.
This is the story of how Docker multi-stage builds became my personal DevOps religion, and why you should convert immediately.
The Problem: You're Baking the Whole Kitchen Into the Cake
Most developers write their first Dockerfile like this:
FROM node:20
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["node", "dist/server.js"]
Looks innocent, right? Wrong. That node:20 base image alone is ~1GB. Then you pile on npm install (hello, node_modules abyss), your entire source tree, build tools, and TypeScript compiler. You're basically shipping a fully-equipped workshop when all production needs is the finished shelf.
The result? Slow pulls, bloated registries, longer startup times, and a larger attack surface for security vulnerabilities. Every unnecessary binary in your image is a potential exploit waiting to happen.
Enter Multi-Stage Builds: The Marie Kondo of Dockerfiles
Multi-stage builds let you use multiple FROM statements in a single Dockerfile. Each stage can copy artifacts from previous stages — meaning you can build in one fat image and then copy only what you need into a lean final image.
Here's the same Node.js app, done right:
# Stage 1: The Builder (does the heavy lifting)
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop dev dependencies so the runner inherits only runtime packages
RUN npm prune --omit=dev
# Stage 2: The Runner (sleek, minimal, production-ready)
FROM node:20-alpine AS runner
WORKDIR /app
# Only copy what production actually needs
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package.json ./
ENV NODE_ENV=production
EXPOSE 3000
USER node
CMD ["node", "dist/server.js"]
The builder stage installs everything, compiles TypeScript, runs whatever build tooling you need. The runner stage? It's born knowing only what it needs to serve traffic. No TypeScript compiler. No dev dependencies. No source files. Just vibes and JSON.
Result: 1.4GB → ~120MB. That's not a typo.
Real-World Lesson: The Go Binary That Changed My Life
The most dramatic example I've witnessed was a Go microservice. Go compiles to a single static binary, which opens the door to something beautiful:
# Stage 1: Build the Go binary
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o server ./cmd/server
# Stage 2: Scratch image — literally nothing
FROM scratch AS runner
COPY --from=builder /app/server /server
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
EXPOSE 8080
ENTRYPOINT ["/server"]
FROM scratch means the final image contains exactly two things: the binary and the CA certificate bundle. The image clocked in at 11MB. Our previous Go image was 850MB. The ops team thought I'd made a mistake. I had not.
Pull times dropped from 45 seconds to 3. Cold start in Kubernetes? Basically instant. Security scanners stopped crying because there was almost nothing to scan.
Tips I Learned the Hard Way
Use --target to build specific stages locally. During development, you probably want the builder stage with all the tools:
docker build --target builder -t myapp:dev .
This is chef's kiss for debugging build issues without spinning up a full prod container.
Pin your base images. node:20-alpine is a moving target. Use digests in production:
FROM node:20-alpine@sha256:abc123...
Your future self debugging a broken build at 2 AM will thank you. Past self never listened to this advice. Past self suffered.
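To find the digest behind a tag you already trust, buildx can print it for you; a quick sketch (the digest below is a placeholder, substitute the real one):

```dockerfile
# Resolve the digest once, e.g. with:
#   docker buildx imagetools inspect node:20-alpine
# then pin it in your Dockerfile:
FROM node:20-alpine@sha256:<digest-from-the-command-above> AS builder
```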
The layer cache is your best friend (or your worst enemy). Put the things that change least near the top. Running COPY package*.json ./ before COPY . . means Docker reuses the dependency-install layer as long as your package files haven't changed. That alone can shave minutes off your CI pipeline.
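As a sketch, the whole trick is the order of two instructions:

```dockerfile
# Cache-hostile: any source edit invalidates the install layer
# COPY . .
# RUN npm ci

# Cache-friendly: the install layer survives until package files change
COPY package*.json ./
RUN npm ci
COPY . .
```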
Don't run as root. Notice the USER node line in the Node example? That one line prevents a whole category of container escape vulnerabilities. It costs you nothing. Add it.
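The official node images ship a ready-made node user, which is why that one line works. On a bare base image you create the user yourself; a minimal sketch using BusyBox's adduser/addgroup (the app name is arbitrary):

```dockerfile
# Alpine has no prebuilt app user; create an unprivileged one
FROM alpine:3.20
RUN addgroup -S app && adduser -S app -G app
USER app
```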
The CI/CD Payoff
When you're running multi-stage builds in GitHub Actions or any CI system, the compound benefits stack up fast. Smaller images mean:
- Faster pushes to your container registry
- Faster pulls in your Kubernetes cluster (especially on scale-up events)
- Cheaper storage costs (yes, ECR and GCR charge by the GB)
- Security scans that complete in seconds instead of minutes
- On-call alerts that don't wake you up because deployments time out
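In GitHub Actions, for example, a minimal build-and-push job with layer caching might look like this (registry, org, and image name are placeholders; adapt to your setup):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ghcr.io/your-org/myapp:latest
          # Reuse layers across CI runs via the GitHub Actions cache
          cache-from: type=gha
          cache-to: type=gha,mode=max
```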
One team I worked with cut their deployment time from 8 minutes to 90 seconds just by fixing their Dockerfile. Same application. Same infrastructure. Just smarter packaging.
Your Turn
If you have a Dockerfile right now that doesn't use multi-stage builds, open it. I'll wait.
Now ask yourself: does production actually need your compiler? Your test runner? Your .git folder? (Yes, people do that. No, I will not explain further.)
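While you have that Dockerfile open, a .dockerignore file is the cheapest fix for context bloat; a typical starting point (adjust to your project):

```
.git
node_modules
dist
coverage
*.log
.env
```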
The refactor usually takes 30 minutes. The benefits last as long as your service runs. Start with your biggest, angriest image — the one that makes your CI pipeline wheeze — and give it the multi-stage treatment.
Your containers should be athletes, not hoarders. Lean, mean, and purpose-built for one job.
Build smaller. Deploy faster. Sleep better. 🐧
What's the wildest image size reduction you've achieved? Drop it in the comments — I want to see the before and after.