
Docker Multi-Stage Builds: Stop Shipping Your Entire Dev Environment to Production 🐳✂️

15 min read


Real confession: The first Docker image I pushed to production was 1.8GB. It included node_modules, build tools, test files, source maps, and probably my SSH keys (yikes!). Deploy time? 12 minutes. My manager watched the progress bar and asked if we were downloading the entire internet. 😅

Manager: "Why is the Docker image bigger than the actual application?"

Me: "Uh... I needed all those dependencies to build it?"

Him: "Do you need them to RUN it?"

Me: 🤯

Welcome to multi-stage builds - the Docker feature that will make your images 90% smaller and your deploys 10x faster!

What's a Multi-Stage Build Anyway? 🤔

Think of it like packing for a trip:

Single-stage Dockerfile (Amateur hour):

# Pack EVERYTHING
FROM node:18
WORKDIR /app
COPY . .
RUN npm install        # Dev dependencies
RUN npm run test       # Test framework
RUN npm run build      # Build tools
CMD ["npm", "start"]

# Result: 1.2GB image with:
# ✅ Your app
# ❌ TypeScript compiler (not needed!)
# ❌ Jest test framework (not needed!)
# ❌ Webpack and 500 build plugins (not needed!)
# ❌ Source .ts files (you ship the compiled .js!)
# ❌ node_modules with 400MB of dev deps

Multi-stage Dockerfile (Pro move):

# Stage 1: Build (use all the tools!)
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install        # ALL dependencies
COPY . .
RUN npm run build      # Build the app

# Stage 2: Production (ship ONLY what's needed!)
FROM node:18-alpine    # Smaller base image
WORKDIR /app
COPY package*.json ./
RUN npm install --production  # ONLY production deps
COPY --from=builder /app/dist ./dist  # Just the built files!
CMD ["node", "dist/index.js"]

# Result: 180MB image with:
# ✅ Your app (built)
# ✅ Production dependencies
# ❌ Everything else LEFT BEHIND!

Translation: Build with all the tools, but ship ONLY the essentials! Your production image doesn't need TypeScript - it's already compiled to JavaScript! 🎯
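
Want to see where the bytes actually live? docker history breaks an image down layer by layer (the image name here is a placeholder):

docker history myapp:latest
# Each row is a layer with its size - the npm install
# and COPY . . layers are usually the heaviest offenders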

The Deployment Horror Story That Taught Me This 💀

After deploying countless Node.js and Laravel apps to AWS, I learned about image size the expensive way:

Black Friday 2020, 3 AM (Server crashes, need to deploy fix FAST!):

# My deploy process
docker build -t myapp:hotfix .
# Building... 8 minutes ⏰
docker push myapp:hotfix
# Pushing 1.8GB... 7 minutes ⏰
kubectl rollout restart deployment/myapp
# Pulling image on 3 nodes... 5 minutes each ⏰
# Total downtime: 20+ MINUTES! 💀

What happened:

  • Server crashed during peak traffic (worst timing!)
  • Had a fix ready in 2 minutes
  • Spent 20 minutes deploying it (image too big!)
  • Lost $8,000 in sales while users refreshed frantically
  • My stress level: 📈📈📈

After switching to multi-stage builds:

docker build -t myapp:hotfix .
# Building... 3 minutes ⏰
docker push myapp:hotfix
# Pushing 180MB... 45 seconds ⏰
kubectl rollout restart deployment/myapp
# Pulling image... 30 seconds per node ⏰
# Total downtime: 4 MINUTES! ✅

Savings: 16 minutes of downtime = thousands in revenue saved! 💰

Multi-Stage Builds 101: Your First Slim Image 🎓

Example #1: Node.js App

Before (The bloated mess):

FROM node:18
WORKDIR /app

# Copy everything
COPY . .

# Install everything
RUN npm install

# Build
RUN npm run build

# Run
CMD ["npm", "start"]

# Image size: 1.2GB
# Build time: 5 minutes
# Deploy time: 8 minutes

After (The lean machine):

# Stage 1: Build stage
FROM node:18 AS builder

WORKDIR /app

# Copy package files first (better caching!)
COPY package*.json ./

# Install ALL dependencies (including devDependencies)
RUN npm ci

# Copy source code
COPY . .

# Build the application
RUN npm run build
# Now we have dist/ with compiled JavaScript

# Stage 2: Production stage
FROM node:18-alpine AS production

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install ONLY production dependencies
RUN npm ci --only=production

# Copy built files from builder stage
COPY --from=builder /app/dist ./dist

# Set production environment
ENV NODE_ENV=production

# Run the app
CMD ["node", "dist/index.js"]

# Image size: 180MB (85% smaller!)
# Build time: 3 minutes
# Deploy time: 2 minutes

What changed:

  • ✅ Builder stage has all dev dependencies (TypeScript, webpack, etc.)
  • ✅ Production stage has ONLY runtime dependencies
  • ✅ Source .ts files left in builder stage
  • ✅ node_modules shrunk from 400MB → 80MB
  • ✅ Used alpine (smaller base image)

Result: Same app, 85% smaller image! 🎉
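
Don't take my word for it - build both variants and compare. A quick sketch (the tag names and the Dockerfile.single path are placeholders):

docker build -f Dockerfile.single -t myapp:single .
docker build -t myapp:multi .
docker images myapp
# Compare the SIZE column: ~1.2GB vs ~180MB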

Example #2: Python/Django App

Before:

FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["gunicorn", "app:app"]

# Image size: 950MB (includes pip, setuptools, build tools)

After:

# Build stage
FROM python:3.11 AS builder

WORKDIR /app

# Install dependencies in a virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Production stage
FROM python:3.11-slim AS production

WORKDIR /app

# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Copy application code
COPY . .

# Run as non-root user (security bonus!)
RUN useradd -m appuser && chown -R appuser /app
USER appuser

CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]

# Image size: 320MB (66% smaller!)

Why python:3.11-slim?

  • Regular python:3.11: 950MB (includes build tools, compilers)
  • python:3.11-slim: 130MB (just Python runtime)
  • python:3.11-alpine: 50MB (even smaller, but can break some packages)

A deployment lesson I learned: Alpine is great for Node.js, but for Python? Use slim! Alpine uses musl instead of glibc, so many prebuilt wheels won't install and packages either compile from source or break outright! 🐍

Example #3: Go App (The Dream!)

Go is PERFECT for multi-stage builds:

# Build stage
FROM golang:1.21 AS builder

WORKDIR /app

# Copy go.mod and go.sum first (better caching!)
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Build statically linked binary
RUN CGO_ENABLED=0 GOOS=linux go build -o app .

# Production stage - FROM SCRATCH!
FROM scratch

# Copy SSL certificates (needed for HTTPS)
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Copy the binary
COPY --from=builder /app/app /app

# Run it
ENTRYPOINT ["/app"]

# Image size: 8MB! 🤯
# Yes, EIGHT MEGABYTES!

Why this is insane:

  • Builder stage: 1.1GB (Go compiler + tools)
  • Production stage: 8MB (JUST the binary!)
  • FROM scratch = literally empty image
  • Go compiles to self-contained binary (no runtime needed!)

Deploy time: 5 seconds! You can push 8MB in your sleep! 😓

In production with Go services, I've seen images under 10MB that handle millions of requests! Docker taught me to love Go for this reason! 🚀
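
A quick sanity check for the scratch image - names are placeholders, and I'm assuming the service listens on 8080:

docker build -t mygoapp .
docker images mygoapp                   # should report a single-digit-MB image
docker run --rm -p 8080:8080 mygoapp    # no shell, no OS, just your binary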

Advanced Patterns (The Good Stuff) 🎯

Pattern #1: Layer Caching Optimization

Bad order (rebuilds everything on code change):

FROM node:18 AS builder
WORKDIR /app
COPY . .              # Copies EVERYTHING (code + package.json)
RUN npm install       # Reinstalls deps on ANY code change! 😱
RUN npm run build

Good order (only rebuilds deps when package.json changes):

FROM node:18 AS builder
WORKDIR /app

# Step 1: Copy ONLY package files
COPY package*.json ./

# Step 2: Install deps (cached unless package.json changes!)
RUN npm ci

# Step 3: Copy code (changes often, but deps already cached!)
COPY . .

# Step 4: Build
RUN npm run build

Impact:

  • Code change: Build takes 1 minute (deps cached!)
  • Dependency change: Build takes 4 minutes (expected)

After countless deployments, I learned: Order matters! Put the stable stuff first! 📦
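
You can watch the cache earn its keep - a sketch, assuming a typical src/ layout:

docker build -t myapp .    # first build: every step runs
touch src/index.ts         # simulate a code-only change
docker build -t myapp .    # the npm ci step now shows CACHED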

Pattern #2: Named Stages for Flexibility

# Development stage
FROM node:18 AS development
WORKDIR /app
COPY package*.json ./
RUN npm install  # ALL dependencies (dev + prod)
COPY . .
CMD ["npm", "run", "dev"]

# Test stage
FROM development AS test
RUN npm run lint
RUN npm run test

# Build stage
FROM development AS builder
RUN npm run build

# Production stage
FROM node:18-alpine AS production
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]

Now you can build different stages:

# Development
docker build --target development -t myapp:dev .

# Run tests
docker build --target test -t myapp:test .

# Production
docker build --target production -t myapp:prod .

Why I love this: Same Dockerfile for dev, test, AND production! No more "works on my machine" excuses! 🎯

Pattern #3: Security Scanning in Build Stage

# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Security scan stage
FROM builder AS security
RUN npm audit --audit-level=high
# Build fails if high/critical vulnerabilities found!

# Production stage
FROM node:18-alpine AS production
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY --from=builder /app/dist ./dist

# Run as non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
USER nodejs

CMD ["node", "dist/index.js"]

Security wins:

  • ✅ Fails build if vulnerabilities found
  • ✅ Runs as non-root user (can't modify system)
  • ✅ Minimal attack surface (no build tools in prod)

A CI/CD pattern that saved our team: Catch vulnerabilities at build time, not in production! 🛡️
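
One gotcha worth knowing: with BuildKit, stages the final target doesn't depend on are skipped entirely, so building production alone would silently skip the audit. Run the security stage explicitly in CI:

# CI pipeline: scan first, then build the real image
docker build --target security -t myapp:scan .
docker build --target production -t myapp:prod .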

Pattern #4: Multi-Architecture Builds

# Build stage
FROM --platform=$BUILDPLATFORM node:18 AS builder
ARG TARGETPLATFORM
ARG BUILDPLATFORM

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]

Build for multiple architectures:

docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t myapp:latest \
  --push .

# Now works on:
# ✅ x86 servers (AWS EC2, most cloud)
# ✅ ARM servers (AWS Graviton, Apple Silicon)
# ✅ Raspberry Pi (why not? 😄)

When deploying on AWS Graviton instances, multi-arch builds saved me from maintaining separate images! One image, all platforms! 🌍
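
Heads up: buildx needs a builder instance that supports multi-platform output - a one-time setup (the builder name is arbitrary):

docker buildx create --name multiarch --use
docker buildx inspect --bootstrap    # starts the builder, lists supported platforms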

Real-World Production Dockerfile (What I Actually Use) 🏭

My battle-tested Node.js + TypeScript template:

# Stage 1: Dependencies
FROM node:18-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production && \
    npm cache clean --force

# Stage 2: Build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build && \
    npm run test

# Stage 3: Production
FROM node:18-alpine AS production

# Install dumb-init (proper PID 1 handling)
RUN apk add --no-cache dumb-init

# Create app directory and user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

WORKDIR /app

# Copy dependencies
COPY --from=deps --chown=nodejs:nodejs /app/node_modules ./node_modules

# Copy built application
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/package*.json ./

# Switch to non-root user
USER nodejs

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"

# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]

# Run the application
CMD ["node", "dist/index.js"]

# Labels for metadata
LABEL org.opencontainers.image.source="https://github.com/myuser/myapp"
LABEL org.opencontainers.image.description="My production Node.js app"
LABEL org.opencontainers.image.licenses="MIT"

Why this is production-ready:

  • ✅ Multi-stage (small image)
  • ✅ Non-root user (security)
  • ✅ dumb-init (proper signal handling)
  • ✅ Health check (Kubernetes knows when it's ready)
  • ✅ Tests run during build (catch issues early)
  • ✅ Production dependencies only
  • ✅ Proper file ownership
  • ✅ Metadata labels

Image size: 190MB (was 1.2GB before!)
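
Trust but verify - a quick local check of size and user before shipping (the tag is a placeholder):

docker build -t myapp:prod .
docker images myapp:prod             # SIZE should land around 190MB
docker run --rm myapp:prod whoami    # should print "nodejs", not "root"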

Common Mistakes (Learn from My Pain!) 🪤

Mistake #1: Using COPY . . Too Early

Bad:

FROM node:18 AS builder
COPY . .              # Code changes = rebuild deps!
RUN npm install
RUN npm run build

Good:

FROM node:18 AS builder
COPY package*.json ./  # Copy deps definition first
RUN npm ci             # Cached unless package.json changes!
COPY . .               # Then copy code
RUN npm run build

Time saved on ~80% of builds - any change that doesn't touch package.json reuses the cached npm ci layer!

Mistake #2: Not Using .dockerignore

What happens without .dockerignore:

COPY . .
# Copies EVERYTHING:
# ✅ src/
# ❌ node_modules/ (500MB!)
# ❌ .git/ (100MB of history!)
# ❌ dist/ (old builds!)
# ❌ coverage/ (test reports!)
# ❌ .env (SECRETS! 😱)

Create .dockerignore:

# .dockerignore
node_modules
npm-debug.log
dist
coverage
.git
.env
.env.local
*.md
.vscode
.idea

Now COPY . . is safe and FAST! ⚡
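
To confirm .dockerignore is actually working, watch the build context size during the build (the numbers below are illustrative):

docker build -t myapp .
# Classic builder: "Sending build context to Docker daemon  612MB"   <- before
# BuildKit:        "=> [internal] load build context ... 4.2MB"      <- after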

Mistake #3: Running as Root

Bad (security nightmare):

FROM node:18-alpine
COPY . .
CMD ["node", "app.js"]
# Runs as root! 😱
# If app gets hacked, attacker has root access!

Good (defense in depth):

FROM node:18-alpine
RUN adduser -D appuser
WORKDIR /app
COPY --chown=appuser:appuser . .
USER appuser
CMD ["node", "app.js"]
# Runs as non-root user ✅
# Attacker can't modify system even if app compromised!

Docker security lesson I learned: Never run production containers as root! EVER! 🛡️
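
Two-second verification that an image won't run as root (the tag is a placeholder):

docker inspect --format '{{.Config.User}}' myapp:latest    # expect "appuser", not empty
docker run --rm myapp:latest id                            # expect a non-zero uid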

Mistake #4: Not Testing the Production Image

Bad workflow:

# Build production image
docker build -t myapp:prod .

# Push to registry
docker push myapp:prod

# Deploy
kubectl apply -f deployment.yaml

# 💥 DOESN'T START! Missing dependency!

Good workflow:

# Build production image
docker build -t myapp:prod .

# TEST IT LOCALLY FIRST!
docker run -p 3000:3000 myapp:prod

# Make a request
curl http://localhost:3000/health

# If it works ✅ THEN push!
docker push myapp:prod

A CI/CD pattern that saved us countless production incidents: Test the actual production image locally before deploying! 🎯

The Size Comparison (Real Numbers!) 📊

My e-commerce API (Node.js + TypeScript):

Approach                             | Image Size | Build Time | Deploy Time | Push Time
Single-stage (naive)                 | 1.8GB      | 8 min      | 12 min      | 9 min
Multi-stage                          | 210MB      | 4 min      | 3 min       | 1 min
Multi-stage + Alpine                 | 180MB      | 4 min      | 2 min       | 45 sec
Multi-stage + Alpine + Optimizations | 165MB      | 3 min      | 2 min       | 40 sec

Savings:

  • 91% smaller image (1.8GB → 165MB)
  • 83% faster deploys (12 min → 2 min)

In production with 3 instances:

  • Pull time: 27 min → 3 min (9x faster rollouts!)
  • Disk usage: 5.4GB → 495MB (10x less storage!)
  • Registry bandwidth: Significantly cheaper!

After setting up CI/CD for dozens of projects, I learned: Multi-stage builds are NOT optional - they're essential! 🚀

The Docker Image Diet: Quick Wins 🥗

1. Use Alpine/Slim base images:

# Fat: 900MB
FROM node:18

# Slim: 250MB
FROM node:18-slim

# Alpine: 170MB
FROM node:18-alpine
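
These base sizes drift between releases, so measure them yourself:

docker pull node:18 && docker pull node:18-slim && docker pull node:18-alpine
docker images node    # lists every local node tag with its size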

2. Clean up package manager caches:

RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*
# ↑ Deletes apt cache! Saves 100MB+

3. Use npm ci instead of npm install:

# npm install: slower, and may resolve versions that differ from the lockfile
RUN npm install

# npm ci: faster, reproducible - installs exactly what package-lock.json says
RUN npm ci --only=production && \
    npm cache clean --force

4. Minimize layers:

# Bad: 3 layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean

# Good: 1 layer
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get clean

The Bottom Line 💡

Multi-stage builds aren't just about smaller images - they're about:

  • ✅ Faster deployments (less data to push/pull)
  • ✅ Lower costs (less registry storage, bandwidth)
  • ✅ Better security (no build tools in production)
  • ✅ Cleaner separation (build vs. runtime)

The truth about Docker in production:

It's not "can you ship a working image?" - it's "can you ship an optimized, secure, minimal image that deploys in under 2 minutes?"

After 7 years deploying containerized applications to AWS, I learned this: Image size matters! Every extra MB:

  • Slows down deployments
  • Costs money in bandwidth
  • Increases attack surface
  • Wastes disk space

Multi-stage builds solve ALL of these! 🎯

You don't need a 2GB image to run a 5MB application! Stop shipping your build tools to production! 🛑

Your Action Plan 🚀

Right now:

  1. Find your biggest Docker image
  2. Check its size: docker images | grep myapp
  3. Create .dockerignore
  4. Convert to multi-stage build

This week:

  1. Benchmark image sizes before/after
  2. Measure deploy time improvements
  3. Switch all projects to multi-stage
  4. Celebrate your 80% smaller images! 🎉

This month:

  1. Add security scanning to build pipeline
  2. Implement non-root users
  3. Set up multi-architecture builds
  4. Document your Dockerfile patterns
  5. Teach your team!

Resources Worth Your Time 📚

Tools I use:

  • dive - Explore Docker image layers visually
  • hadolint - Dockerfile linter
  • docker-slim - Automatically optimize images
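
Typical usage for the first two (image name is just an example):

dive myapp:prod        # interactive, layer-by-layer view of where the MBs go
hadolint Dockerfile    # lints for Dockerfile anti-patterns before they reach CI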

Real talk: The best Dockerfile is the one you can understand 6 months later! Keep it simple, keep it documented! 📝


Still shipping gigabyte-sized images? Connect with me on LinkedIn and let's talk Docker optimization!

Want to see my production Dockerfiles? Check out my GitHub - real multi-stage builds from real projects!

Now go forth and slim those images! 🐳✂️✨


P.S. If your Docker image is bigger than your actual application code, you're doing it wrong! Multi-stage builds should be your default, not an optimization! 🎯

P.P.S. I once deployed a 3.2GB image to production and wondered why rollouts took 20 minutes. After switching to multi-stage builds (280MB), deploys took 90 seconds. Learn from my mistakes - optimize early! 🚀