0x55aa
Back to Blog

🚀 GitHub Actions: Stop Waiting 20 Minutes for Your CI to Fail

|
5 min read

Let me paint you a picture. It's 4:45 PM on a Friday. You've got a one-line fix — a typo in a config file. You push it, open a PR, and then... you wait. The CI pipeline chugs away for 22 minutes. It installs 800 npm packages from scratch. It rebuilds your Docker image layer by layer. It runs the same linter that ran five minutes ago. And then — at minute 23 — it fails because of a flaky test that has nothing to do with your change.

We've all been there. GitHub Actions is genuinely powerful, but out-of-the-box pipelines are often slow, wasteful, and unnecessarily painful. Let's fix that.

The Cache Is Your Best Friend (And You're Ignoring It)

The single biggest speed win in any CI pipeline is caching. Every time your workflow installs dependencies from scratch, it's wasting time you'll never get back. GitHub Actions has built-in cache support, and it's shockingly underused.

Here's a real-world Node.js workflow with proper caching:

name: CI

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'   # <-- This one line. That's it.

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test

      - name: Build
        run: npm run build

That cache: 'npm' line in setup-node will cache your node_modules based on your package-lock.json. On a cache hit, dependency installation goes from 3 minutes to 8 seconds. That's not a typo. Eight. Seconds.

The same pattern works for Python (cache: 'pip'), Go, Ruby, and more. If you're not caching your package manager, you're just donating CPU time to GitHub for no reason.

Parallelism: Stop Running Things One at a Time

Most developers set up their CI like a to-do list — lint, then test, then build, then deploy. Sequential. Polite. Slow.

The thing is, your linter doesn't care whether your tests passed first. Your type checker doesn't need to wait for your integration tests. Run them in parallel using a job matrix:

jobs:
  checks:
    strategy:
      matrix:
        task: [lint, typecheck, unit-tests, integration-tests]
      fail-fast: false   # Don't cancel other jobs when one fails

    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run ${{ matrix.task }}

Four jobs, running simultaneously, sharing the same cache. Your total CI time becomes the duration of your slowest job, not the sum of all of them. A pipeline that took 18 minutes suddenly takes 6.

fail-fast: false is the unsung hero here. Without it, if your lint job fails in 30 seconds, GitHub cancels the integration tests that are halfway through. You then fix the lint error, push again, and wait another 6 minutes just to find out your integration tests also fail. With fail-fast: false, you get all your failures in one shot.

The Flaky Test Problem (And What to Actually Do About It)

Here's a truth nobody wants to hear: if your test suite has flaky tests, no amount of CI optimization will save you. A test that randomly fails 10% of the time will slow you down more than slow dependency installs ever will.

Flaky tests are usually one of three things:

  1. Race conditions — async code that sometimes resolves in the wrong order
  2. External dependencies — tests that hit real APIs, databases, or time-based logic
  3. Test pollution — tests that modify global state and affect each other

The fix for #2 is mocking or using a proper test database. But before you do that, add a retry mechanism to separate the "my code is broken" failures from the "the network hiccupped" failures:

- name: Run integration tests
  uses: nick-fields/retry@v3
  with:
    timeout_minutes: 10
    max_attempts: 3
    command: npm run test:integration

This retries up to three times before calling it a real failure. It's not a permanent fix — you still need to go find and squash those flaky tests — but it stops them from blowing up your Friday deployments while you deal with more important things.

Real-World Lessons Learned the Hard Way

Lesson 1: Pin your action versions. Using actions/checkout@main means someone can push a change to that action that breaks your entire pipeline. Use actions/checkout@v4 or even pin to a specific commit SHA. Your future self will thank you.

Lesson 2: Secrets are not environment variables. Don't log secrets. Don't echo them. Don't print your entire environment with env. GitHub will mask known secrets in logs, but you can still accidentally expose them in creative ways. Treat secrets like passwords, because they are.

Lesson 3: Artifacts have a cost. Storing build artifacts with actions/upload-artifact is great for debugging, but each artifact eats into your GitHub storage quota. Set retention-days to something sensible (7–30 days) or you'll be drowning in old artifacts from builds nobody remembers.

Lesson 4: Workflow files are code. They deserve code review. A broken workflow that deploys bad code to production is just a bug written in YAML instead of JavaScript. Review .github/workflows/ changes with the same scrutiny you'd apply to application code.

Your Next Step

Pick one thing from this post and implement it today. Just one. Add caching to your Node or Python workflow. Split your lint and test jobs to run in parallel. Add fail-fast: false to your matrix. Small wins compound.

A CI pipeline that developers trust and enjoy using gets used well — people add tests, they fix flaky specs, they keep the build green. A pipeline that takes 25 minutes and fails randomly gets ignored, worked around, and eventually abandoned.

Your CI pipeline is your team's immune system. Keep it fast, keep it honest, and it'll catch problems before they become incidents. Let it rot, and one day it'll be the reason you're debugging a production outage at midnight.

Now go fix that cache configuration. Your Fridays depend on it. 🎉

Thanks for reading!

Back to all posts