0x55aa
← Back to Blog

🔥 Kubernetes Resource Limits: Stop Letting Your Pods Eat All the RAM

5 min read

🔥 Kubernetes Resource Limits: Stop Letting Your Pods Eat All the RAM

Picture this: it's 2 AM, your on-call phone screams to life, and your entire production cluster is on its knees. Pods are being OOMKilled left and right, nodes are unschedulable, and your SRE is typing in all caps in Slack. The culprit? A single Node.js service that decided tonight was the night to hoard 14GB of RAM like a digital doomsday prepper.

Welcome to the consequences of not setting Kubernetes resource limits. Take a seat — we need to talk.


Why Resources Matter (Like, Actually Matter)

Kubernetes schedules pods onto nodes based on what resources they say they need. No resource requests? The scheduler shrugs, tosses your pod anywhere, and hopes for the best. No limits? Your pod will gleefully consume everything in sight until the kernel steps in and murders it with SIGKILL.

It's the tragedy of the commons, but for CPU cycles and megabytes.

There are two values you care about:

  • Requests: "Hey Kubernetes, I'll probably need at least this much." Used for scheduling decisions.
  • Limits: "This is the absolute maximum I'm allowed to consume." Cross this and bad things happen.

Think of requests as your restaurant reservation and limits as the fire code occupancy limit. One gets you in the door, the other keeps the building standing.


The Sacred Configuration

Here's the most important YAML you'll write today. Don't skip this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: my-api
          image: my-api:latest
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"

A few things to unpack here:

  • 250m CPU means 250 millicores, or 25% of a single core. 1000m = 1 full CPU core.
  • Memory is in bytes. 128Mi is 128 mebibytes. Don't use 128M — that's megabytes, and Kubernetes will smile and nod but you'll confuse yourself later.
  • Your limit should be a realistic ceiling, not a wild guess. Set it too low and you'll cause OOMKills. Set it too high and you're back to the 2 AM phone call.

The QoS Tiers Nobody Tells You About

Here's the plot twist: Kubernetes assigns each pod a Quality of Service (QoS) class based on how you set your resources. This determines who gets evicted first when the node is under pressure.

Class Condition Eviction Priority
Guaranteed requests == limits for all containers Last to go
Burstable requests < limits (or only some set) Middle of the pack
BestEffort No requests or limits set at all First to die

That production service you deployed with zero resource config? It's BestEffort. It will be the first thing Kubernetes throws overboard when the ship starts sinking. Set equal requests and limits to get Guaranteed QoS for your critical workloads. Your 2 AM self will thank you.


Right-Sizing: The Art of Not Guessing

Okay, so you need to set resources — but how much? Here's the honest answer: profile first, then configure.

# Check actual resource usage for a running pod
kubectl top pod my-api-7d4f9c-xkj2p

# Check node-level pressure
kubectl top nodes

# Describe a pod to see resource assignments + recent events
kubectl describe pod my-api-7d4f9c-xkj2p | grep -A5 "Requests\|Limits\|OOMKilled"

Run your service under realistic load, observe actual consumption via kubectl top or your metrics stack (Prometheus + Grafana is the classic combo), then set requests at roughly the p50 usage and limits at the p99 usage with a little headroom. Don't set limits at 10x your requests "just to be safe" — that's how you end up right back at the buffet problem.


Real-World Lessons Learned

Lesson 1: The OOMKill spiral. A pod hits its memory limit, gets killed, restarts, immediately starts doing whatever caused the memory spike, gets killed again. CrashLoopBackOff enters the chat. Either raise the limit, fix the memory leak, or both.

Lesson 2: CPU throttling is silent suffering. Unlike memory, hitting a CPU limit doesn't kill the pod — it just throttles it. Your service slows to a crawl and you spend three hours debugging "latency issues" before noticing the pod is CPU-throttled 70% of the time. Always monitor container_cpu_cfs_throttled_seconds_total in Prometheus.

Lesson 3: Namespace LimitRanges are your friend. Set a cluster-wide or namespace-wide default so that developers who forget to set resources still get something reasonable:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:
        memory: "256Mi"
        cpu: "200m"
      defaultRequest:
        memory: "128Mi"
        cpu: "100m"

Now every container gets a sensible baseline, and nobody accidentally deploys a BestEffort pod into production.


The Takeaway

Resource requests and limits are not optional Kubernetes trivia. They are the difference between a cluster that scales gracefully and a cluster that eats itself at the worst possible moment.

Set your requests based on what your service actually uses. Set your limits to protect the neighborhood. Use LimitRange to save your team from themselves. And for the love of all things uptime, please check kubectl top before deploying anything to production.

Your on-call rotation will sleep better. So will you.


Got a war story about OOMKilled pods or a cluster meltdown? Drop it in the comments — misery loves company, and we all learn from production disasters.