Kubernetes Resource Limits: Stop Your Pods from Eating the Whole Buffet 🍽️💥
Kubernetes Resource Limits: Stop Your Pods from Eating the Whole Buffet 🍽️💥
Real confession: We once deployed a new analytics service to our Kubernetes cluster on a Friday afternoon (already a crime). By Monday morning, every other service on the cluster was dead. The analytics pod had consumed 14 of the 16 available CPU cores, leaving crumbs for the payment service, the auth service, and basically everything else keeping the business alive.
The on-call engineer's Slack message said it all: "Did we... accidentally deploy a cryptocurrency miner?"
We had not. We had just forgotten to set resource limits.
Welcome to the most boring lesson with the most dramatic consequences in all of Kubernetes. 🎭
The Problem: Kubernetes Is a Shared Buffet 🍜
Here's the thing about Kubernetes that nobody warns you about when you're excited about "container orchestration" and "cloud-native deployments":
Every pod on a node shares the same underlying CPU and memory. Without limits, any single pod can gorge itself until other pods starve to death. Kubernetes won't intervene. It'll just watch. The scheduler is not your babysitter.
The OOMKiller (Out Of Memory Killer) is the cold, unfeeling process that eventually shows up to deal with the chaos. It picks the worst-offending process and kills it. Sometimes that's your bloated pod. Sometimes it's your payment service that just happened to be nearby. The OOMKiller does not care about your uptime SLA.
Requests vs Limits: The Core Concept 🧠
Kubernetes has two knobs for resource control, and confusing them is a rite of passage:
Requests = "I promise I'll need at least this much." The scheduler uses this to decide which node can fit your pod.
Limits = "Don't give me more than this, ever." The runtime enforces this hard ceiling.
# kubernetes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-api
spec:
replicas: 3
selector:
matchLabels:
app: my-api
template:
metadata:
labels:
app: my-api
spec:
containers:
- name: api
image: my-api:latest
resources:
requests:
memory: "128Mi" # Guaranteed minimum
cpu: "250m" # 0.25 CPU cores, guaranteed
limits:
memory: "512Mi" # Hard ceiling - exceed this = OOM kill
cpu: "500m" # Hard ceiling - exceed this = throttled
The golden rule: Always set both. Always. No exceptions. Pin this to your monitor. Tattoo it on your forearm. Do whatever it takes.
What Happens When You Skip Them (Horror Stories) 💀
Scenario A: Memory Limit Missing
Your Node.js app has a memory leak. Normally this would be caught and the pod would restart. Without a memory limit, it just keeps growing. And growing. And growing. Until the node's memory is exhausted, the Linux kernel panics, and the OOMKiller starts executing pods at random like a malfunctioning vending machine.
Scenario B: CPU Limit Missing
Your machine learning inference service suddenly gets a traffic spike. It grabs 8 CPU cores to handle the load. Your auth service, your API gateway, your database proxy — they're all on the same node, now running at 2% of their normal CPU capacity. Login stops working. Users can't authenticate. Everything looks fine in your dashboards because the metrics service also got throttled and stopped reporting.
This is called a "noisy neighbor" problem, and it will ruin your weekends.
CPU vs Memory: They're Different Beasts 🐻🐯
Here's a subtle but critical distinction:
CPU limits = your pod gets throttled (slowed down) when it hits the ceiling. It keeps running, just slower.
Memory limits = your pod gets killed (OOMKilled) when it hits the ceiling. It dies instantly.
# Check if your pods are being throttled or killed
kubectl get pods --all-namespaces
# Look for STATUS: OOMKilled - that's a memory limit hit
kubectl describe pod my-api-xyz-abc
# Look for "OOMKilled" in the last state
# Or "Reason: OOMKilled" in Events
# Check CPU throttling (requires metrics-server)
kubectl top pods
# High CPU with weird latency = probably being throttled
This means setting your memory limit too low is far more dangerous than setting your CPU limit too low. An over-throttled pod is slow. An over-limited pod is dead.
Lesson learned the hard way: Set memory limits conservatively at first, then tune up. Set CPU limits generously, then tune down. A throttled app is embarrassing. A dead app is a 3 AM incident.
The Right Way to Set Resource Values 🎯
Don't just guess. Here's a systematic approach:
# Step 1: Run your app locally under realistic load
# Step 2: Check actual resource usage
kubectl top pods -n your-namespace
# Step 3: Look at historical metrics in Grafana/Datadog
# Step 4: Set requests at ~average usage
# Step 5: Set limits at ~2-3x peak usage (for memory)
# Step 6: Leave headroom, then monitor and adjust
For a typical Node.js API I've deployed:
# After profiling under load: avg 80Mi memory, peaks at 200Mi
# CPU: avg 100m, peaks at 350m
resources:
requests:
memory: "128Mi" # Slightly above average
cpu: "100m" # Matches average
limits:
memory: "384Mi" # ~2x peak (room for spikes)
cpu: "500m" # ~1.5x peak (throttling is acceptable)
The ratio that's saved me most often: requests ~= average usage, limits ~= 2-3x peak usage. If your peak is wildly unpredictable, that's a different problem (hello, Horizontal Pod Autoscaler).
LimitRange: Enforcing Sanity Cluster-Wide 🚔
The best way to prevent your team from deploying limitless pods is to make limits mandatory at the namespace level. Enter LimitRange:
# kubernetes/limit-range.yaml
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: production
spec:
limits:
- type: Container
default: # Applied if no limit specified
memory: "256Mi"
cpu: "200m"
defaultRequest: # Applied if no request specified
memory: "128Mi"
cpu: "100m"
max: # Nobody can exceed this
memory: "2Gi"
cpu: "2000m"
min: # Nobody can go below this
memory: "64Mi"
cpu: "50m"
kubectl apply -f limit-range.yaml
# Now if someone deploys without resource limits, they get defaults
# And if they try to request 16GB of RAM, Kubernetes says no
Think of LimitRange as the "you must be this tall to ride" sign for your cluster. It's boring to configure and priceless when a new engineer deploys their first production pod at midnight.
Real-World Lessons That Cost Real Money 💸
Lesson 1: QoS Classes matter for eviction order
Kubernetes assigns Quality of Service classes based on how you set requests/limits:
Guaranteed: requests == limits (evicted last)Burstable: requests < limits (evicted second)BestEffort: no requests or limits set (evicted FIRST)
Your payment service should be Guaranteed. Your background analytics job can be BestEffort. Set them accordingly, or Kubernetes will evict your critical pods to save your log-rotation cron job. True story.
Lesson 2: The vertical vs horizontal scaling trap
When a pod keeps getting OOMKilled, the instinct is to raise the memory limit. Sometimes that's right. But often it's a memory leak that needs fixing, not more RAM. Throwing limits at a leaky app is like buying a bigger bucket for a sinking boat.
Lesson 3: Namespace-level resource quotas complement limits
# kubernetes/resource-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: team-quota
namespace: production
spec:
hard:
requests.cpu: "4"
requests.memory: "8Gi"
limits.cpu: "8"
limits.memory: "16Gi"
pods: "20"
This caps the entire team's namespace. Great for multi-tenant clusters where one team's runaway deployment shouldn't tank everyone else's SLA.
Your Action Plan 🚀
Today:
- Run
kubectl get pods -o json | jq '.items[].spec.containers[].resources'— see which pods have no limits set - Be horrified by the results
- Add a
LimitRangeto every namespace
This week:
- Profile your top 5 services under realistic load
- Set proper requests and limits based on actual data
- Add resource checks to your CI/CD pipeline (fail PRs that deploy without limits)
This month:
- Set up
ResourceQuotaper namespace/team - Monitor CPU throttling and OOMKill events in your metrics stack
- Review limits quarterly as your traffic patterns change
Got a Kubernetes horror story where missing resource limits ruined your weekend? Find me on LinkedIn — misery loves company, and I want to hear it.
Working on your first Kubernetes deployment? Check out my GitHub for real deployment manifests with proper resource configs included from day one.
Now go set those limits before the OOMKiller comes for your favorite pod. 🔪🐳
P.S. The analytics service from my opening story? It turned out to be running an N+1 query loop in a goroutine with no timeout. Resource limits would have contained the blast radius. Proper code review would have prevented it. We got both lessons for the price of one very bad Monday.