Kubernetes Resource Limits: Stop Letting Your Pods Starve (or Eat Everything)
Picture this: It's 2 AM. Your on-call phone is screaming. You log into the cluster and see the most terrifying Kubernetes output imaginable:
OOMKilled Exit Code: 137
Your payment service is dead. Not because of a code bug, but because a reporting job you deployed "temporarily" decided to gorge on every megabyte of RAM in the node. Every other pod got evicted. Chaos. Carnage. One very grumpy engineering manager at breakfast.
Welcome to the world of Kubernetes resource management. Get it wrong and your cluster is a lawless free-for-all. Get it right and you'll sleep through the night like a baby.
Let me show you how to get it right.
The Two Things Kubernetes Needs to Know
Kubernetes schedules your pods across nodes based on one question: "Where can this pod fit?"
To answer that, it needs two values per container:
- Requests: "This is the minimum I need to run." (Used for scheduling decisions)
- Limits: "This is the maximum I'm allowed to use." (Hard cap, enforced at runtime)
Think of it like booking a hotel room:
- Request = the room you reserved
- Limit = the physical size of the room (you can't knock down walls)
No reservation? Good luck. With no request, the scheduler treats your pod as needing nothing and will cram it onto whatever node it likes, even one that's already overwhelmed. And no limit? Your pod can eat the entire buffet while other guests starve.
A Dead-Simple Example Before the Brain Melts
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: myapp:latest
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "256Mi"
              cpu: "500m"
Breaking down the numbers:
- 250m CPU = 250 millicores = 0.25 of a CPU core
- 128Mi memory = 128 mebibytes (roughly 134 MB)
- The pod is guaranteed 250m CPU and 128Mi RAM on whatever node it lands on
- The container cannot exceed 500m CPU or 256Mi RAM, ever
CPU limits: the container gets throttled (slowed down) if it tries to go over. Memory limits: the container gets killed (OOMKilled) if it exceeds them. No mercy.
The Horror Story That Made This Click
We had a microservices setup on AWS EKS, about 12 services. Everything was humming along. Then one Friday afternoon a developer deployed a "quick analytics job" with no resource limits:
# The original deployment (a crime against clusters)
containers:
  - name: analytics-job
    image: analytics:latest
    # No resources block. Completely naked. YOLO.
What happened over the next two hours:
- Analytics job started, decided it wanted ALL the memory
- Kubernetes didn't care: no limits, no problem (apparently)
- The node hit 95% memory pressure
- Kubernetes started evicting lower-priority pods to make space
- Our auth service got evicted
- Users couldn't log in
- 404s cascading everywhere
- One very expensive emergency rollback
Root cause: One pod, no limits, total cluster chaos.
Fix: Namespace-level LimitRange objects so no pod can ever be deployed without sensible defaults again. More on that in a second.
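For contrast, here's a rough sketch of what that same container spec could have looked like with a resources block. The numbers are placeholders I made up for illustration, not values from the actual incident; you'd want to profile the job before picking real ones.

```yaml
# Hypothetical values for illustration only; measure the job before settling on numbers.
containers:
  - name: analytics-job
    image: analytics:latest
    resources:
      requests:
        memory: "512Mi"   # what the scheduler reserves on the node
        cpu: "250m"
      limits:
        memory: "1Gi"     # the job itself is OOMKilled if it goes past this
        cpu: "1"          # the job is throttled, not killed, above one core
```

With a memory limit in place, a runaway job takes itself down instead of pushing the whole node into memory pressure.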
Setting Limits That Actually Make Sense
The hardest part isn't the YAML; it's knowing which numbers to use. Here's my practical approach:
Step 1: Profile first, guess never
Deploy your app without limits initially in a staging environment and watch it:
# Watch resource usage in real time
kubectl top pods -n your-namespace --sort-by=memory
# Get detailed metrics
kubectl top pods -n your-namespace --containers
Or use a quick Prometheus query if you have monitoring set up:
# 95th percentile memory usage for your pod over 7 days
# (working set is a gauge, so use quantile_over_time rather than histogram_quantile)
quantile_over_time(0.95,
  container_memory_working_set_bytes{pod=~"api-server-.*"}[7d]
)
Run realistic load tests. See what the pod actually uses at P95 traffic. That's your request. Set your limit at 2x the request to give breathing room without enabling runaway consumption.
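To make that rule concrete, suppose your load test showed P95 usage of roughly 180Mi of memory and 200m of CPU. Those measurements are invented for this example. Applying "P95 is the request, 2x is the limit" would give something like:

```yaml
# Assumed measurements (hypothetical): P95 memory ~180Mi, P95 CPU ~200m
resources:
  requests:
    memory: "180Mi"   # observed P95
    cpu: "200m"       # observed P95
  limits:
    memory: "360Mi"   # 2 x 180Mi
    cpu: "400m"       # 2 x 200m
```

Treat the 2x multiplier as a starting point. A service with spiky memory may need more headroom, and a very steady one may get away with less.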
Step 2: Set namespace defaults so nobody can forget
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:
        memory: "256Mi"
        cpu: "500m"
      defaultRequest:
        memory: "128Mi"
        cpu: "100m"
      max:
        memory: "2Gi"
        cpu: "2"
      min:
        memory: "32Mi"
        cpu: "50m"
What this does:
- Every container that doesn't specify resources gets 128Mi/100m requests and 256Mi/500m limits automatically
- No container can ask for more than 2Gi memory or 2 CPUs (or less than 32Mi / 50m)
- Prevents the "I'll just not set limits" shortcut that ends careers at 2 AM
Step 3: Use ResourceQuota to cap the whole namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
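This is your circuit breaker. Even if someone tries to deploy 200 replicas of their analytics job, Kubernetes will refuse to create pods beyond the quota. The namespace simply can't exceed it. One catch: once a quota covers requests and limits, pods that don't set them are rejected outright, which is one more reason to pair it with the LimitRange from Step 2. No more cluster-eating rogue deployments.
Quality of Service Classes: Kubernetes' Priority System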
Here's a thing most people don't know: Kubernetes automatically assigns a QoS class to every pod based on its resource settings. This determines who gets evicted first when the node runs out of resources.
| QoS Class | When Assigned | Eviction Priority |
|---|---|---|
| Guaranteed | requests == limits (CPU and memory) for ALL containers | Evicted last (the VIPs) |
| Burstable | requests < limits (or only some set) | Middle of the pack |
| BestEffort | No requests or limits set | Evicted FIRST (the sacrificial lambs) |
Your payment service? Should be Guaranteed. Same request and limit values.
# QoS: Guaranteed, so it's among the last to be evicted under node pressure
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"   # Same as request!
    cpu: "500m"       # Same as request!
Your background batch job? BestEffort is fine, since it can be killed and retried. But for anything customer-facing, you want Burstable at minimum, Guaranteed for critical paths.
The Gotchas That Will Bite You
Gotcha #1: CPU throttling is silent.
If a pod hits its CPU limit, it doesn't crash. It just slows down. Your latency creeps up, P99 response times spike, and you spend two hours looking at application code before someone notices the CPU throttle metric. Always watch container_cpu_cfs_throttled_seconds_total in your dashboards.
Gotcha #2: Memory limits kill without warning.
Unlike CPU, hitting the memory limit is instant death (OOMKilled). JVM apps are especially sneaky here: the JVM doesn't respect container limits by default in older versions. Always set -XX:MaxRAMPercentage=75 or similar to keep the JVM honest inside the container.
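One way to wire that flag in without rebuilding the image is the JAVA_TOOL_OPTIONS environment variable, which HotSpot-based JVMs read at startup. This is a sketch rather than a recommendation for your exact service: the container name, image, and memory size are hypothetical, and it assumes a JVM recent enough to support MaxRAMPercentage (JDK 10+, or 8u191+).

```yaml
containers:
  - name: payments              # hypothetical container name
    image: payments:latest      # hypothetical image
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:MaxRAMPercentage=75.0"   # max heap = 75% of the container memory limit
    resources:
      requests:
        memory: "512Mi"
      limits:
        memory: "512Mi"         # heap tops out around 384Mi, leaving ~128Mi for non-heap memory
```

The remaining 25% is there for metaspace, thread stacks, and other off-heap allocations. If your app uses a lot of direct buffers or threads, you may need a lower percentage.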
Gotcha #3: Requests too high = wasted capacity.
If you set requests.memory: "2Gi" but your app only uses 200Mi, you've reserved 2Gi on a node that other pods could have used. Requests that are too high are just as harmful as no limits at all: you're wasting cluster capacity and paying for nodes you don't need.
Your Action Plan (Do This Today)
Right now:
- Run kubectl top pods -A and find pods using more than 500Mi RAM with no limits
- Add a LimitRange to every namespace that doesn't have one
- Identify your most critical pods and set them to Guaranteed QoS
This week:
- Set up Prometheus alerts for OOMKilled pods and CPU throttling > 25%
- Review all deployments and add sensible resource blocks based on observed usage
- Apply ResourceQuota to production and staging namespaces
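For the alerting item, a Prometheus rule file along these lines could be a starting point. It assumes you're scraping kube-state-metrics (for the termination reason) and cAdvisor/kubelet metrics (for the CFS throttling counters). The group name, alert names, severity labels, and "for" durations are my own placeholders; only the 25% threshold comes from the list above.

```yaml
groups:
  - name: resource-limits                 # hypothetical group name
    rules:
      - alert: ContainerOOMKilled
        # kube-state-metrics reports 1 when the container's last termination reason was OOMKilled
        expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) was OOMKilled"

      - alert: ContainerCPUThrottlingHigh
        # share of CFS scheduling periods in which the container was throttled
        expr: |
          sum by (namespace, pod, container) (rate(container_cpu_cfs_throttled_periods_total[5m]))
            /
          sum by (namespace, pod, container) (rate(container_cpu_cfs_periods_total[5m]))
            > 0.25
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is CPU throttled in over 25% of periods"
```

Note that the OOMKilled series stays at 1 until the container terminates for a different reason, so in practice you may want to combine it with a recent increase in the restart counter to avoid a stale alert.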
This month:
- Run load tests against staging to establish accurate baselines
- Consider Vertical Pod Autoscaler (VPA) in recommendation mode: it watches your pods and suggests better values automatically
- Document your resource sizing philosophy so the whole team is on the same page
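If you want to try the VPA item, a minimal recommendation-only object might look like the sketch below. It assumes the VPA components are already installed in your cluster (they don't ship with Kubernetes by default) and targets the api-server Deployment from the earlier example. The object name and namespace are placeholders.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa        # hypothetical name
  namespace: production       # assumes the Deployment lives here
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"         # recommend only; never evict or resize pods
```

With updateMode set to "Off", the VPA only publishes suggested requests in its status, which you can read with kubectl describe vpa and then apply by hand once you trust them.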
The cluster is a shared resource. Treat it like one. When you set requests and limits thoughtfully, every service gets a fair shot, evictions become rare, and your on-call rotations stop being a nightmare fuel factory.
Your future self (and your teammates) will thank you.
Struggling with Kubernetes resource tuning? Find me on LinkedIn. Happy to talk through your cluster setup.
Want to see real-world resource configs? Check out my GitHub for production-tested Kubernetes manifests.
Now go set those limits before the analytics job strikes again.