AWS Lambda Concurrency: Your 'Infinite Scale' Has a Speed Limit âĄð§
AWS Lambda Concurrency: Your 'Infinite Scale' Has a Speed Limit âĄð§
Here's a thing that happened to me: Flash sale. Product goes viral on social media. 50,000 users hit our e-commerce site simultaneously. The serverless architecture I'd spent months building was supposed to handle this effortlessly.
Instead, our checkout endpoint started returning HTTP 429s. Then 500s. Orders failing. Customers angry. Me at 3 AM refreshing the AWS console wondering what "ThrottlingException" meant and why my "infinitely scalable" Lambda was apparently very, very finite.
That was the day I learned that Lambda has concurrency limits. And they're not optional. They're just... there. Waiting for you.
Lambda Concurrency 101 ð
When 1,000 users hit your Lambda simultaneously, Lambda spins up 1,000 instances running in parallel. That's concurrency - the number of Lambda instances running at the same time.
Request 1 â [Lambda Instance #1] â Done â
Request 2 â [Lambda Instance #2] â Done â
...
Request 1000 â [Lambda Instance #1000] â Done â
Request 1001 â â THROTTLED
The number nobody tells you: Your AWS account has a default total concurrency limit of 1,000 across ALL your Lambda functions, in ALL your services, in a given region.
All 1,000 slots being used? New invocations get throttled. The error looks like this:
{
"errorType": "TooManyRequestsException",
"errorMessage": "Rate exceeded",
"statusCode": 429
}
Translation: "You've hit the ceiling. Requests are being rejected." ð
The Math That Will Haunt You ð§Ū
In production, I've deployed about 40 Lambda functions across our e-commerce backend. Here's how fast the math gets painful:
Product search Lambda: max concurrency ~150
Checkout Lambda: max concurrency ~100
Order processing Lambda: max concurrency ~80
Payment webhook Lambda: max concurrency ~50
Image resizer Lambda: max concurrency ~200
...
During Black Friday? Every one of those spikes simultaneously. Without limits configured, one runaway function can eat all 1,000 slots and starve everything else.
The scenario I actually lived:
Our image resizing Lambda got hammered (CDN cache miss storm). It consumed 800 concurrent slots. Our checkout Lambda - which needed 50 slots to handle the sale traffic - got 200 slots and started throttling. Revenue stopped. Images were fast. Nobody cared about the images.
Reserved Concurrency: The Traffic Cop ðĶ
Reserved Concurrency does two things at once:
- Guarantees a function always has at least N slots available
- Caps a function at no more than N slots (protecting other functions)
# CloudFormation
CheckoutFunction:
Type: AWS::Lambda::Function
Properties:
FunctionName: checkout
ReservedConcurrentExecutions: 200 # Never more, never less available
# CLI
aws lambda put-function-concurrency \
--function-name checkout \
--reserved-concurrent-executions 200
A serverless pattern that saved us: After the Black Friday incident, I set reserved concurrency on every revenue-critical function. Checkout: 200. Payment webhooks: 100. Order processing: 150. The image resizer? Capped at 100. It can wait.
The catch: Reserved concurrency reduces your shared pool. If you reserve 500 across functions, you have 500 left for everything else. Plan accordingly or request a limit increase.
Provisioned Concurrency: The Cold Start Killer âïļ
You've read about cold starts. Provisioned Concurrency is the expensive but effective cure.
Without Provisioned Concurrency:
New request â Lambda needs to init â ~800ms cold start â Request handled
With Provisioned Concurrency (say, 50 instances pre-warmed):
Request 1-50 â Pre-warmed instance â ~5ms â
Request 51 â Cold start needed â ~800ms âģ
CheckoutFunctionAlias:
Type: AWS::Lambda::Alias
Properties:
FunctionName: !Ref CheckoutFunction
Name: production
ProvisionedConcurrencyConfig:
ProvisionedConcurrentExecutions: 50
When architecting on AWS, I learned: Provisioned Concurrency is priced differently from regular Lambda. You pay for the pre-warmed instances even when they're idle. For our checkout function, 50 provisioned instances cost ~$43/month. During the flash sale, they paid for themselves in recovered orders in the first 10 minutes.
Use it for: User-facing APIs where cold start latency matters. Not for background jobs that run at 2 AM when nobody's watching.
The SQS Throttling Trap ðŠĪ
This one is sneaky and burns everyone eventually.
Setup: SQS queue â Lambda function with reserved concurrency of 100.
What you think happens: Lambda scales up to 100 concurrent instances, processes the queue.
What actually happens when the queue spikes:
Queue depth: 10,000 messages
Lambda: "Let me spin up 1,000 instances!"
Reserved concurrency: "No, 100 maximum."
1,000 - 100 = 900 invocations throttled â go back to queue
Throttled SQS invocations don't go to your DLQ. They just retry. This creates a retry storm that makes your CloudWatch graphs look like a seismograph during an earthquake.
The fix: Size your SQS batch settings to match your reserved concurrency:
CheckoutQueueEventSource:
Type: AWS::Lambda::EventSourceMapping
Properties:
FunctionName: !GetAtt CheckoutFunction.Arn
EventSourceArn: !GetAtt CheckoutQueue.Arn
BatchSize: 10
MaximumConcurrency: 100 # Match your reserved concurrency!
MaximumConcurrency on the event source mapping tells SQS to never invoke more Lambda instances than this. No retry storms. Clean scaling.
Monitoring Concurrency (Before It Kills You) ð
These are the CloudWatch metrics that matter:
# See current concurrency usage
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name ConcurrentExecutions \
--dimensions Name=FunctionName,Value=checkout \
--start-time 2026-02-20T00:00:00Z \
--end-time 2026-02-20T23:59:59Z \
--period 300 \
--statistics Maximum
The alarm I add to every revenue-critical function:
CheckoutThrottleAlarm:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmName: checkout-lambda-throttles
MetricName: Throttles
Namespace: AWS/Lambda
Dimensions:
- Name: FunctionName
Value: checkout
Statistic: Sum
Period: 60
EvaluationPeriods: 1
Threshold: 5
ComparisonOperator: GreaterThanOrEqualToThreshold
AlarmActions:
- !Ref OnCallSNSTopic
Any throttles on checkout â immediate alert. I want to know before a customer tweets about it.
Requesting Limit Increases ðŽ
The 1,000 default is a soft limit. You can request increases through AWS Support:
AWS Console â Support â Create Case â Service Limit Increase
Service: Lambda
Limit: Concurrent Executions
Region: us-east-1
New limit value: 5,000
My experience: Standard increases (up to 3,000) get approved in a few hours. Larger increases need a business justification. For production e-commerce with revenue numbers to share, they're usually approved.
When architecting on AWS, I learned: Request the limit increase before the flash sale, not during it. AWS Support is fast. AWS Support at 3 AM on Black Friday while you're losing $10K/hour is still not instant.
Cost Reality Check ð°
Regular Lambda invocations: $0.20 per 1M requests + $0.0000166667 per GB-second.
Provisioned Concurrency: $0.015 per GB-hour (pre-warmed) + reduced invocation cost.
My actual numbers (checkout Lambda, 50 provisioned instances, 512MB memory):
Provisioned: 50 à 0.5 GB à 720 hours à $0.015 = $270/month
Invocations: ~2M/month à $0.20/M = $0.40
Total: ~$270/month
That sounds like a lot. Until you realize the alternative is cold-starting every checkout request at 800ms and losing 15% of users who abandon slow checkouts. Revenue math wins.
The cost-saving trick: Use Application Auto Scaling to schedule provisioned concurrency:
# Scale up before business hours, down at night
ScalableTarget:
Type: AWS::ApplicationAutoScaling::ScalableTarget
Properties:
ResourceId: !Sub "function:checkout:production"
ScalableDimension: lambda:function:ProvisionedConcurrency
MinCapacity: 5
MaxCapacity: 100
Keep 5 instances warm at 3 AM, 100 during business hours. We cut provisioned concurrency costs by 60% with scheduled scaling.
Common Pitfalls Quick Reference ðŠĪ
| Mistake | What Happens | Fix |
|---|---|---|
| No reserved concurrency | One function eats all 1,000 slots | Set reserved concurrency on every function |
| No alarm on Throttles metric | Silent failures, lost requests | CloudWatch alarm on Throttles > 0 |
| SQS without MaximumConcurrency | Retry storm, queue backlog explodes | Match event source concurrency to reserved limit |
| Provisioned concurrency on all functions | $$$, unnecessary for background jobs | Only for user-facing, latency-sensitive functions |
| Not requesting limit increase pre-event | Hitting 1,000 limit during peak | Request increases 2 weeks before any large event |
| Concurrency too low for DLQ pattern | DLQ fills up faster than expected | Account for retries in your concurrency budget |
The Setup That Actually Works ðïļ
In production, I've deployed this structure for every critical serverless workflow:
[User Request]
â
[API Gateway]
â
[Lambda - Reserved: 200, Provisioned: 50] â Revenue critical
â
[SQS Queue - MaximumConcurrency: 150]
â
[Lambda - Reserved: 150] â Async processing
â
[DLQ]
â
[CloudWatch Alarm]
Everything has limits. Everything is monitored. Nothing eats the shared pool.
TL;DR âĄ
Lambda is NOT infinitely scalable by default. Your account has 1,000 concurrent executions per region. Shared across all functions. Use them wisely.
Your checklist:
- Set reserved concurrency on every production function
- Match SQS event source
MaximumConcurrencyto reserved concurrency - Use provisioned concurrency only for user-facing, latency-sensitive APIs
- Schedule provisioned concurrency scaling (save 40-60% on costs)
- Alarm on
Throttlesmetric - non-zero is a problem - Request limit increases before peak events, not during them
- Monitor
ConcurrentExecutionsregularly, not just in emergencies
A serverless pattern that saved us: After setting reserved concurrency on all 40+ functions and configuring proper SQS event source limits, our Black Friday the following year was flawless. 80,000 concurrent users. Zero throttles. Zero 3 AM pages. I went to sleep at 11 PM. First time ever.
That's what proper concurrency configuration feels like. ðïļ
Got burned by Lambda throttling? I'd love to commiserate on LinkedIn - everyone has a 3 AM Lambda war story and they're all spectacular. ð
More production serverless patterns? Check GitHub for real-world examples from an e-commerce backend that's been battle-tested.
Go set those concurrency limits. Before your flash sale does it for you. âĄð§
P.S. Lambda has a separate burst limit that's even lower - it's region-dependent but typically 3,000 initial burst with 500 new instances per minute after that. So even with a 5,000 account limit, you can't spin up 5,000 instances instantly. Plan for gradual scaling if your traffic spikes suddenly. The burst limit is the one that will catch you if your account limit doesn't. ðĒ