0x55aa
← Back to Blog

AWS EventBridge: The Serverless Event Bus That Untangles Your Spaghetti Architecture 🌐⚑

β€’9 min read

AWS EventBridge: The Serverless Event Bus That Untangles Your Spaghetti Architecture 🌐⚑

Hot take: Most "microservices" architectures are just monoliths with network latency added on top. Everyone's drawing boxes and arrows on architecture diagrams, but under the hood, every service is directly calling every other service like it's a dependency injection container on steroids.

I was absolutely guilty of this for the first 18 months of building our serverless e-commerce backend. Order Lambda called Inventory Lambda. Payment Lambda called Notification Lambda. Checkout Lambda called... basically everything. It was a beautiful mess of tightly coupled functions pretending to be independent services.

Then I discovered AWS EventBridge. And everything changed.

What Is EventBridge, Actually? πŸ€”

EventBridge is AWS's fully managed event bus. Think of it as a smart postal router that sits at the center of your architecture:

Order Service         ──event──→  [EventBridge]  ──route──→  Email Service
Payment Service       ──event──→  [EventBridge]  ──route──→  Inventory Service
Checkout Service      ──event──→  [EventBridge]  ──route──→  Analytics Service
                                                 ──route──→  Fraud Detection

Your services stop talking to each other directly. They talk to EventBridge. EventBridge figures out who cares about what and routes accordingly.

The mental model that clicked for me: Lambda calling Lambda is a phone call. EventBridge is a public announcement system. You make the announcement once. Everyone who needs to know, hears it. You don't care who's listening.

Why Your Direct Lambda-to-Lambda Calls Are Hurting You 😬

In production, I've deployed this anti-pattern and lived to regret it:

Checkout Lambda:
  β†’ call Payment Lambda (sync, timeout risk)
  β†’ call Inventory Lambda (sync, timeout risk)
  β†’ call Email Lambda (sync, timeout risk)
  β†’ call Analytics Lambda (sync, timeout risk)
  β†’ call Fraud Detection Lambda (sync, timeout risk)

What this means in practice:

  • Checkout Lambda timeout: 29 seconds (almost hit the limit)
  • If Email Lambda goes down: checkout breaks
  • Adding a new "order confirmed" subscriber: modify Checkout Lambda, redeploy
  • One slow downstream service: entire checkout hangs

This is a distributed monolith. The box on the architecture diagram said "microservices." The code said otherwise.

EventBridge to the Rescue 🦸

A serverless pattern that saved us - restructuring around events:

Checkout Lambda:
  β†’ publish "order.created" event to EventBridge  ← that's it. done.

EventBridge Rules:
  "order.created" β†’ Payment Lambda
  "order.created" β†’ Inventory Lambda
  "order.created" β†’ Email Lambda
  "order.created" β†’ Analytics Lambda
  "order.created" β†’ Fraud Detection Lambda

Checkout Lambda is now blindingly fast. It doesn't wait for anything. It announces the event and moves on. Everything downstream happens asynchronously, independently, in parallel.

Checkout Lambda timeout: 3 seconds.

Setting It Up βš™οΈ

Step 1: Create a custom event bus

aws events create-event-bus --name ecommerce-events

Don't use the default event bus for your application events. Create a named one per domain. Clean separation, cleaner IAM.

Step 2: Publish events from Lambda

import boto3
import json
from datetime import datetime

events = boto3.client('events')

def publish_order_created(order_id, customer_id, total, items):
    events.put_events(
        Entries=[{
            'Source': 'ecommerce.checkout',
            'DetailType': 'order.created',
            'Detail': json.dumps({
                'orderId': order_id,
                'customerId': customer_id,
                'total': total,
                'items': items,
                'timestamp': datetime.utcnow().isoformat()
            }),
            'EventBusName': 'ecommerce-events'
        }]
    )

That's the entire publishing side. 12 lines. Checkout Lambda is done.

Step 3: Define routing rules (CloudFormation)

# Route order.created to email service
OrderCreatedEmailRule:
  Type: AWS::Events::Rule
  Properties:
    EventBusName: ecommerce-events
    EventPattern:
      source:
        - "ecommerce.checkout"
      detail-type:
        - "order.created"
    Targets:
      - Arn: !GetAtt EmailLambda.Arn
        Id: EmailServiceTarget
      - Arn: !GetAtt InventoryLambda.Arn
        Id: InventoryServiceTarget
      - Arn: !GetAtt FraudDetectionLambda.Arn
        Id: FraudDetectionTarget

When architecting on AWS, I learned: Put your routing rules in CloudFormation, not the console. Rules modified in the console are invisible in code reviews. Three months later nobody knows why Email Lambda triggers on order.created but not order.updated. Infrastructure as code or it didn't happen.

Event Patterns: The Smart Routing Part 🧠

EventBridge rules aren't just "all events go everywhere." You can filter with surgical precision:

# Only route high-value orders to fraud detection
HighValueOrderFraudRule:
  Type: AWS::Events::Rule
  Properties:
    EventBusName: ecommerce-events
    EventPattern:
      source:
        - "ecommerce.checkout"
      detail-type:
        - "order.created"
      detail:
        total:
          - numeric: [">", 500]  # Only orders over $500
    Targets:
      - Arn: !GetAtt FraudDetectionLambda.Arn
        Id: FraudDetectionTarget
# Route international orders to compliance Lambda
InternationalOrderRule:
  Type: AWS::Events::Rule
  Properties:
    EventBusName: ecommerce-events
    EventPattern:
      detail-type:
        - "order.created"
      detail:
        country:
          - anything-but: ["US", "CA"]
    Targets:
      - Arn: !GetAtt ComplianceLambda.Arn
        Id: ComplianceTarget

In production, I've deployed 40+ event rules across our e-commerce backend. Each rule is a business decision made explicit in code. "Fraud detection only triggers on high-value orders" β€” that's not buried in application logic, it's visible in the infrastructure.

Gotcha #1: EventBridge Has a Payload Limit πŸ“¦

EventBridge has a 256KB limit per event. This is smaller than you think.

What I did wrong: Putting the entire order object (with all line items, shipping details, product descriptions) in the event payload. A cart with 20 items hits 256KB surprisingly fast.

The pattern that works:

# Wrong: Putting everything in the event
events.put_events(Entries=[{
    'Detail': json.dumps({
        'order': entire_order_object_with_all_the_things  # might explode
    })
}])

# Right: Event as notification, data in S3/DB
events.put_events(Entries=[{
    'Detail': json.dumps({
        'orderId': 'ord_12345',   # just the ID
        'customerId': 'cust_789',
        'total': 299.99,
        'itemCount': 5
        # Downstream services fetch full details from DB
    })
}])

The principle: Events are notifications, not data transfer vehicles. Put identifiers and summary data in the event. Let services fetch the full context they need.

Gotcha #2: Failed Targets Are Silent By Default πŸ”‡

If your Email Lambda fails when processing an order.created event, EventBridge will retry it. But if it keeps failing... EventBridge has a dead-letter queue concept too:

OrderCreatedEmailRule:
  Type: AWS::Events::Rule
  Properties:
    EventBusName: ecommerce-events
    EventPattern:
      detail-type:
        - "order.created"
    Targets:
      - Arn: !GetAtt EmailLambda.Arn
        Id: EmailServiceTarget
        DeadLetterConfig:
          Arn: !GetAtt EventBridgeDLQ.Arn  # Failed events land here
        RetryPolicy:
          MaximumRetryAttempts: 3
          MaximumEventAgeInSeconds: 3600

A serverless pattern that saved us: Treating EventBridge dead-letter queues exactly like SQS DLQs. CloudWatch alarm on DLQ depth > 0. Alert fires. Team investigates. No silent failures.

Gotcha #3: Don't Confuse EventBridge with SNS 🀷

"But I already have SNS for pub/sub!" I hear you. Here's the difference:

SNS EventBridge
Routing Topic-based only Content-based (filter on event data)
Schema registry No Yes
Archive & replay No Yes
SaaS integrations No 90+ (Datadog, PagerDuty, Zendesk...)
Debugging Hard Logs in CloudWatch
Use case Simple fanout Complex event routing

My rule: SNS for simple fanout (same event, all subscribers). EventBridge when you need content-based routing, filtering, or replay.

The Archive and Replay Feature You'll Love After an Incident πŸ”„

EventBridge can archive events and let you replay them:

# Create an archive
aws events create-archive \
  --archive-name ecommerce-events-archive \
  --event-source-arn arn:aws:events:us-east-1:123456789:event-bus/ecommerce-events \
  --retention-days 30

# Replay events from a time range (e.g., after fixing a bug)
aws events start-replay \
  --replay-name fix-email-bug-replay \
  --source-arn arn:aws:events:us-east-1:123456789:archive/ecommerce-events-archive \
  --event-start-time "2026-02-20T00:00:00Z" \
  --event-end-time "2026-02-20T06:00:00Z" \
  --destination '{"Arn": "arn:aws:events:us-east-1:123456789:event-bus/ecommerce-events"}'

When architecting on AWS, I learned: This feature is worth its cost the first time your downstream service has a bug, misses events for 6 hours, and you can replay exactly the events that were lost instead of manually reconstructing state from logs.

Cost Reality Check πŸ’°

EventBridge pricing is genuinely cheap for most applications:

  • Custom events: $1.00 per million events published
  • Schema registry: First 5 million schema-related API calls free, then $0.10/million
  • Event archiving: S3 storage costs only (~$0.023/GB/month)
  • Event replay: $0.10 per million replayed events

My real numbers from production:

E-commerce backend: ~800K events/month
  order.created:     200K events
  order.updated:     150K events
  payment.captured:  200K events
  inventory.updated: 250K events

Monthly EventBridge cost: ~$0.80
Monthly value of not breaking checkout for unrelated service failures: πŸ’°πŸ’°πŸ’°

EventBridge is one of the cheapest architectural wins you can get on AWS.

Adding New Services Without Touching Existing Code πŸš€

This is the "aha moment" that sold me completely.

Scenario: Six months in, we need to add a loyalty points service. Every order should earn points.

Before EventBridge:

  1. Find Checkout Lambda
  2. Add call to Loyalty Lambda
  3. Test Checkout Lambda (risk regression)
  4. Deploy Checkout Lambda (deployment risk)
  5. Wire timeout handling, error handling

With EventBridge:

  1. Create Loyalty Lambda
  2. Add EventBridge rule: order.created β†’ Loyalty Lambda
  3. Deploy rule

Checkout Lambda never changed. Never knew Loyalty Lambda existed. Zero risk to the checkout flow.

In production, I've deployed 7 new downstream services to our order event stream this way. None of them required touching the publisher. That's true decoupling.

The Architecture That Actually Works πŸ—οΈ

[Checkout Lambda]
      β”‚
  "order.created"
      β”‚
[EventBridge: ecommerce-events]
   /  β”‚  β”‚  β”‚  \
  ↓   ↓  ↓  ↓   ↓
Email Inv Pay  Fraud Analytics
 Ξ»    Ξ»   Ξ»    Ξ»     Ξ»
       β”‚
      DLQ (with CloudWatch alarm)

All decoupled. All independent. All observable. Adding the seventh box doesn't require touching boxes 1-6.

TL;DR ⚑

Direct Lambda-to-Lambda calls = tightly coupled, slow, fragile.

AWS EventBridge = publish once, route to many, add subscribers without changing publishers.

Your migration checklist:

  1. Identify service-to-service calls that don't need synchronous responses
  2. Create a named event bus per domain (ecommerce-events, auth-events)
  3. Replace downstream calls with events.put_events()
  4. Define routing rules in CloudFormation
  5. Add DLQ + CloudWatch alarm for every target
  6. Enable event archiving (30-day retention minimum)
  7. Test replay works before you need it in production

A serverless pattern that saved us: After migrating our checkout flow to EventBridge, our checkout Lambda timeout dropped from 29 seconds to 3 seconds. Conversion rate on mobile (where 29s timeouts are brutal) went up 12%. The architecture diagram finally matched how the system actually worked.

That's worth more than a week of optimization work. 🎯


Already using EventBridge in production? I'd love to compare notes on LinkedIn β€” especially if you've done cross-account event routing. That's a post for another day. πŸ˜„

Want to see the full event schema library I've built? Check GitHub for event-driven serverless patterns from a real e-commerce backend.

Go decouple something. Your future 2 AM self will thank you. 🌐⚑


P.S. EventBridge also has Pipes (point-to-point integrations with filtering and enrichment) and Scheduler (cron jobs without servers). If you're using CloudWatch Events for scheduling Lambda functions β€” EventBridge Scheduler is the modern replacement with better timezone support, retry handling, and no more "why is this cron syntax different from every other cron syntax" confusion. πŸ“