CloudFormation vs Terraform: The IaC Battle I Wish Someone Explained to Me ☁️⚔️
Real talk: The first time I deployed infrastructure on AWS, I was a console cowboy. Click, click, create VPC, click, launch EC2, click, configure security group... 47 clicks later, I had a working environment! 🎉
Then my boss asked, "Can you recreate this in staging?" Me: "Uhh... let me write down all the settings?" Narrator: That did NOT go well. 😅
Welcome to Infrastructure as Code - the skill that separates senior engineers from people who still manually click "Launch Instance" in production!
What Even Is Infrastructure as Code? 🤔
IaC = Writing code to manage infrastructure instead of clicking buttons
Think of it like this:
Manual approach (Console Clicking):
You → AWS Console → Click "Create VPC" → Fill form → Click 47 more times
→ Hope you remember what you did
→ Pray you don't fat-finger a setting
→ Good luck recreating this! 💀
IaC approach:
You → Write vpc.tf → Run terraform apply → Infrastructure created!
→ Version controlled in Git
→ Reproducible forever
→ Sleep well at night 😴
Translation: IaC is treating your infrastructure like application code - versioned, tested, repeatable!
The Wake-Up Call That Taught Me IaC 📞
When architecting our e-commerce backend, I manually created everything in the AWS Console. Here's what happened:
The disaster timeline:
Week 1: Spent 6 hours clicking through AWS Console setting up production VPC, subnets, RDS, Lambda functions, API Gateway, CloudFront... 🖱️
Week 2: Boss: "We need a staging environment that matches production." Me: opens Console "This will take a while..." 😰
Week 3: Created staging environment. Forgot to add NAT Gateway in one subnet. Lambda functions couldn't reach the internet. Debugging took 4 hours! 🐛
Week 4: Junior dev accidentally deleted a security group in production (Console makes it too easy!). Site went down. We had NO record of what the settings were! 💥
Week 5: Me: "There has to be a better way!" Googles "Infrastructure as Code" 🔍
Week 6: Converted everything to Terraform. Now I can:
- Recreate entire infrastructure in 5 minutes ✅
- See exactly what changed in Git diffs ✅
- Roll back bad changes instantly ✅
- Never lose sleep over "what was that setting again?" ✅
In production, I've deployed hundreds of AWS resources using IaC. Never going back to console clicking! 🚀
CloudFormation: The AWS Native Way 📜
What it is: AWS's own IaC tool. Uses JSON or YAML templates to define infrastructure.
Why I started with CloudFormation:
# vpc.yaml - CloudFormation template
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Production VPC'
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: production-vpc
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: public-subnet-1a
  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: production-igw
  AttachGateway:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway
Deploy it:
aws cloudformation create-stack \
--stack-name production-vpc \
--template-body file://vpc.yaml
What happens:
- CloudFormation reads the template
- Creates resources in dependency order (the VPC first, then the subnet and gateway attachment that reference it)
- Tracks all resources in a "stack"
- Updates or deletes everything together
What I loved:
- ✅ Native AWS integration (works perfectly with all AWS services)
- ✅ No extra tools to install (comes with AWS)
- ✅ Free! (no additional cost)
- ✅ Automatic rollback on failure
- ✅ IAM integration is seamless
What drove me crazy:
- ❌ YAML/JSON syntax is verbose and clunky
- ❌ Error messages are cryptic AF ("Resource failed to create" - WHY?!)
- ❌ Only works with AWS (can't manage GitHub, Cloudflare, Datadog, etc.)
- ❌ Previewing changes is clunky (change sets exist, but they are painful to use)
- ❌ State management is implicit (sometimes confusing)
Terraform: The Swiss Army Knife 🛠️
What it is: HashiCorp's multi-cloud IaC tool. Uses HCL (HashiCorp Configuration Language).
Why I switched to Terraform:
# vpc.tf - Terraform config
provider "aws" {
  region = "us-east-1"
}
# Look up the AZs in this region (the subnet below references this data source)
data "aws_availability_zones" "available" {
  state = "available"
}
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name = "production-vpc"
  }
}
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = data.aws_availability_zones.available.names[0]
  map_public_ip_on_launch = true
  tags = {
    Name = "public-subnet-1a"
  }
}
# Setting vpc_id here also attaches the gateway to the VPC
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  tags = {
    Name = "production-igw"
  }
}
Deploy it:
terraform init # Download AWS provider
terraform plan # Preview changes (AMAZING!)
terraform apply # Create infrastructure
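One thing the vpc.tf above leaves out: a subnet only behaves as "public" once its route table sends traffic to the internet gateway. Here is a minimal sketch of that missing piece, assuming the resource names from the config above (aws_vpc.main, aws_subnet.public, aws_internet_gateway.main). The names "public" and "public_internet" are my own picks for illustration.

```hcl
# Sketch only: routing for the public subnet defined in vpc.tf.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "public-rt"
  }
}

# Default route: everything not local goes out through the internet gateway.
resource "aws_route" "public_internet" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.main.id
}

# Tie the route table to the subnet.
resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}
```

Notice there is no ordering logic anywhere. Terraform works out the creation order from the references between resources, so an explicit depends_on is only needed when a dependency is not visible in the arguments.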
What I loved:
- ✅ HCL syntax is clean and readable
- ✅ terraform plan shows EXACTLY what will change before applying
- ✅ Works with 3000+ providers (AWS, GCP, Azure, GitHub, Cloudflare, etc.)
- ✅ State file makes dependencies explicit
- ✅ Amazing error messages
- ✅ Huge community and module ecosystem
What annoyed me:
- ❌ Extra tool to install and learn
- ❌ State file management (can be tricky in teams)
- ❌ Sometimes lags behind new AWS features
- ❌ Community modules vary in quality
The Head-to-Head Comparison ⚔️
Let me show you REAL production scenarios:
Scenario 1: Creating an S3 Bucket
CloudFormation:
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-production-bucket
      VersioningConfiguration:
        Status: Enabled
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      LifecycleConfiguration:
        Rules:
          - Id: DeleteOldVersions
            Status: Enabled
            NoncurrentVersionExpirationInDays: 30
Terraform:
resource "aws_s3_bucket" "main" {
  bucket = "my-production-bucket"
}
resource "aws_s3_bucket_versioning" "main" {
  bucket = aws_s3_bucket.main.id
  versioning_configuration {
    status = "Enabled"
  }
}
resource "aws_s3_bucket_public_access_block" "main" {
  bucket                  = aws_s3_bucket.main.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
resource "aws_s3_bucket_lifecycle_configuration" "main" {
  bucket = aws_s3_bucket.main.id
  rule {
    id     = "delete-old-versions"
    status = "Enabled"
    filter {} # empty filter: the rule applies to every object in the bucket
    noncurrent_version_expiration {
      noncurrent_days = 30
    }
  }
}
Winner: CloudFormation - More concise for AWS-only resources!
Scenario 2: Creating Lambda with Dependencies
CloudFormation:
Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
  MyLambda:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: my-function
      Runtime: nodejs18.x
      Handler: index.handler
      Code:
        ZipFile: |
          exports.handler = async (event) => {
            return { statusCode: 200, body: 'Hello!' };
          };
      Role: !GetAtt LambdaExecutionRole.Arn
Terraform:
resource "aws_iam_role" "lambda" {
  name = "lambda-execution-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
      Action = "sts:AssumeRole"
    }]
  })
}
resource "aws_iam_role_policy_attachment" "lambda_basic" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}
resource "aws_lambda_function" "main" {
  filename      = "lambda.zip"
  function_name = "my-function"
  role          = aws_iam_role.lambda.arn
  handler       = "index.handler"
  runtime       = "nodejs18.x"
}
Winner: Tie - Both are reasonable!
Scenario 3: Multi-Cloud Setup (AWS + Cloudflare DNS)
CloudFormation:
# Can't do it! CloudFormation only supports AWS! 😭
Terraform:
# AWS resources
provider "aws" {
  region = "us-east-1"
}
resource "aws_s3_bucket" "website" {
  bucket = "my-website"
}
resource "aws_cloudfront_distribution" "website" {
  origin {
    domain_name = aws_s3_bucket.website.bucket_regional_domain_name
    origin_id   = "S3-my-website"
  }
  # ... more config
}
# Cloudflare DNS
provider "cloudflare" {
  api_token = var.cloudflare_token
}
resource "cloudflare_record" "website" {
  zone_id = var.cloudflare_zone_id
  name    = "www"
  value   = aws_cloudfront_distribution.website.domain_name
  type    = "CNAME"
  proxied = true
}
# GitHub repo webhooks
provider "github" {
  token = var.github_token
}
resource "github_repository_webhook" "deploy" {
  repository = "my-website"
  events     = ["push"]
  configuration {
    # The URL lives on aws_lambda_function_url, not on aws_lambda_function.
    # Assumes an aws_lambda_function_url "deploy" resource defined elsewhere.
    url          = aws_lambda_function_url.deploy.function_url
    content_type = "json"
  }
}
Winner: Terraform - CloudFormation can't even compete here! 🏆
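One detail the multi-provider config above skips: providers that don't live in the hashicorp namespace have to be declared in required_providers, otherwise terraform init goes looking for them in the wrong place. A rough sketch of the supporting declarations follows. The version constraints are my assumptions (cloudflare_record is a v4-era resource name), so pin whatever you have actually tested.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.0" # assumption: matches the cloudflare_record resource name
    }
    github = {
      source  = "integrations/github"
      version = "~> 6.0" # assumption: use the version you have tested
    }
  }
}

# Variables referenced by the provider and resource blocks.
variable "cloudflare_token" {
  description = "Cloudflare API token"
  type        = string
  sensitive   = true # keeps the value out of plan output
}

variable "cloudflare_zone_id" {
  description = "Cloudflare zone ID for the site's domain"
  type        = string
}

variable "github_token" {
  description = "GitHub token allowed to manage repository webhooks"
  type        = string
  sensitive   = true
}
```

Marking the tokens as sensitive hides them in plan and apply output, but they can still end up in the state file, which is one more reason to keep state in an encrypted remote backend.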
Scenario 4: Preview Changes Before Applying
CloudFormation:
# Create change set (clunky!)
aws cloudformation create-change-set \
--stack-name my-stack \
--template-body file://template.yaml \
--change-set-name my-changes
# Wait for it to process...
aws cloudformation wait change-set-create-complete \
--stack-name my-stack \
--change-set-name my-changes
# View the change set (hard to read!)
aws cloudformation describe-change-set \
--stack-name my-stack \
--change-set-name my-changes
# Output:
# {
# "Changes": [
# {
# "Type": "Resource",
# "ResourceChange": {
# "Action": "Modify",
# "LogicalResourceId": "MyBucket",
# "ResourceType": "AWS::S3::Bucket",
# "Replacement": "False"
# }
# }
# ]
# }
# "What exactly is changing?!" 🤔
Terraform:
terraform plan
# Output (beautiful and clear!):
# Terraform will perform the following actions:
#
# # aws_s3_bucket.main will be updated in-place
# ~ resource "aws_s3_bucket" "main" {
# bucket = "my-bucket"
# ~ versioning {
# ~ enabled = false -> true
# }
# }
#
# Plan: 0 to add, 1 to change, 0 to destroy.
Winner: Terraform - Plan preview is a GAME CHANGER! 🎯
When I Use CloudFormation vs Terraform 🤔
After 7 years of AWS deployments, here's my decision tree:
Use CloudFormation When:
✅ AWS-only infrastructure
# Simple Lambda + API Gateway
# CloudFormation has great AWS integration!
✅ Serverless Framework / SAM
# Both use CloudFormation under the hood
serverless deploy
sam deploy
✅ AWS Service Catalog
# Your company mandates CloudFormation templates
# (compliance/governance reasons)
✅ CDK (Cloud Development Kit)
// Write in TypeScript, compiles to CloudFormation!
const bucket = new s3.Bucket(this, 'MyBucket', {
  versioned: true
});
My CloudFormation projects:
- Serverless APIs (Serverless Framework)
- AWS CDK infrastructure
- Simple single-service deployments
Use Terraform When:
✅ Multi-cloud infrastructure
# AWS + GCP + Cloudflare + Datadog
✅ Complex infrastructure with modules
# Reusable VPC module used across 10 projects
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  # ...
}
✅ Need plan preview before every change
terraform plan # See before you destroy! 🛡️
✅ Managing non-AWS resources
# GitHub repos, Cloudflare DNS, PagerDuty, etc.
My Terraform projects:
- Production VPC infrastructure
- Multi-account AWS setups
- Infrastructure spanning AWS + Cloudflare + GitHub
- Anything I want tight control over
The Hybrid Approach (What I Actually Do) 🎯
Plot twist: I use BOTH in production!
Architecture:
My Production Infrastructure:
├── Terraform (Core Infrastructure)
│ ├── VPC, Subnets, Route Tables
│ ├── RDS Databases
│ ├── ElastiCache Clusters
│ ├── IAM Roles and Policies
│ ├── S3 Buckets and Policies
│ └── CloudFront Distributions
│
└── CloudFormation (Applications)
├── Serverless Framework (API endpoints)
├── SAM (Lambda functions)
└── CDK (Event-driven workflows)
Why this works:
Terraform = Foundation (changes rarely, managed carefully)
- VPCs don't change often
- Database infrastructure is stable
- Perfect for the terraform plan workflow
CloudFormation = Applications (deployed frequently)
- Lambda functions update daily
- Serverless Framework makes deployments easy
- SAM templates are simple and fast
When architecting on AWS, I learned: Use the right tool for the job! Terraform for infrastructure, CloudFormation for serverless apps! 🏗️
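The obvious question with a hybrid setup is how the CloudFormation side finds out what Terraform created. One common option (not the only one) is to publish the values as SSM parameters. A minimal sketch, assuming the aws_vpc.main and aws_subnet.public resources from the earlier VPC example; the parameter paths are made up for illustration.

```hcl
# Sketch: publish IDs from the Terraform-managed foundation so that
# CloudFormation-based stacks can read them. Paths are illustrative.
resource "aws_ssm_parameter" "vpc_id" {
  name  = "/production/network/vpc_id"
  type  = "String"
  value = aws_vpc.main.id
}

resource "aws_ssm_parameter" "public_subnet_id" {
  name  = "/production/network/public_subnet_id"
  type  = "String"
  value = aws_subnet.public.id
}
```

On the other side, a CloudFormation or SAM template can declare a parameter of type AWS::SSM::Parameter::Value<String> with the parameter path as its default, and Serverless Framework can read the same value through its ${ssm:...} variable syntax. That keeps the two tools loosely coupled: neither needs to read the other's state.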
Common IaC Mistakes I Made (Learn From My Pain!) 🪤
Mistake #1: Not Using Remote State
Bad (local state):
# terraform.tfstate stored locally
# Team member makes changes
# State conflicts! 💥
Good (remote state):
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "production/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks" # Prevents concurrent changes!
  }
}
Lesson learned: Remote state in S3 + DynamoDB locking from DAY ONE! 🔒
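The backend block above assumes the bucket and lock table already exist. Terraform won't create them for you, so they need a one-time bootstrap. Here is a sketch of what that could look like, applied once from a small separate config that still uses local state. The names simply mirror the backend block; the rest is my assumption about sensible defaults.

```hcl
# Bootstrap sketch: create the state bucket and lock table once.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state"
}

# Versioning lets you recover an earlier state file after a bad write.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# The S3 backend expects a string partition key named exactly "LockID".
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

After this is applied, running terraform init in the main project picks up the backend and offers to migrate any existing local state into the bucket.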
Mistake #2: Hardcoding Values
Bad:
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0" # Hardcoded!
  instance_type = "t3.micro"
  key_name      = "my-laptop-key" # Works on my machine!
}
Good:
variable "ami_id" {
  description = "AMI ID for web server"
  type        = string
}
variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"
}
resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = var.instance_type
  key_name      = aws_key_pair.deploy.key_name # assumes an aws_key_pair "deploy" resource defined elsewhere
}
Lesson learned: Variables make infrastructure reusable! 🎯
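Variables get even more useful once they reject bad input before anything is created. A small sketch of the ami_id declaration with a validation block added; the regex is my own rough check, not an official format definition.

```hcl
# Sketch: same ami_id variable as above, with a sanity check on the value.
variable "ami_id" {
  description = "AMI ID for web server"
  type        = string

  validation {
    # can() turns a regex mismatch into false instead of an error.
    condition     = can(regex("^ami-[0-9a-f]+$", var.ami_id))
    error_message = "The ami_id value must look like an AMI ID, for example ami-0123456789abcdef0."
  }
}
```

The actual values can then live in a per-environment file (say, a hypothetical staging.tfvars) and be passed in with terraform apply -var-file=staging.tfvars, so the same code serves dev, staging, and production.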
Mistake #3: No Tagging Strategy
Bad:
resource "aws_instance" "web" {
  # No tags - good luck finding this in AWS Console!
}
Good:
locals {
  common_tags = {
    Environment = "production"
    Project     = "ecommerce"
    ManagedBy   = "terraform"
    CostCenter  = "engineering"
  }
}
resource "aws_instance" "web" {
  # ...
  tags = merge(local.common_tags, {
    Name = "web-server-1"
    Role = "frontend"
  })
}
Lesson learned: Tags = cost tracking + organization + sanity! 🏷️
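If calling merge() on every resource feels repetitive, the AWS provider can apply the shared tags for you through default_tags. A sketch of the same tagging scheme done that way, assuming the var.ami_id and var.instance_type variables from Mistake #2:

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource this provider creates.
  default_tags {
    tags = {
      Environment = "production"
      Project     = "ecommerce"
      ManagedBy   = "terraform"
      CostCenter  = "engineering"
    }
  }
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = var.instance_type

  # Only the resource-specific tags go here; the defaults are merged in.
  tags = {
    Name = "web-server-1"
    Role = "frontend"
  }
}
```

If a resource sets a tag with the same key as a default, the resource-level value wins. The locals plus merge() approach is still handy when different groups of resources need different shared tags.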
Mistake #4: Not Using Modules
Bad (copy-paste everywhere):
# vpc-dev.tf
resource "aws_vpc" "dev" { ... }
resource "aws_subnet" "dev_public_1a" { ... }
resource "aws_subnet" "dev_public_1b" { ... }
# ... 50 more lines
# vpc-staging.tf
resource "aws_vpc" "staging" { ... } # Copy-paste!
resource "aws_subnet" "staging_public_1a" { ... }
# ... same 50 lines
Good (reusable module):
# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block = var.cidr_block
  # ...
}
# environments/dev.tf
module "dev_vpc" {
  source     = "../modules/vpc"
  cidr_block = "10.0.0.0/16"
  env        = "dev"
}
# environments/staging.tf
module "staging_vpc" {
  source     = "../modules/vpc"
  cidr_block = "10.1.0.0/16"
  env        = "staging"
}
Lesson learned: Modules = DRY principle for infrastructure! 🔁
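For the module calls above to work, the module has to declare the inputs it accepts, and it usually exposes a few outputs too. A minimal sketch of those two files; the file layout and the vpc_id output name are my assumptions, not something the module structure forces on you.

```hcl
# modules/vpc/variables.tf - inputs passed in by each environment
variable "cidr_block" {
  description = "CIDR range for the VPC"
  type        = string
}

variable "env" {
  description = "Environment name, e.g. dev or staging (useful for tags)"
  type        = string
}

# modules/vpc/outputs.tf - values the caller can read back
output "vpc_id" {
  description = "ID of the VPC created by this module"
  value       = aws_vpc.main.id
}
```

The calling config can then reference module.dev_vpc.vpc_id wherever it needs the VPC, for example when creating security groups or subnets outside the module.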
Mistake #5: Not Testing Changes
Bad workflow:
# Just YOLO it!
terraform apply --auto-approve
# "What could go wrong?" 😱
Good workflow:
# Always plan first!
terraform plan -out=tfplan
# Review the plan
terraform show tfplan # Read EVERYTHING (the plan file is binary, so cat won't help)
# Apply only if plan looks good
terraform apply tfplan
# Monitor AWS Console during apply
# Watch CloudWatch metrics
In production, I've learned: terraform plan has saved me from destroying production databases at least 5 times! 🛡️
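Reading the plan is the first line of defense, but you can also make Terraform refuse to plan a destroy for the resources that really matter. A sketch using an RDS instance; every name and size here is a placeholder I made up, and manage_master_user_password assumes a reasonably recent AWS provider.

```hcl
resource "aws_db_instance" "main" {
  identifier        = "production-db" # illustrative name
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 50

  username                    = "app_admin"
  manage_master_user_password = true # RDS stores the password in Secrets Manager

  # AWS-side guard: the API rejects delete calls while this is true.
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "production-db-final"

  # Terraform-side guard: any plan that would destroy this resource fails.
  lifecycle {
    prevent_destroy = true
  }
}
```

One caveat: prevent_destroy only protects the resource while the block is still in your config. Delete the whole block and Terraform will happily plan the destroy, so the plan review habit still matters.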
The Decision Matrix: Which Should You Learn? 🎓
Learn CloudFormation if:
- ✅ You're AWS-only (no multi-cloud plans)
- ✅ You use Serverless Framework or SAM
- ✅ You want zero external dependencies
- ✅ Your team already uses it
Learn Terraform if:
- ✅ You want to learn once, use everywhere (multi-cloud)
- ✅ You need the terraform plan workflow
- ✅ You value community modules
- ✅ You're starting fresh (most modern choice)
Learn BOTH if:
- ✅ You're a professional DevOps/Cloud engineer (like me!)
- ✅ You want maximum flexibility
- ✅ You're serious about infrastructure
My recommendation: Start with Terraform (more transferable skills), learn CloudFormation when you need Serverless Framework! 🚀
Quick Start: Your First Terraform Project ✅
Ready to ditch console clicking? Here's your starter template:
1. Install Terraform:
# macOS
brew install terraform
# Linux
wget https://releases.hashicorp.com/terraform/1.7.0/terraform_1.7.0_linux_amd64.zip
unzip terraform_1.7.0_linux_amd64.zip
sudo mv terraform /usr/local/bin/
2. Create main.tf:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
provider "aws" {
  region = "us-east-1"
}
resource "aws_s3_bucket" "my_first_bucket" {
  bucket = "my-unique-bucket-name-12345"
  tags = {
    Name        = "My First Terraform Bucket"
    Environment = "learning"
  }
}
output "bucket_name" {
  value = aws_s3_bucket.my_first_bucket.bucket
}
3. Deploy:
terraform init # Download AWS provider
terraform plan # Preview changes
terraform apply # Create the bucket!
4. Verify:
aws s3 ls | grep my-unique-bucket
# You'll see your bucket! 🎉
5. Destroy (cleanup):
terraform destroy # Remove everything
That's it! You just managed AWS infrastructure as code! 🏆
The Bottom Line 💡
IaC isn't just "best practice" - it's the difference between professional infrastructure and chaos!
The essentials:
- Version control everything (infrastructure = code)
- Automate deployments (no more console clicking)
- Use remote state (team collaboration)
- Plan before apply (prevent disasters)
- Tag everything (cost tracking + organization)
The truth about IaC:
It's not "CloudFormation vs Terraform" - it's "clicking buttons like a caveman vs managing infrastructure like a pro!" Both tools are great. Pick one and LEARN IT!
When architecting our serverless backends, I learned this: Console clicking doesn't scale. Teams need reproducible infrastructure. Git history shows who changed what. IaC is mandatory for professional cloud engineering! And honestly? Terraform's plan command has saved my ass more times than I can count! 🙏
You don't need perfect IaC from day one - you need to START using it and iterate! 🚀
Your Action Plan 🎯
This week:
- Stop creating resources in AWS Console (seriously, stop!)
- Install Terraform or set up AWS CLI for CloudFormation
- Convert ONE resource to IaC (start with an S3 bucket)
- Commit your .tf or .yaml file to Git
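If the resource you want to convert already exists (because you clicked it into being), you don't have to delete and recreate it. With Terraform 1.5 or newer, an import block can adopt it into state. A sketch, assuming an existing bucket; the bucket name and the "legacy" label are placeholders.

```hcl
# Sketch: bring an existing, console-created bucket under Terraform management.
import {
  to = aws_s3_bucket.legacy
  id = "my-existing-bucket" # for S3 buckets, the import ID is the bucket name
}

# The resource block the imported bucket will be tracked under.
resource "aws_s3_bucket" "legacy" {
  bucket = "my-existing-bucket"
}
```

Run terraform plan and it should report the bucket as something to import rather than something to create. Once the apply succeeds, the import block can be removed.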
This month:
- Convert your entire dev environment to IaC
- Set up remote state backend (S3 + DynamoDB)
- Create reusable modules for common patterns
- Write deployment documentation for your team
This quarter:
- Migrate production to IaC (carefully!)
- Implement CI/CD for infrastructure deployments
- Set up cost monitoring with proper tagging
- Become the IaC guru on your team! 🏆
Resources Worth Your Time 📚
Tools I use daily:
- Terraform - Multi-cloud IaC
- Terragrunt - Terraform wrapper for DRY configs
- AWS CDK - Infrastructure in TypeScript/Python
Reading list:
- Terraform: Up & Running - Best Terraform book
- AWS CloudFormation Templates - Official examples
Real talk: The best IaC tool is the one you actually USE! Stop debating, start building!
Still clicking buttons in AWS Console? Connect with me on LinkedIn and share your IaC journey!
Want to see my Terraform modules? Check out my GitHub - production-ready VPC, Lambda, and RDS modules!
Now go forth and codify your infrastructure! ☁️💻
P.S. If you've manually recreated your infrastructure in multiple environments, you know the pain! IaC eliminates this forever. Your future self will thank you! 🙏
P.P.S. I once spent 8 hours debugging why staging didn't match production. Turns out I clicked the wrong checkbox in Console 3 months ago. Never happened again after switching to Terraform. Learn from my pain - USE IaC! 💸