Terraform: Stop Clicking Around the AWS Console Before You Delete Production šļø
Terraform: Stop Clicking Around the AWS Console Before You Delete Production šļø
A message I received at 2 AM:
"Hey, I was poking around in the console to see how staging is set up... and I may have accidentally applied a security group change to production."
"Which change?"
"I removed port 443."
"..."
"The site's been down for 40 minutes."
That was the night I became a Terraform evangelist. Not because Terraform is some perfect magical tool ā it's not. But because "I was poking around in the console" is how production incidents are born, nurtured, and eventually given a Wikipedia page.
After 7+ years deploying production applications to AWS, I've learned one truth: if your infrastructure only exists as a series of manual clicks that one person remembers, you're one bad day away from disaster.
Enter Infrastructure as Code.
What Is Terraform and Why Should You Care? š¤
Terraform is a tool that lets you define your cloud infrastructure ā EC2 instances, RDS databases, VPCs, security groups, S3 buckets, load balancers, all of it ā as code in .tf files.
Instead of:
- Opening the AWS console
- Clicking "Launch Instance"
- Choosing AMI from a dropdown
- Picking an instance type
- Configuring security groups by hand
- Hoping you remember what you did next time
You write this:
resource "aws_instance" "api_server" {
ami = "ami-0c02fb55956c7d316"
instance_type = "t3.medium"
vpc_security_group_ids = [aws_security_group.api.id]
subnet_id = aws_subnet.private.id
tags = {
Name = "api-server"
Environment = "production"
Team = "backend"
}
}
And then you run:
terraform plan # Show me what will change
terraform apply # Actually make it happen
That's it. Your infrastructure is now in a Git repo. It's reviewed via pull requests. It's versioned. It's repeatable. It's not living exclusively in Dave's brain.
The Disaster That Converted Me š„
We had a beautiful staging environment. Load balancer, two EC2 instances, RDS, ElastiCache, the works. It took about two weeks of console clicking to set up.
Then we needed to spin up a nearly-identical environment for a client demo.
The conversation:
"How do we recreate staging for the demo?"
"Uh... I think I remember most of what I did."
"You think?"
"The RDS config might be a little different. And I'm not 100% sure which security groups I used for ElastiCache."
"How long will it take?"
"...a day? Maybe two?"
We spent three days. The demo environment had subtle differences from staging. We found bugs that only appeared in demo and not staging. The client was unimpressed.
A CI/CD pipeline that saves our team from this nightmare would have been terraform apply away. Two minutes, not two days.
After that project, I rewrote everything in Terraform. Never looked back.
Your First Terraform Project š
Step 1: Project Structure
infrastructure/
āāā main.tf # Main resources
āāā variables.tf # Input variables
āāā outputs.tf # Output values
āāā providers.tf # AWS provider config
āāā terraform.tfvars # Variable values (don't commit secrets!)
Step 2: Configure the AWS Provider
# providers.tf
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
# Store state in S3 (not locally!)
backend "s3" {
bucket = "my-terraform-state"
key = "production/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-state-lock"
encrypt = true
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
ManagedBy = "terraform"
Environment = var.environment
Project = var.project_name
}
}
}
That backend "s3" block is critical. Never store Terraform state locally. If your laptop dies, your state is gone. Store it in S3 with DynamoDB locking so two people can't run terraform apply simultaneously and corrupt everything.
Docker taught me about image layers. Terraform taught me about state files. Both will ruin your day if you ignore them.
Step 3: Variables (Stop Hardcoding Everything)
# variables.tf
variable "aws_region" {
description = "AWS region to deploy to"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Environment name (production, staging, demo)"
type = string
}
variable "project_name" {
description = "Project name for tagging"
type = string
}
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t3.medium"
}
variable "db_password" {
description = "RDS master password"
type = string
sensitive = true # Won't show in plan output!
}
# terraform.tfvars (for non-secrets)
aws_region = "us-east-1"
environment = "production"
project_name = "my-api"
instance_type = "t3.large"
# db_password comes from env var: TF_VAR_db_password
Now spinning up staging vs. production is just changing one variable. Same Terraform code, different tfvars file. That client demo that took three days? Now it's:
terraform workspace new demo
terraform apply -var-file="demo.tfvars"
Done. Grab a coffee.
The Pattern That Saved Our Production š”ļø
Before Terraform: The Console Chaos
# What our infrastructure "documentation" looked like:
Email from 2021: "I think the security group for the API is sg-abc123?"
Slack message: "hey does anyone know the RDS endpoint? I forgot to save it"
Sticky note on Dave's monitor: "prod db pass: hunter2"
Jenkins job comment: "manually SSH and restart nginx if it breaks"
Infrastructure as vibes. Completely undocumented. One person out sick = entire team blind.
After Terraform: The Pull Request
# PR: "feat: add ElastiCache for session storage"
resource "aws_elasticache_cluster" "sessions" {
cluster_id = "${var.environment}-sessions"
engine = "redis"
node_type = "cache.t3.micro"
num_cache_nodes = 1
parameter_group_name = "default.redis7"
port = 6379
subnet_group_name = aws_elasticache_subnet_group.main.name
security_group_ids = [aws_security_group.redis.id]
tags = {
Name = "${var.environment}-session-cache"
}
}
output "redis_endpoint" {
description = "Redis cluster endpoint"
value = aws_elasticache_cluster.sessions.cache_nodes[0].address
sensitive = false
}
The PR shows exactly what's changing. A teammate reviews it. They ask: "Why t3.micro? Will that handle production load?" You have a conversation before it's deployed. The change is logged in Git history forever.
This is what "infrastructure review" looks like when you're not doing it in the console.
Real-World Multi-Environment Setup āļø
Here's how I structure multi-environment infrastructure for Node.js and Laravel apps on AWS:
infrastructure/
āāā modules/
ā āāā networking/ # VPC, subnets, route tables
ā āāā compute/ # EC2, Auto Scaling Groups
ā āāā database/ # RDS, parameter groups
ā āāā cdn/ # CloudFront, S3 static assets
āāā environments/
ā āāā production/
ā ā āāā main.tf
ā ā āāā terraform.tfvars
ā āāā staging/
ā āāā main.tf
ā āāā terraform.tfvars
āāā shared/
āāā state-backend/ # S3 bucket + DynamoDB for state
The networking module (reuse across environments):
# modules/networking/main.tf
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.environment}-vpc"
}
}
resource "aws_subnet" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.environment}-private-${count.index + 1}"
Tier = "private"
}
}
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.environment}-public-${count.index + 1}"
Tier = "public"
}
}
Now staging and production use the exact same networking logic, just different CIDR ranges and environment names. Zero manual clicking. Zero "wait, which subnet did I put the database in?"
The terraform plan Habit That Saves Everything š
After countless deployments, I have one non-negotiable rule: always review terraform plan before terraform apply.
terraform plan shows you exactly what will change:
$ terraform plan
Terraform will perform the following actions:
# aws_security_group.api will be updated in-place
~ resource "aws_security_group" "api" {
id = "sg-abc123"
name = "production-api"
+ ingress {
+ cidr_blocks = ["10.0.0.0/8"]
+ from_port = 8080
+ protocol = "tcp"
+ to_port = 8080
}
}
Plan: 0 to add, 1 to change, 0 to destroy.
Read that output. Every time. Before applying. Do not skip this step.
I've caught planned changes that would have taken down entire services ā wrong CIDR range, wrong port, accidentally destroying an RDS instance because I renamed a resource. terraform plan is your last line of defense before consequences.
The rule that's saved me more than once: If plan says "destroy" and you weren't expecting it, do not apply. Investigate.
Common Pitfalls (Learn from My Pain) šŖ¤
Pitfall #1: Storing state locally
# You ran terraform apply on your laptop
# Your laptop gets stolen
# Your state file is gone
# Nobody knows what Terraform thinks exists
# You are now in infrastructure hell
Use an S3 backend. Do it on day one. Not day ten.
Pitfall #2: Not using workspaces or separate state files per environment
# Bad: one state file for everything
terraform apply # Did that just update staging or production?! š±
# Good: explicit environment separation
terraform workspace select production
terraform apply -var-file="production.tfvars"
Pitfall #3: Committing terraform.tfvars with secrets
# .gitignore
*.tfvars # Ignores all var files (add to .gitignore!)
.terraform/ # Local plugin cache
*.tfstate # Never commit state files
*.tfstate.backup
Use environment variables for secrets: TF_VAR_db_password=supersecret terraform apply.
Pitfall #4: Importing existing resources instead of recreating them
If you've been clicking around the console for years and now want to adopt Terraform, you don't have to burn everything down. Import existing resources:
terraform import aws_instance.api_server i-1234567890abcdef0
Now Terraform knows about your existing EC2 instance. It'll manage it going forward without recreating it. This is how I migrated three legacy projects to Terraform without any downtime.
Before vs. After: The Real Numbers š
Spinning up a new environment (VPC + EC2 + RDS + ElastiCache + ALB):
| Method | Time | Accuracy | Documented | Repeatable |
|---|---|---|---|---|
| Manual console clicking | 2-3 days | 70% (always something slightly different) | No | No |
| Terraform | 8 minutes | 100% | Yes (Git history) | Yes |
The 8 minutes is terraform apply running. The setup time to write the Terraform is the investment ā and you only pay it once.
Every subsequent environment: 8 minutes.
Your Action Plan š
- Create an S3 bucket and DynamoDB table for remote state storage
- Pick ONE existing resource (maybe a security group or S3 bucket you already have)
- Import it with
terraform import - Write the
.tffile to describe it - Run
terraform planā if it says "no changes," you've successfully described your existing infrastructure
That's Terraform in five steps. You're not rewriting everything at once ā you're just starting to bring things under control, one resource at a time.
The goal isn't perfection on day one. The goal is: no more 2 AM messages about "accidentally removed port 443."
The Bottom Line š”
Infrastructure as Code is not a nice-to-have. It's the difference between:
- "I'll rebuild the staging environment, give me three days"
- "I'll rebuild the staging environment, give me ten minutes"
It's the difference between "I have no idea why production is configured differently from staging" and a PR that shows exactly what changed, when, and why.
Docker taught me the hard way that your runtime environment matters. Terraform taught me the same lesson about your infrastructure. Both tools exist for the same reason: to make environments reproducible, reviewable, and not dependent on one person's memory.
Start small. Import one resource. Write one module. Add it to your next PR. Your future self ā the one who needs to spin up an environment at 11 PM for an emergency ā will buy you a coffee.
Interested in IaC patterns for AWS? Connect with me on LinkedIn ā I love talking about infrastructure that doesn't wake you up at night.
Check out my projects on GitHub for real-world Terraform configurations from production deployments.
Now go put that console clicking finger away. Your infrastructure deserves better. šļøāØ