0x55aa
← Back to Blog

Terraform: Stop Clicking Around the AWS Console Before You Delete Production šŸ—ļø

•10 min read

Terraform: Stop Clicking Around the AWS Console Before You Delete Production šŸ—ļø

A message I received at 2 AM:

"Hey, I was poking around in the console to see how staging is set up... and I may have accidentally applied a security group change to production."

"Which change?"

"I removed port 443."

"..."

"The site's been down for 40 minutes."

That was the night I became a Terraform evangelist. Not because Terraform is some perfect magical tool — it's not. But because "I was poking around in the console" is how production incidents are born, nurtured, and eventually given a Wikipedia page.

After 7+ years deploying production applications to AWS, I've learned one truth: if your infrastructure only exists as a series of manual clicks that one person remembers, you're one bad day away from disaster.

Enter Infrastructure as Code.

What Is Terraform and Why Should You Care? šŸ¤”

Terraform is a tool that lets you define your cloud infrastructure — EC2 instances, RDS databases, VPCs, security groups, S3 buckets, load balancers, all of it — as code in .tf files.

Instead of:

  • Opening the AWS console
  • Clicking "Launch Instance"
  • Choosing AMI from a dropdown
  • Picking an instance type
  • Configuring security groups by hand
  • Hoping you remember what you did next time

You write this:

resource "aws_instance" "api_server" {
  ami           = "ami-0c02fb55956c7d316"
  instance_type = "t3.medium"

  vpc_security_group_ids = [aws_security_group.api.id]
  subnet_id              = aws_subnet.private.id

  tags = {
    Name        = "api-server"
    Environment = "production"
    Team        = "backend"
  }
}

And then you run:

terraform plan   # Show me what will change
terraform apply  # Actually make it happen

That's it. Your infrastructure is now in a Git repo. It's reviewed via pull requests. It's versioned. It's repeatable. It's not living exclusively in Dave's brain.

The Disaster That Converted Me šŸ”„

We had a beautiful staging environment. Load balancer, two EC2 instances, RDS, ElastiCache, the works. It took about two weeks of console clicking to set up.

Then we needed to spin up a nearly-identical environment for a client demo.

The conversation:

"How do we recreate staging for the demo?"

"Uh... I think I remember most of what I did."

"You think?"

"The RDS config might be a little different. And I'm not 100% sure which security groups I used for ElastiCache."

"How long will it take?"

"...a day? Maybe two?"

We spent three days. The demo environment had subtle differences from staging. We found bugs that only appeared in demo and not staging. The client was unimpressed.

A CI/CD pipeline that saves our team from this nightmare would have been terraform apply away. Two minutes, not two days.

After that project, I rewrote everything in Terraform. Never looked back.

Your First Terraform Project šŸŽ“

Step 1: Project Structure

infrastructure/
ā”œā”€ā”€ main.tf           # Main resources
ā”œā”€ā”€ variables.tf      # Input variables
ā”œā”€ā”€ outputs.tf        # Output values
ā”œā”€ā”€ providers.tf      # AWS provider config
└── terraform.tfvars  # Variable values (don't commit secrets!)

Step 2: Configure the AWS Provider

# providers.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Store state in S3 (not locally!)
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      ManagedBy   = "terraform"
      Environment = var.environment
      Project     = var.project_name
    }
  }
}

That backend "s3" block is critical. Never store Terraform state locally. If your laptop dies, your state is gone. Store it in S3 with DynamoDB locking so two people can't run terraform apply simultaneously and corrupt everything.

Docker taught me about image layers. Terraform taught me about state files. Both will ruin your day if you ignore them.

Step 3: Variables (Stop Hardcoding Everything)

# variables.tf
variable "aws_region" {
  description = "AWS region to deploy to"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name (production, staging, demo)"
  type        = string
}

variable "project_name" {
  description = "Project name for tagging"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.medium"
}

variable "db_password" {
  description = "RDS master password"
  type        = string
  sensitive   = true  # Won't show in plan output!
}
# terraform.tfvars (for non-secrets)
aws_region   = "us-east-1"
environment  = "production"
project_name = "my-api"
instance_type = "t3.large"

# db_password comes from env var: TF_VAR_db_password

Now spinning up staging vs. production is just changing one variable. Same Terraform code, different tfvars file. That client demo that took three days? Now it's:

terraform workspace new demo
terraform apply -var-file="demo.tfvars"

Done. Grab a coffee.

The Pattern That Saved Our Production šŸ›”ļø

Before Terraform: The Console Chaos

# What our infrastructure "documentation" looked like:
Email from 2021: "I think the security group for the API is sg-abc123?"
Slack message: "hey does anyone know the RDS endpoint? I forgot to save it"
Sticky note on Dave's monitor: "prod db pass: hunter2"
Jenkins job comment: "manually SSH and restart nginx if it breaks"

Infrastructure as vibes. Completely undocumented. One person out sick = entire team blind.

After Terraform: The Pull Request

# PR: "feat: add ElastiCache for session storage"
resource "aws_elasticache_cluster" "sessions" {
  cluster_id           = "${var.environment}-sessions"
  engine               = "redis"
  node_type            = "cache.t3.micro"
  num_cache_nodes      = 1
  parameter_group_name = "default.redis7"
  port                 = 6379

  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]

  tags = {
    Name = "${var.environment}-session-cache"
  }
}

output "redis_endpoint" {
  description = "Redis cluster endpoint"
  value       = aws_elasticache_cluster.sessions.cache_nodes[0].address
  sensitive   = false
}

The PR shows exactly what's changing. A teammate reviews it. They ask: "Why t3.micro? Will that handle production load?" You have a conversation before it's deployed. The change is logged in Git history forever.

This is what "infrastructure review" looks like when you're not doing it in the console.

Real-World Multi-Environment Setup āš™ļø

Here's how I structure multi-environment infrastructure for Node.js and Laravel apps on AWS:

infrastructure/
ā”œā”€ā”€ modules/
│   ā”œā”€ā”€ networking/     # VPC, subnets, route tables
│   ā”œā”€ā”€ compute/        # EC2, Auto Scaling Groups
│   ā”œā”€ā”€ database/       # RDS, parameter groups
│   └── cdn/            # CloudFront, S3 static assets
ā”œā”€ā”€ environments/
│   ā”œā”€ā”€ production/
│   │   ā”œā”€ā”€ main.tf
│   │   └── terraform.tfvars
│   └── staging/
│       ā”œā”€ā”€ main.tf
│       └── terraform.tfvars
└── shared/
    └── state-backend/  # S3 bucket + DynamoDB for state

The networking module (reuse across environments):

# modules/networking/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.environment}-vpc"
  }
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.environment}-private-${count.index + 1}"
    Tier = "private"
  }
}

resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.environment}-public-${count.index + 1}"
    Tier = "public"
  }
}

Now staging and production use the exact same networking logic, just different CIDR ranges and environment names. Zero manual clicking. Zero "wait, which subnet did I put the database in?"

The terraform plan Habit That Saves Everything šŸ‘€

After countless deployments, I have one non-negotiable rule: always review terraform plan before terraform apply.

terraform plan shows you exactly what will change:

$ terraform plan

Terraform will perform the following actions:

  # aws_security_group.api will be updated in-place
  ~ resource "aws_security_group" "api" {
        id   = "sg-abc123"
        name = "production-api"

      + ingress {
          + cidr_blocks = ["10.0.0.0/8"]
          + from_port   = 8080
          + protocol    = "tcp"
          + to_port     = 8080
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Read that output. Every time. Before applying. Do not skip this step.

I've caught planned changes that would have taken down entire services — wrong CIDR range, wrong port, accidentally destroying an RDS instance because I renamed a resource. terraform plan is your last line of defense before consequences.

The rule that's saved me more than once: If plan says "destroy" and you weren't expecting it, do not apply. Investigate.

Common Pitfalls (Learn from My Pain) 🪤

Pitfall #1: Storing state locally

# You ran terraform apply on your laptop
# Your laptop gets stolen
# Your state file is gone
# Nobody knows what Terraform thinks exists
# You are now in infrastructure hell

Use an S3 backend. Do it on day one. Not day ten.

Pitfall #2: Not using workspaces or separate state files per environment

# Bad: one state file for everything
terraform apply  # Did that just update staging or production?! 😱

# Good: explicit environment separation
terraform workspace select production
terraform apply -var-file="production.tfvars"

Pitfall #3: Committing terraform.tfvars with secrets

# .gitignore
*.tfvars        # Ignores all var files (add to .gitignore!)
.terraform/     # Local plugin cache
*.tfstate       # Never commit state files
*.tfstate.backup

Use environment variables for secrets: TF_VAR_db_password=supersecret terraform apply.

Pitfall #4: Importing existing resources instead of recreating them

If you've been clicking around the console for years and now want to adopt Terraform, you don't have to burn everything down. Import existing resources:

terraform import aws_instance.api_server i-1234567890abcdef0

Now Terraform knows about your existing EC2 instance. It'll manage it going forward without recreating it. This is how I migrated three legacy projects to Terraform without any downtime.

Before vs. After: The Real Numbers šŸ“Š

Spinning up a new environment (VPC + EC2 + RDS + ElastiCache + ALB):

Method Time Accuracy Documented Repeatable
Manual console clicking 2-3 days 70% (always something slightly different) No No
Terraform 8 minutes 100% Yes (Git history) Yes

The 8 minutes is terraform apply running. The setup time to write the Terraform is the investment — and you only pay it once.

Every subsequent environment: 8 minutes.

Your Action Plan šŸš€

  1. Create an S3 bucket and DynamoDB table for remote state storage
  2. Pick ONE existing resource (maybe a security group or S3 bucket you already have)
  3. Import it with terraform import
  4. Write the .tf file to describe it
  5. Run terraform plan — if it says "no changes," you've successfully described your existing infrastructure

That's Terraform in five steps. You're not rewriting everything at once — you're just starting to bring things under control, one resource at a time.

The goal isn't perfection on day one. The goal is: no more 2 AM messages about "accidentally removed port 443."

The Bottom Line šŸ’”

Infrastructure as Code is not a nice-to-have. It's the difference between:

  • "I'll rebuild the staging environment, give me three days"
  • "I'll rebuild the staging environment, give me ten minutes"

It's the difference between "I have no idea why production is configured differently from staging" and a PR that shows exactly what changed, when, and why.

Docker taught me the hard way that your runtime environment matters. Terraform taught me the same lesson about your infrastructure. Both tools exist for the same reason: to make environments reproducible, reviewable, and not dependent on one person's memory.

Start small. Import one resource. Write one module. Add it to your next PR. Your future self — the one who needs to spin up an environment at 11 PM for an emergency — will buy you a coffee.


Interested in IaC patterns for AWS? Connect with me on LinkedIn — I love talking about infrastructure that doesn't wake you up at night.

Check out my projects on GitHub for real-world Terraform configurations from production deployments.

Now go put that console clicking finger away. Your infrastructure deserves better. šŸ—ļøāœØ