0x55aa
Back to Blog

🔐 Field-Level Encryption: Your Database Is Not a Safe

|
6 min read

Let me describe a scene that plays out more often than the industry likes to admit.

A company gets breached. The post-mortem comes out and it says, reassuringly: "All data was encrypted at rest." The press release calls it a "sophisticated attack." Security Twitter dunks on the phrasing because what it actually means is: the attacker got database credentials, ran SELECT * FROM users, and read every SSN, passport number, and credit card in plaintext — because disk encryption only helps when someone physically walks out with the hard drive.

Encryption at rest protects the drive. Encryption in transit protects the wire. Neither protects the data when a query runs against it.

Field-level encryption is the thing that closes that gap. And most teams aren't doing it.


What Field-Level Encryption Actually Is

The idea is simple: instead of trusting the database to guard your sensitive fields, you encrypt them before they ever touch the database. The ciphertext lands in the column. A stolen backup is garbage without the key. A compromised read replica is garbage. A SQL injection that dumps the table? Garbage.

The database sees something like this:

AQEDAHjD2o6fL3+ABCDEF1234...==

Your application sees:

123-45-6789

The transformation happens in your application layer, and the key lives somewhere else entirely — a KMS, a Vault cluster, an HSM — not in the database connection string.


The Naive Implementation (and Why It Fails)

The most common first attempt at field-level encryption looks like this:

from cryptography.fernet import Fernet

# DON'T do this
KEY = b'hardcoded-key-in-source-code-oops=='
cipher = Fernet(KEY)

def store_ssn(ssn: str, user_id: int):
    encrypted = cipher.encrypt(ssn.encode())
    db.execute(
        "INSERT INTO users (id, ssn) VALUES (?, ?)",
        (user_id, encrypted)
    )

The encryption itself is fine. The key management is not. You've traded "plaintext SSN in the database" for "plaintext key in your source code." Anyone with repo access now has everything. The GitHub secret scanner will find that key and flag it, but by then it's already in your git history, your CI logs, and probably your colleague's local clone.

The fix is envelope encryption with a proper KMS.


Envelope Encryption: The Pattern That Actually Works

Here's the mental model: you don't encrypt data directly with the master key. You generate a fresh data encryption key (DEK) per record (or per tenant), encrypt the data with the DEK, then encrypt the DEK with a key encryption key (KEK) that lives in your KMS. You store the encrypted DEK alongside the encrypted data.

import boto3
import base64
from cryptography.fernet import Fernet

kms = boto3.client('kms', region_name='ap-south-1')
KMS_KEY_ID = 'arn:aws:kms:ap-south-1:123456789:key/your-key-id'

def encrypt_field(plaintext: str) -> dict:
    # Generate a fresh DEK from KMS
    response = kms.generate_data_key(
        KeyId=KMS_KEY_ID,
        KeySpec='AES_256'
    )
    plaintext_dek = response['Plaintext']      # use this, then discard
    encrypted_dek = response['CiphertextBlob'] # store this

    # Encrypt the actual data with the DEK
    cipher = Fernet(base64.urlsafe_b64encode(plaintext_dek[:32]))
    ciphertext = cipher.encrypt(plaintext.encode())

    return {
        'ciphertext': base64.b64encode(ciphertext).decode(),
        'encrypted_dek': base64.b64encode(encrypted_dek).decode(),
    }

def decrypt_field(payload: dict) -> str:
    encrypted_dek = base64.b64decode(payload['encrypted_dek'])
    ciphertext = base64.b64decode(payload['ciphertext'])

    # KMS decrypts the DEK
    response = kms.decrypt(CiphertextBlob=encrypted_dek)
    plaintext_dek = response['Plaintext']

    cipher = Fernet(base64.urlsafe_b64encode(plaintext_dek[:32]))
    return cipher.decrypt(ciphertext).decode()

Now the master key never leaves KMS. An attacker who steals your database gets encrypted blobs and encrypted DEKs — useless without the ability to call KMS, which requires valid AWS credentials with the right key policy. You've separated the data from the key material.

This is the pattern we use at Cubet for any personal data fields on GDPR-scoped client projects. The KMS call adds a few milliseconds; the regulatory peace of mind is worth considerably more.


The Tradeoffs You Need to Own

Field-level encryption isn't free lunch. Here's what you're giving up:

Searchability. You can't do WHERE ssn = ? on encrypted data. You need to either: store a deterministic HMAC of the value for lookups (a "blind index"), route queries through a service that decrypts in memory, or restructure your access patterns.

The blind index pattern:

import hmac, hashlib

def blind_index(value: str, secret: bytes) -> str:
    return hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()

# Store alongside the ciphertext — searchable but not reversible
search_token = blind_index(ssn, LOOKUP_SECRET)

Performance. Every read that needs the decrypted value costs a KMS round-trip unless you cache DEKs carefully. Caching in memory helps; caching in Redis with a TTL is the usual compromise.

Key rotation is now your problem. If you ever need to rotate keys, you need to re-encrypt every row. Plan for this. Build the tooling before you need it under pressure. A rotation job that runs offline, reads pages of encrypted rows, decrypts with the old key, re-encrypts with the new key, and writes back is not glamorous work — but skipping it means your "encryption" has an expiry date you haven't thought about.


When to Actually Use This

Not everything needs field-level encryption. An overkill schema with every column encrypted is performance misery and operational debt. Apply it to:

  • Government IDs (SSN, passport, NIC numbers)
  • Full payment card data you're not offloading to a PCI-compliant processor
  • Health and medical information (HIPAA territory)
  • Biometric identifiers
  • Private keys or credentials your users store in your system

Everything else — names, email addresses, order histories — should be protected by access controls, not field-level encryption. Encrypt the right things; don't encrypt everything.


The Bottom Line

"Encrypted at rest" means your storage vendor can't read the disk. It does not mean your data is safe. The threat model for a real breach involves running queries — and that's exactly what field-level encryption defends against.

The pattern isn't complicated: generate a DEK per sensitive field, encrypt the field, encrypt the DEK with a KMS-managed key, store both. Decrypt only when you need the value. Build blind indexes for anything you need to search.

Your database is a query engine, not a safe. Stop treating it like one.


If you're implementing field-level encryption and hit edge cases around key rotation or performance tuning, hit me up — LinkedIn or GitHub. I've dug through most of the sharp edges on this one.

Thanks for reading!

Back to all posts