The seal/unseal problem
Every Vault starts sealed. In sealed state, Vault knows where its encrypted data is stored but has no way to decrypt it — the encryption key is split using Shamir's Secret Sharing and distributed as unseal key shards. Traditionally, a human operator must provide M-of-N shards to unseal Vault after every restart.
This creates an operational nightmare: every unplanned restart (OOM kill, hardware failure, cloud spot termination) means paging enough shard holders to meet the threshold, often at 3am. Auto-unseal delegates custody of the unseal key to a cloud KMS, AWS KMS in our case. Vault encrypts its root key with a KMS key and unseals automatically by calling KMS on startup.
Architecture: three-node Raft cluster
Raft integrated storage (introduced in Vault 1.4) eliminates the need for an external Consul cluster. Vault nodes form a Raft consensus group: one leader handles all writes, followers replicate the log. With three nodes, the cluster tolerates one node failure. With five, two failures.
vault-1 (10.0.1.10) ←→ vault-2 (10.0.1.11) ←→ vault-3 (10.0.1.12)
↑ leader (elected)
└── all writes go here, replicated to followers
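The fault-tolerance figures above follow from Raft's majority rule: a cluster of N voting nodes needs floor(N/2) + 1 of them to agree, and can lose the rest. A minimal sketch of that arithmetic (the function names are illustrative, not part of Vault):

```shell
#!/bin/sh
# Quorum size for a Raft cluster of N voting nodes: floor(N/2) + 1.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while a quorum still exists: N - quorum.
raft_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
  echo "nodes=$n quorum=$(raft_quorum "$n") tolerates=$(raft_fault_tolerance "$n")"
done
```

Note that four nodes tolerate no more failures than three, which is why odd cluster sizes are the usual choice.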
AWS KMS setup
Create the KMS key
# Create a dedicated KMS key for Vault auto-unseal and capture its key ID
KEY_ID=$(aws kms create-key \
  --description "Vault auto-unseal key" \
  --key-usage ENCRYPT_DECRYPT \
  --key-spec SYMMETRIC_DEFAULT \
  --tags TagKey=Service,TagValue=vault \
  --query KeyMetadata.KeyId \
  --output text)
echo "$KEY_ID"
# Create an alias for readability
aws kms create-alias \
  --alias-name alias/vault-unseal \
  --target-key-id "$KEY_ID"
IAM policy for Vault nodes
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VaultKMSUnseal",
"Effect": "Allow",
"Action": [
"kms:Encrypt",
"kms:Decrypt",
"kms:DescribeKey"
],
"Resource": "arn:aws:kms:eu-north-1:123456789012:key/<key-id>"
}
]
}
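Attach this policy to the IAM role assigned to your Vault EC2 instances. Never use static access keys; use instance profiles. The same role also needs ec2:DescribeInstances, because the retry_join auto_join setting in the Vault configuration discovers peers by EC2 tag.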
Vault configuration
# /etc/vault.d/vault.hcl: same on all three nodes, except node_id, api_addr and cluster_addr
ui = true
disable_mlock = true # recommended with Raft integrated storage (mlock interacts badly with its memory-mapped files); disable swap on the host instead
# Listener
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_cert_file = "/opt/vault/tls/vault.crt"
tls_key_file = "/opt/vault/tls/vault.key"
tls_min_version = "tls13"
}
# Raft integrated storage
storage "raft" {
path = "/opt/vault/data"
node_id = "vault-1" # change per node: vault-1, vault-2, vault-3
retry_join {
auto_join = "provider=aws region=eu-north-1 tag_key=vault-cluster tag_value=production"
auto_join_scheme = "https"
leader_tls_servername = "vault.internal"
leader_ca_cert_file = "/opt/vault/tls/ca.crt"
}
}
# AWS KMS auto-unseal
seal "awskms" {
region = "eu-north-1"
kms_key_id = "alias/vault-unseal"
}
api_addr = "https://10.0.1.10:8200" # this node's address
cluster_addr = "https://10.0.1.10:8201"
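Only three values differ between nodes: node_id, api_addr and cluster_addr. As a rough sketch, a small helper can print those lines for each node so they can be pasted or templated into vault.hcl. The function name is hypothetical; the node names and IPs are the ones from the cluster diagram.

```shell
#!/bin/sh
# Print the node-specific lines of vault.hcl for one node.
# $1 = Raft node_id, $2 = the node's private IP address.
node_overrides() {
  node_id=$1
  ip=$2
  printf 'node_id      = "%s"\n' "$node_id"
  printf 'api_addr     = "https://%s:8200"\n' "$ip"
  printf 'cluster_addr = "https://%s:8201"\n' "$ip"
}

# One call per node in the three-node cluster.
node_overrides vault-1 10.0.1.10
node_overrides vault-2 10.0.1.11
node_overrides vault-3 10.0.1.12
```

Everything else in the file (listener, retry_join, seal stanza) can stay identical across nodes, which keeps configuration drift easy to spot.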
Initialising the cluster
# Start Vault on all three nodes first, then initialise from any one:
vault operator init \
-recovery-shares=5 \
-recovery-threshold=3
# With auto-unseal, you get RECOVERY keys (not unseal keys)
# Recovery keys are used only to regenerate the root token or re-key
# Store them in a separate secrets manager, not in Vault itself
# Vault will auto-unseal immediately after init via KMS
vault status
With auto-unseal, the keys returned by vault operator init are recovery keys, not unseal keys. Vault will not prompt for them on restart; KMS handles that. Recovery keys authorise operations such as root token generation and rekeying, but they cannot decrypt the root key, so if the KMS key is deleted or becomes unreachable, Vault cannot be unsealed. Treat them with the same care as unseal keys, and protect the KMS key itself against deletion.
Verifying HA failover
# Check cluster status
vault operator raft list-peers
# Node ID   Address          State      Voter
# vault-1   10.0.1.10:8201   leader     true
# vault-2   10.0.1.11:8201   follower   true
# vault-3   10.0.1.12:8201   follower   true
# Simulate leader failure
systemctl stop vault   # on vault-1
# On vault-2 or vault-3, within ~10 seconds:
vault operator raft list-peers
# vault-2 should now be leader
# Monitor auto-unseal on restart
systemctl start vault   # back on vault-1
journalctl -u vault -f | grep -E "(unseal|seal|leader)"
# You should see: "vault is unsealed" within seconds, no human intervention
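The manual failover check can be scripted. The sketch below assumes the four-column list-peers layout shown above (node, address, state, voter) and polls until a node other than the stopped one reports as leader. Both function names are hypothetical helpers, not Vault commands.

```shell
#!/bin/sh
# Read list-peers output on stdin and print the node whose State is "leader".
current_leader() {
  awk '$3 == "leader" { print $1 }'
}

# Poll once per second until the leader differs from the node that was stopped.
# $1 = old leader's node ID, $2 = timeout in seconds (default 30).
wait_for_new_leader() {
  old_leader=$1
  timeout=${2:-30}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    leader=$(vault operator raft list-peers 2>/dev/null | current_leader)
    if [ -n "$leader" ] && [ "$leader" != "$old_leader" ]; then
      echo "new leader: $leader after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "no new leader within ${timeout}s" >&2
  return 1
}
```

Run it on a surviving node right after stopping the leader, for example `wait_for_new_leader vault-1 30`. A non-zero exit makes it usable in a failover drill or CI job.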
Snapshot backup strategy
Raft snapshots are the only backup mechanism for integrated storage. Automate them:
#!/bin/bash
# vault-snapshot.sh — run via cron every 6 hours
set -euo pipefail
SNAPSHOT_DIR="/backup/vault"
DATE=$(date +%Y%m%d-%H%M%S)
SNAPSHOT_FILE="${SNAPSHOT_DIR}/vault-snapshot-${DATE}.snap"
mkdir -p "$SNAPSHOT_DIR"
# Take snapshot (only works against the leader)
vault operator raft snapshot save "$SNAPSHOT_FILE"
# Upload to S3 with server-side KMS encryption.
# (aws kms encrypt accepts at most 4 KB of plaintext, so it cannot encrypt a snapshot file directly.)
# The instance role needs s3:PutObject on the bucket and kms:GenerateDataKey on the key.
aws s3 cp "$SNAPSHOT_FILE" \
  "s3://your-backup-bucket/vault/$(basename "$SNAPSHOT_FILE")" \
  --sse aws:kms \
  --sse-kms-key-id alias/vault-unseal
# Retain only last 30 days locally
find "$SNAPSHOT_DIR" -name "*.snap" -mtime +30 -delete