Active-Passive Multi-Region Failover for a Read-Heavy API
#networking #resilience #data #dns
03 - Scenario
Tindrloop, a 40-person "dating for developers" startup whose single us-east-1 deployment went dark for 90 minutes during the last regional event — long enough to trend on Hacker News for the wrong reasons.
Stand up a warm standby in a second region for the read-heavy profile API so DNS fails over automatically when the primary's health check goes red, without paying for a full duplicate stack while the standby is idle.
Constraints
RTO < 5 minutes, fully automated — no human in the failover path
Idle standby cost < $50/month (no always-on compute duplication)
Everything as Terraform — no console clicks, reviewable in a PR
Data layer must be multi-region, with single-digit-millisecond reads in both regions
Declare two AWS provider blocks — a default primary (us-east-1) and an aliased secondary (us-west-2). Use a data source to confirm each provider resolves to the region you expect, and adopt default tags so every resource is labelled with the lab name and which region it belongs to.
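A minimal sketch of the two-provider setup follows. The tag keys (`Lab`, `RegionRole`), the lab name value, and the `~> 5.0` version pin are assumptions for illustration; the pin matters because the `name` attribute on `aws_region` is the v5 spelling.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed pin; adjust to your own policy
    }
  }
}

# Default provider: the primary region.
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Lab        = "active-passive-failover" # hypothetical lab name
      RegionRole = "primary"
    }
  }
}

# Aliased provider: the warm-standby region.
provider "aws" {
  alias  = "secondary"
  region = "us-west-2"

  default_tags {
    tags = {
      Lab        = "active-passive-failover" # hypothetical lab name
      RegionRole = "secondary"
    }
  }
}

# Data sources that report which region each provider actually resolves to.
data "aws_region" "primary" {}

data "aws_region" "secondary" {
  provider = aws.secondary
}

output "primary_region" {
  value = data.aws_region.primary.name
}

output "secondary_region" {
  value = data.aws_region.secondary.name
}
```

After `terraform apply`, the two outputs should read `us-east-1` and `us-west-2`. Any resource meant for the standby region needs an explicit `provider = aws.secondary`; otherwise it lands in the primary.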
Create a DynamoDB table for profile reads and give it a replica in the secondary region so it becomes a global table. Enable streams and point-in-time recovery. Note in your README why a global table satisfies the single-digit-ms read constraint where cross-region reads would not.
Hint: Global table replicas need streams enabled with the NEW_AND_OLD_IMAGES view type, and the table should use PAY_PER_REQUEST billing or provisioned capacity with auto scaling. Expect the apply to fail if streams are off.
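One way the table could look, as a sketch rather than a reference implementation. The table name and the `profile_id` key are hypothetical, and the block assumes the default `aws` provider targets us-east-1.

```hcl
resource "aws_dynamodb_table" "profiles" {
  name         = "tindrloop-profiles" # hypothetical table name
  billing_mode = "PAY_PER_REQUEST"    # no provisioned capacity to pay for while idle
  hash_key     = "profile_id"         # hypothetical partition key

  attribute {
    name = "profile_id"
    type = "S"
  }

  # Streams carry item changes between the replicas.
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  # Point-in-time recovery for the table in the primary region.
  point_in_time_recovery {
    enabled = true
  }

  # Adding a replica block turns the table into a global table.
  replica {
    region_name            = "us-west-2"
    point_in_time_recovery = true
  }
}
```

For the README note: each region's API reads from the replica in its own region, so a read never has to cross the country to reach the table.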
Define a Route 53 health check that polls your primary region's API health endpoint. Tune the failure threshold and request interval so a genuine outage trips it inside your 5-minute RTO without flapping on a single slow response.
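The tuning can be sketched with the arithmetic kept next to the resource. The hostname, the `/health` path, and the 60-second record TTL are assumptions. The estimate is a rough upper bound only, since Route 53 aggregates results from many checkers and resolvers may cache answers in their own way.

```hcl
locals {
  request_interval_s = 10 # Route 53 accepts 10 or 30
  failure_threshold  = 3  # consecutive failures before "unhealthy" (1 to 10)
  record_ttl_s       = 60 # assumed TTL on the failover records

  # Rough detection time plus the time cached answers may linger.
  estimated_failover_s = local.request_interval_s * local.failure_threshold + local.record_ttl_s
}

resource "aws_route53_health_check" "primary_api" {
  fqdn              = "api-primary.example.com" # hypothetical primary endpoint
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health" # hypothetical health path
  request_interval  = local.request_interval_s
  failure_threshold = local.failure_threshold

  tags = {
    Name = "primary-api-health"
  }
}

output "estimated_failover_seconds" {
  value = local.estimated_failover_s
}
```

With these values the estimate is 10 × 3 + 60 = 90 seconds, comfortably inside the 5-minute RTO, while three consecutive failures keep a single slow response from tripping the check. The 10-second interval and HTTPS checks are billed as optional health check features, so weigh them against the idle-cost constraint.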
Create two Route 53 records for the same name — a PRIMARY failover record tied to the health check and a SECONDARY record pointing at the standby. When the health check goes unhealthy, Route 53 stops answering with the primary and serves the secondary automatically.
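The record pair might be sketched as below. The variables, the record name, and both target hostnames are placeholders, and CNAME records are used only to keep the example short (alias records to a load balancer or API Gateway domain are a common alternative).

```hcl
variable "zone_id" {
  type        = string
  description = "Hosted zone that owns the API name (assumed to exist already)"
}

variable "primary_health_check_id" {
  type        = string
  description = "ID of the Route 53 health check that watches the primary region"
}

resource "aws_route53_record" "api_primary" {
  zone_id         = var.zone_id
  name            = "api.example.com" # hypothetical public name
  type            = "CNAME"
  ttl             = 60
  records         = ["api-primary.example.com"] # hypothetical primary endpoint
  set_identifier  = "primary"
  health_check_id = var.primary_health_check_id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

resource "aws_route53_record" "api_secondary" {
  zone_id        = var.zone_id
  name           = "api.example.com" # same name as the primary record
  type           = "CNAME"
  ttl            = 60
  records        = ["api-standby.example.com"] # hypothetical standby endpoint
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```

Both records share a name and type and differ in `set_identifier` and failover role. A short TTL keeps cached primary answers from outliving the health check's verdict by much.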
Force the primary health check unhealthy (block the health path or point it at a dead endpoint), then resolve the DNS name repeatedly and watch the answer flip to the secondary. Capture the wall-clock time from "primary down" to "DNS serving secondary" and compare it against your 5-minute RTO.