The previous article, "Database Replication Explained: Lag, Consistency, and the Bugs You Don’t Expect," covered replication lag: the gap between when a write happens and when all replicas see it. But it left a…
In the previous article we established why distributed systems exist — fault tolerance, scalability, and latency eventually force you off a single machine. Once you accept that, a new question immediately follows: if your data…
Every application starts on a single machine. One server, one database, one clean deployment. It works beautifully — fast, simple, easy to reason about. Transactions are straightforward. Queries are predictable. You can SSH into one…
Most backend engineers use JSON for everything. Request bodies, Kafka events, database exports, internal service calls — JSON everywhere. And for most systems, that’s completely fine. But at some point, either through scale or through…
Here’s a question most engineers can’t fully answer, even after years of building systems: why does your company have both PostgreSQL and BigQuery? Not what they’re used for — you probably know that. But why….
Every backend engineer has been there. Someone from the data team sends a Slack message: “Can you run this quick query on the production database? It’ll take two minutes.” You run it. The database slows…
Every developer has heard “add an index” as the fix for a slow query. Most do it. Few understand why it works — or why it sometimes doesn’t. This post is the explanation I wish…
I’ve been reading Designing Data-Intensive Applications by Martin Kleppmann. It’s a brilliant book — and also one of the densest, most diagram-heavy technical books I’ve picked up. There are pages where I’d stare at a diagram for five minutes thinking “what does this…
