Vertical vs Horizontal Scaling: Why You Can’t Just Buy a Bigger Server

abhijeetpnwr

1 week ago

Every application starts on a single machine. One server, one database, one clean deployment. It works beautifully — fast, simple, easy to reason about. Transactions are straightforward. Queries are predictable. You can SSH into one box and see everything that’s happening.

At some point, that machine starts struggling. Queries slow down. Disk fills up. CPU spikes during peak traffic. The natural instinct is to reach for a bigger machine — more RAM, faster CPU, larger SSD. And for a while, that works.

But eventually it doesn’t. Understanding why, and what you do instead, is one of the most important decisions in backend engineering.

The Case for Staying on One Machine

Before talking about distributed systems, it’s worth being honest about something the industry rarely admits: a single machine is genuinely excellent, and you should stay on one as long as possible.

A single PostgreSQL instance on good hardware can handle tens of thousands of queries per second. A well-tuned server with NVMe storage and 256 GB of RAM can carry most applications through millions of users. Transactions are simple: begin, commit, rollback. You never worry about network partitions, replication lag, or distributed consensus. Debugging is straightforward. Operational complexity is low.

The mistake most engineering teams make is distributing too early — adding complexity in anticipation of scale that never arrives. The right question is not “should we distribute?” but “have we genuinely hit the limits of one machine?”

Three Reasons You Eventually Have To

When a single machine genuinely isn’t enough, the pressure usually comes from one of three directions.

Scalability is the obvious one. Your dataset grows beyond what one disk can hold. Query throughput exceeds what one CPU can serve. You’re processing more writes per second than one machine can absorb. The data or the load has simply outgrown the hardware.

Fault tolerance is the one most teams hit first — before they ever hit a scale problem. A single machine is a single point of failure. Hardware fails: disks corrupt, power supplies die, network cards malfunction. If your entire application runs on one server and that server goes down, everything goes down with it.

Five nines of availability — 99.999% uptime — means less than six minutes of downtime per year. One machine, no matter how reliable, cannot give you that. Hardware maintenance alone requires planned downtime. Unplanned failures are a matter of when, not if.
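The downtime budget behind each "nine" is simple arithmetic, and a short sketch makes it concrete. The function names here are illustrative, and the second function assumes node failures are independent, which real hardware sharing a rack or power feed does not fully satisfy.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum yearly downtime allowed by an availability target in [0, 1]."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(node_availability: float, nodes: int) -> float:
    """Availability of a group that is up while at least one node is up.

    Assumes independent failures, which is an idealisation.
    """
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target:.3%} uptime allows {minutes:.2f} minutes of downtime per year")
    # Two hypothetical 99%-available machines, either of which can serve traffic.
    print(f"two 99% nodes together: {combined_availability(0.99, 2):.4%}")
```

For five nines this works out to roughly 5.26 minutes per year, which is where the "less than six minutes" figure comes from.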

Latency is the third pressure, and it’s governed by physics rather than engineering. The speed of light imposes a hard floor on how fast data can travel between two points. A server in Mumbai serving users in São Paulo adds roughly 150-200ms of unavoidable round-trip latency. If your users are spread globally, your data needs to be too.
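The physical floor can be estimated with a rough calculation. The sketch below assumes approximate city coordinates, a perfectly direct great-circle cable, and light travelling through fibre at about two thirds of its vacuum speed; real cable routes are longer, so real latency sits above this floor.

```python
import math

EARTH_RADIUS_KM = 6371.0
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBRE_SPEED_FACTOR = 2 / 3  # approximation for light in optical fibre


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km: float, speed_factor: float = FIBRE_SPEED_FACTOR) -> float:
    """Lower bound on round-trip time over a straight path of the given length."""
    return 2 * distance_km / (SPEED_OF_LIGHT_KM_S * speed_factor) * 1000


if __name__ == "__main__":
    # Approximate coordinates (assumed values, for illustration only).
    mumbai = (19.08, 72.88)
    sao_paulo = (-23.55, -46.63)
    distance = great_circle_km(*mumbai, *sao_paulo)
    print(f"distance: {distance:.0f} km")
    print(f"round-trip floor in fibre: {min_round_trip_ms(distance):.0f} ms")
```

Under these assumptions the floor comes out at around 140 ms, before any routing detours, switching, or server processing time are added.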

Why Not Just Buy a Bigger Machine?

When a single machine starts struggling, the instinct is to scale it up — more cores, more RAM, faster storage. This is called vertical scaling, and it has real advantages: no architectural changes, no distributed systems complexity, no new failure modes to reason about. You upgrade the hardware and the application keeps running.

The problem is that vertical scaling has two fundamental limits.

Cost scales superlinearly. Doubling a server’s performance rarely costs just twice as much; at the high end it can cost several times as much. Commodity hardware at moderate specs is cheap. Enterprise hardware at the top end is extraordinarily expensive. At some point you’re paying a premium for diminishing returns, and eventually the next tier of hardware doesn’t exist.

You’re still one machine. A bigger server is a more expensive single point of failure. More RAM doesn’t make the disk immortal. A faster CPU doesn’t prevent a datacenter power outage. If fault tolerance is the reason you need to scale, vertical scaling doesn’t address the problem at all — it just makes the failure more expensive.

There’s also a hard ceiling. There is a biggest machine money can buy, and real applications have hit it.

The Alternatives: Shared Disk and Shared Nothing

Before committing to a particular distributed design, it’s worth understanding the two basic architectures for spreading work across multiple machines: one that shares storage between nodes, and one that shares nothing.

Shared disk architecture puts multiple machines in front of a single shared storage layer — a SAN or NAS that all nodes can read and write. The compute scales horizontally while the data remains centralised. This sounds appealing but runs into a fundamental problem: the shared disk becomes the bottleneck. When multiple machines try to write to the same storage simultaneously, you need locking and coordination across the network, which is slow and complex. Shared disk works in some specialised contexts — Oracle RAC uses it — but it’s largely a historical dead end for general-purpose databases.

Shared nothing is the architecture that modern distributed databases actually use. Each node is completely independent, with its own CPU, its own RAM, and its own disk. Nodes communicate only by passing messages over a network. There is no shared bottleneck. In the ideal case, adding a node adds capacity roughly in proportion. And the nodes can be commodity hardware: cheap, replaceable, available everywhere.

The tradeoff is real: complexity moves from hardware into software. Your application now has to deal with things that simply don’t exist on a single machine — partial failures (where some nodes succeed and others don’t), network delays, data that might be slightly out of date depending on which node you read from. These are genuinely hard problems. But they’re solvable problems, and the distributed systems community has been solving them for decades.

The Two Tools Distributed Systems Use

Once you commit to shared nothing, you have two fundamental techniques for distributing data.

Replication keeps copies of the same data on multiple nodes. Every node in a replication group holds the same data — or a recent version of it. This solves fault tolerance directly: if one node fails, the others continue serving requests. It also helps with read throughput, since reads can be spread across multiple replicas. And it addresses latency: you can place replicas geographically close to your users.
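A toy in-memory sketch of the idea follows. It assumes the simplest possible scheme, where every write is copied to every live node before returning; `Node` and `ReplicatedStore` are hypothetical names, not any real database's API.

```python
import itertools


class Node:
    """One replica holding a full copy of the data."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """Toy synchronous replication: every write goes to all live nodes."""

    def __init__(self, names):
        self.nodes = [Node(name) for name in names]
        self._round_robin = itertools.cycle(self.nodes)

    def write(self, key, value):
        # A node that is down misses this write and would be stale if it
        # came back; real systems need a catch-up mechanism for that.
        for node in self.nodes:
            if node.alive:
                node.data[key] = value

    def read(self, key):
        # Spread reads across replicas, skipping any that are down.
        for _ in range(len(self.nodes)):
            node = next(self._round_robin)
            if node.alive:
                return node.name, node.data.get(key)
        raise RuntimeError("no live replicas")


if __name__ == "__main__":
    store = ReplicatedStore(["replica-a", "replica-b", "replica-c"])
    store.write("user:1", "Ada")
    store.nodes[0].alive = False  # simulate a node failure
    print(store.read("user:1"))  # still served by a surviving replica
```

Reads keep working after one node fails because every surviving replica holds the same data.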

Partitioning (also called sharding) splits the dataset so each node owns a different subset. With range partitioning, Node 1 holds users A-M and Node 2 holds users N-Z. Alternatively, with hash partitioning, each key is hashed to determine which node is responsible for it. Either way, this solves the data volume problem, because your dataset can grow beyond what any single node can hold, and it distributes write throughput across nodes.
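Both ways of assigning keys to nodes fit in a few lines. This is a minimal sketch with hypothetical function names; it uses SHA-256 only because Python's built-in `hash()` for strings changes between processes, so it would not give a stable assignment.

```python
import hashlib
from bisect import bisect_right

# Keys starting before "n" go to node 0, the rest to node 1 (the A-M / N-Z split).
RANGE_BOUNDARIES = ["n"]


def range_partition(key: str, boundaries=RANGE_BOUNDARIES) -> int:
    """Pick a node by comparing the key's first letter against sorted boundaries."""
    return bisect_right(boundaries, key[:1].lower())


def hash_partition(key: str, num_nodes: int) -> int:
    """Pick a node from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


if __name__ == "__main__":
    for user in ("alice", "mike", "nina", "zoe"):
        print(user, "range ->", range_partition(user), "hash ->", hash_partition(user, 4))
```

One caveat with the plain modulo approach is that changing `num_nodes` reassigns most keys, so production systems typically use schemes that move less data when nodes are added or removed.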

Most real systems use both simultaneously. The data is partitioned across nodes, and each partition is replicated across multiple nodes for fault tolerance. A production Cassandra or Kafka cluster is doing both of these things at once.
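Combining the two techniques can be sketched as a placement function: hash the key to a partition, then store that partition on several nodes. The placement rule below (consecutive nodes, wrapping around) is an assumption chosen for simplicity, not the actual algorithm used by Cassandra or Kafka.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replica_nodes(partition: int, nodes: list, replication_factor: int) -> list:
    """Place a partition's copies on consecutive nodes, wrapping around the list."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    start = partition % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


if __name__ == "__main__":
    cluster = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical node names
    for key in ("user:1", "user:2", "order:77"):
        p = partition_for(key, num_partitions=8)
        print(key, "-> partition", p, "on", replica_nodes(p, cluster, replication_factor=3))
```

Each key lives on three of the four nodes, so losing any single node leaves at least two copies of every partition.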

What You’re Actually Signing Up For

Distributed systems solve real problems. But it’s worth being clear-eyed about what you’re taking on.

A single machine fails in simple, understandable ways. A distributed system fails in subtle, compound ways. A network can be slow without being down. A node can appear healthy to some nodes and dead to others. A write can succeed on two nodes and fail on a third. These are called partial failures, and they don’t exist on a single machine.
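The "succeeds on two nodes, fails on a third" case is easy to simulate. The sketch below uses a hypothetical `FlakyNode` class and one common way of deciding the outcome, counting acknowledgements against a required minimum; it is an illustration, not a complete protocol.

```python
class FlakyNode:
    """A node that can be configured to reject writes, simulating an outage."""

    def __init__(self, name: str, fail: bool = False):
        self.name = name
        self.fail = fail
        self.data = {}

    def write(self, key, value):
        if self.fail:
            raise ConnectionError(f"{self.name} unreachable")
        self.data[key] = value


def quorum_write(nodes, key, value, quorum: int):
    """Try every node; report success if at least `quorum` nodes acknowledged."""
    acked, failed = [], []
    for node in nodes:
        try:
            node.write(key, value)
            acked.append(node.name)
        except ConnectionError:
            failed.append(node.name)
    return len(acked) >= quorum, acked, failed


if __name__ == "__main__":
    nodes = [FlakyNode("a"), FlakyNode("b"), FlakyNode("c", fail=True)]
    ok, acked, failed = quorum_write(nodes, "user:1", "Ada", quorum=2)
    print("write accepted:", ok, "| acked by:", acked, "| failed on:", failed)
```

The write is reported as successful, yet node `c` does not have the data. Deciding what a later read from `c` should return is exactly the kind of question that never comes up on a single machine.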

Debugging a single-machine application means looking at one set of logs. Debugging a distributed system means correlating logs across dozens of nodes, reasoning about race conditions across network boundaries, and understanding failure modes that only manifest under specific combinations of timing and load.

This isn’t an argument against distributed systems — at sufficient scale they’re unavoidable. It’s an argument for not distributing before you have to. The question is never “distributed systems are modern, should we use them?” The question is “have we genuinely exhausted what one machine can do, and are we ready to pay the complexity tax?”

Most teams reach for distribution for fault tolerance long before they reach a scale problem. A two-node setup with one primary and one replica — where the replica takes over if the primary fails — solves most fault tolerance requirements without the full complexity of a distributed database. That’s a good place to start.
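The primary-plus-replica arrangement can be sketched in memory as follows. `Server` and `PrimaryReplicaPair` are hypothetical names, and the `healthy` flag stands in for real failure detection, which in practice is the hard part: a real setup also has to guard against both nodes believing they are the primary.

```python
class Server:
    """A database server holding a full copy of the data."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.healthy = True


class PrimaryReplicaPair:
    """Toy two-node setup: writes go to the primary and are copied to the replica."""

    def __init__(self, primary: Server, replica: Server):
        self.primary = primary
        self.replica = replica

    def _failover_if_needed(self):
        # Promote the replica when the primary is marked unhealthy.
        if not self.primary.healthy:
            if not self.replica.healthy:
                raise RuntimeError("both nodes are down")
            self.primary, self.replica = self.replica, self.primary

    def write(self, key, value):
        self._failover_if_needed()
        self.primary.data[key] = value
        if self.replica.healthy:
            self.replica.data[key] = value  # synchronous copy, for simplicity

    def read(self, key):
        self._failover_if_needed()
        return self.primary.data.get(key)


if __name__ == "__main__":
    pair = PrimaryReplicaPair(Server("db-1"), Server("db-2"))
    pair.write("user:1", "Ada")
    pair.primary.healthy = False  # simulate the primary failing
    print(pair.read("user:1"), "served by", pair.primary.name)
```

After the simulated failure, reads and writes continue against the promoted replica with no data lost, because the copy was made synchronously.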

When the data genuinely outgrows what a single node can hold (each node in a replicated pair stores a full copy), or when the write throughput genuinely exceeds what one primary can absorb, then partitioning enters the picture. By that point you’ll have enough operational experience with replication to tackle the additional complexity.

The Honest Summary

Vertical scaling — bigger machines — works until it doesn’t. The cost curve turns against you, and you’re still one failure away from total outage.

Shared nothing horizontal scaling solves both problems: capacity grows close to linearly as you add commodity hardware, at least for workloads that partition well, and multiple nodes mean no single point of failure. The price is complexity that moves from hardware into your software and your operations.

Replication and partitioning are the two tools that make horizontal scaling work. Replication handles fault tolerance and read throughput (we cover it in detail in our next article, Database Replication Explained: Lag, Consistency, and the Bugs You Don’t Expect). Partitioning handles data volume and write throughput. Real systems use both.

The best distributed system is the one you don’t build until you have to. Start on one machine, scale it up as far as it makes economic sense, add a replica for fault tolerance, and only partition when the data genuinely demands it. The complexity you avoid is the cheapest kind.