Scale From Zero To Millions Of Users

Single Server Setup

A journey of a thousand miles begins with a single step. Everything starts on one server — the web app, database, and cache all running together.

Figure 1 — single server setup

Users access the site via a domain name. DNS resolves it to an IP address, the browser sends an HTTP request, and the server returns HTML or JSON. Traffic comes from two sources: a web app (server-side logic + client HTML/JS) and a mobile app (communicates via HTTP + JSON API).

Database

As the user base grows, we separate the web tier from the database tier so each can scale independently.

Relational vs Non-Relational

Relational databases (MySQL, PostgreSQL) store data in tables and support SQL joins — the default for 40+ years. NoSQL databases (Cassandra, DynamoDB) trade joins for massive scale, low latency, and flexible schemas.

💡When to choose NoSQL

Your app requires super-low latency
Data is unstructured or has no relational shape
You only need to serialize/deserialize (JSON, YAML)
You need to store a massive amount of data

Vertical vs Horizontal Scaling

Vertical scaling (scale up) adds more CPU/RAM to an existing server — simple, but has a hard ceiling and no failover. Horizontal scaling (scale out) adds more servers — more complex, but far more suitable for large-scale systems.

Load Balancer

A load balancer evenly distributes incoming traffic across web servers. Users connect to the load balancer's public IP; servers communicate via private IPs, making them unreachable directly from the internet.

Figure 4 — load balancer with two web servers

✅What this solves

If Server 1 goes offline, all traffic routes to Server 2 — no downtime. When traffic spikes, just add more servers and the load balancer handles the rest automatically.

Database Replication

The master/slave model: a master database handles all writes; one or more slave databases replicate from master and serve reads. Since most apps read far more than they write, this dramatically improves throughput.

Figure 5 — master DB with multiple slave DBs

Benefits: better performance (parallel reads), reliability (data is replicated across locations), and high availability (if one DB goes offline, others serve traffic).

Cache & CDN

Cache

A cache stores expensive responses in memory so subsequent requests are served faster. The read-through pattern: check cache first — if hit, serve it; if miss, query the DB, store in cache, then serve.

⚠️Cache considerations

Use cache for data read often but modified rarely
Set sensible expiry — too short = DB overload, too long = stale data
Run multiple cache servers to avoid a single point of failure
LRU is the most common eviction policy

Content Delivery Network (CDN)

A CDN is a network of geographically dispersed servers that cache and deliver static content (JS, CSS, images, video). When a user requests content, the edge server closest to them responds — dramatically reducing latency.

Figure 7 — cache + CDN architecture

Stateless Web Tier

To scale horizontally, session state must leave the web servers. A stateful server remembers client data between requests, forcing sticky sessions. A stateless server reads session data from a shared store (Redis, NoSQL) that any server can access.

Figure 13 — stateless architecture

With stateless servers, auto-scaling becomes trivial: spin servers up or down based on traffic load without worrying about session affinity.

Data Centers

To serve a global audience with low latency, run multiple geographically distributed data centers. GeoDNS routes users to the nearest data center. In a failure, 100% of traffic redirects to the healthy data center automatically.

🌐Key challenges

Traffic redirection — GeoDNS routes to nearest healthy center
Data synchronization — async multi-region replication
Consistent deployment — automated pipelines across all centers

Message Queue

A message queue enables asynchronous communication between services. Producers publish messages to the queue; consumers pull and process them independently. This decoupling makes the system resilient and independently scalable.

Figure 17 — producer → queue → consumer

Example: a photo-editing app publishes processing jobs to the queue. Worker nodes pick them up asynchronously. When the queue grows long, scale up workers; when it empties, scale them down.

Database Scaling

Vertical Scaling

Add more CPU/RAM/disk to the existing database. AWS RDS supports up to 24 TB RAM. But hardware limits exist, it creates a single point of failure, and powerful servers are expensive.

Horizontal Scaling — Sharding

Sharding splits the database into smaller shards. Each shard holds the same schema but a unique subset of data. A hash function (e.g. user_id % 4) routes queries to the correct shard.

Figure 21 — user data sharded across 4 nodes

⚠️Sharding challenges

Resharding — needed when a shard is exhausted; consistent hashing helps
Hotspot problem — a single shard may receive disproportionate traffic
Join complexity — cross-shard joins require denormalization

Summary

Scaling is an iterative process. Here is the full playbook for scaling from a single server to millions of users:

→
Keep web tier stateless
Any server handles any request
→
Build redundancy at every tier
No single point of failure
→
Cache data aggressively
Reduce DB load and latency
→
Support multiple data centers
Geo-routing and failover
→
Host static assets in CDN
Serve from the edge, not origin
→
Scale data tier by sharding
Distribute load across nodes
→
Use message queues
Decouple services asynchronously
→
Monitor and automate
Observability and CI/CD are non-negotiable

Based on concepts from System Design Interview by Alex Xu. Part of my ongoing series on backend architecture and distributed systems.

Views —

If this post helped, tap the heart after reading.

← Back to all posts

Scale From Zero To Millions of Users