Single Server Setup
A journey of a thousand miles begins with a single step. Everything starts on one server — the web app, database, and cache all running together.
Users access the site via a domain name. DNS resolves it to an IP address, the browser sends an HTTP request, and the server returns HTML or JSON. Traffic comes from two sources: a web app (server-side logic + client HTML/JS) and a mobile app (communicates via HTTP + JSON API).
Database
As the user base grows, we separate the web tier from the database tier so each can scale independently.
Relational vs Non-Relational
Relational databases (MySQL, PostgreSQL) store data in tables and support SQL joins — the default for 40+ years. NoSQL databases (Cassandra, DynamoDB) trade joins for massive scale, low latency, and flexible schemas.
💡When to choose NoSQL
- Your app requires super-low latency
- Data is unstructured or has no relational shape
- You only need to serialize/deserialize (JSON, YAML)
- You need to store a massive amount of data
Vertical vs Horizontal Scaling
Vertical scaling (scale up) adds more CPU/RAM to an existing server — simple, but has a hard ceiling and no failover. Horizontal scaling (scale out) adds more servers — more complex, but far more suitable for large-scale systems.
Load Balancer
A load balancer evenly distributes incoming traffic across web servers. Users connect to the load balancer's public IP; servers communicate via private IPs, making them unreachable directly from the internet.
✅What this solves
Database Replication
The master/slave model: a master database handles all writes; one or more slave databases replicate from master and serve reads. Since most apps read far more than they write, this dramatically improves throughput.
Benefits: better performance (parallel reads), reliability (data is replicated across locations), and high availability (if one DB goes offline, others serve traffic).
Cache & CDN
Cache
A cache stores expensive responses in memory so subsequent requests are served faster. The read-through pattern: check cache first — if hit, serve it; if miss, query the DB, store in cache, then serve.
⚠️Cache considerations
- Use cache for data read often but modified rarely
- Set sensible expiry — too short = DB overload, too long = stale data
- Run multiple cache servers to avoid a single point of failure
- LRU is the most common eviction policy
Content Delivery Network (CDN)
A CDN is a network of geographically dispersed servers that cache and deliver static content (JS, CSS, images, video). When a user requests content, the edge server closest to them responds — dramatically reducing latency.
Stateless Web Tier
To scale horizontally, session state must leave the web servers. A stateful server remembers client data between requests, forcing sticky sessions. A stateless server reads session data from a shared store (Redis, NoSQL) that any server can access.
With stateless servers, auto-scaling becomes trivial: spin servers up or down based on traffic load without worrying about session affinity.
Data Centers
To serve a global audience with low latency, run multiple geographically distributed data centers. GeoDNS routes users to the nearest data center. In a failure, 100% of traffic redirects to the healthy data center automatically.
🌐Key challenges
- Traffic redirection — GeoDNS routes to nearest healthy center
- Data synchronization — async multi-region replication
- Consistent deployment — automated pipelines across all centers
Message Queue
A message queue enables asynchronous communication between services. Producers publish messages to the queue; consumers pull and process them independently. This decoupling makes the system resilient and independently scalable.
Example: a photo-editing app publishes processing jobs to the queue. Worker nodes pick them up asynchronously. When the queue grows long, scale up workers; when it empties, scale them down.
Database Scaling
Vertical Scaling
Add more CPU/RAM/disk to the existing database. AWS RDS supports up to 24 TB RAM. But hardware limits exist, it creates a single point of failure, and powerful servers are expensive.
Horizontal Scaling — Sharding
Sharding splits the database into smaller shards. Each shard holds the same schema but a unique subset of data. A hash function (e.g. user_id % 4) routes queries to the correct shard.
⚠️Sharding challenges
- Resharding — needed when a shard is exhausted; consistent hashing helps
- Hotspot problem — a single shard may receive disproportionate traffic
- Join complexity — cross-shard joins require denormalization
Summary
Scaling is an iterative process. Here is the full playbook for scaling from a single server to millions of users:
- →
Keep web tier stateless
Any server handles any request
- →
Build redundancy at every tier
No single point of failure
- →
Cache data aggressively
Reduce DB load and latency
- →
Support multiple data centers
Geo-routing and failover
- →
Host static assets in CDN
Serve from the edge, not origin
- →
Scale data tier by sharding
Distribute load across nodes
- →
Use message queues
Decouple services asynchronously
- →
Monitor and automate
Observability and CI/CD are non-negotiable
Based on concepts from System Design Interview by Alex Xu. Part of my ongoing series on backend architecture and distributed systems.
If this post helped, tap the heart after reading.