System design separates engineers who can build a feature from those who can build a system. These questions probe how someone reasons about scale, tradeoffs and failure.
Hiring an engineer who claims system design skills is easy. Telling a real one from a convincing résumé is the hard part, and it’s most of what we do. The questions below are grouped by level, because the same question that stretches a junior is a warm-up for a senior.
Junior System Design interview questions
0–2 years
Foundational building blocks.
What is the difference between horizontal and vertical scaling?
Vertical scaling adds CPU, memory or storage to one machine, which is simple but eventually hits a hardware ceiling; horizontal scaling adds more machines behind a load balancer, which scales further but requires the service to be stateless.
Only knows “buy a bigger server.”
What does a load balancer do?
Distributes incoming traffic across a pool of servers, enabling scale and redundancy; health checks let it stop sending requests to unhealthy instances.
Thinks a load balancer stores state.
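A minimal sketch of the idea, assuming plain round-robin selection and a health checker that calls the hypothetical `mark_down`/`mark_up` methods. Production load balancers offer other algorithms (least-connections, weighted), but skipping unhealthy instances is the core behaviour, and note that the balancer itself keeps no client state.

```python
import itertools


class RoundRobinBalancer:
    """Round-robin over a fixed backend list, skipping instances marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # Called by a (hypothetical) health checker when a probe fails.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call; skip the unhealthy ones.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

With backends `a`, `b`, `c` and `b` marked down, four consecutive picks return `a`, `c`, `a`, `c`.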
What is caching and where do you apply it?
Storing frequently-read data closer to the consumer (browser, CDN, app, database) to cut latency and load; the hard part is invalidation.
Caches everything with no expiry strategy.
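A minimal cache-aside sketch with a time-to-live, assuming an in-process dict as the cache and a caller-supplied `loader` function standing in for the database. The TTL bounds how stale a value can get; `invalidate` is the explicit path for when a write must be visible immediately.

```python
import time


class TTLCache:
    """Cache-aside with per-entry expiry; the clock is injectable for testing."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit: no trip to the source of truth
        value = loader(key)  # miss or expired: read from the source of truth
        self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        # Call after a write so readers do not see the old value until expiry.
        self._store.pop(key, None)
```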
What is a database index, at a systems level?
A structure that trades write cost and storage for faster reads, essential once tables grow.
Thinks scaling reads is only about bigger hardware.
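A toy illustration of the index tradeoff, assuming a sorted list of `(key, row_id)` pairs rather than the B-tree a real database would use. Every insert pays to keep the structure ordered, and in exchange a lookup is a binary search instead of a scan over every row.

```python
import bisect


class SortedIndex:
    """Toy secondary index: a sorted list of (key, row_id) pairs."""

    def __init__(self):
        self._entries = []

    def insert(self, key, row_id):
        # Write cost: each insert keeps the list sorted (extra work and storage).
        bisect.insort(self._entries, (key, row_id))

    def lookup(self, key):
        # Read benefit: binary search to the first matching key, then walk forward.
        i = bisect.bisect_left(self._entries, (key,))
        matches = []
        while i < len(self._entries) and self._entries[i][0] == key:
            matches.append(self._entries[i][1])
            i += 1
        return matches
```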
What is the difference between SQL and NoSQL?
Relational databases give structure, joins and transactions; NoSQL trades some of those for flexible schemas and horizontal scale. Choose by access pattern.
Believes one is universally better.
What is an API and how do services talk?
A contract for requests/responses, commonly over HTTP/REST or gRPC, decoupling producers and consumers.
No notion of contracts or versioning.
What is latency vs throughput?
Latency is the time for one request; throughput is requests handled per unit time. They can be optimised independently and sometimes traded off.
Uses the terms interchangeably.
What is a stateless service and why does it matter?
One that keeps no client state between requests, so any instance can serve any request — the basis for horizontal scaling.
Stores session state in process memory.
Mid-level System Design interview questions
2–5 years
Data, queues and consistency.
How would you design a URL shortener?
Generate a short unique key, store the mapping, redirect on lookup; discuss key generation, storage choice, caching hot links and read/write ratios.
Jumps to code without discussing scale or key collisions.
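One common key-generation approach, sketched here under assumptions: a monotonically increasing ID (a local counter standing in for a distributed ID generator) encoded in base62. Unique IDs mean no key collisions, and seven characters cover 62^7, roughly 3.5 trillion, links. The dict stands in for the real key-value store.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode_base62(n: int) -> str:
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class Shortener:
    def __init__(self):
        self._next_id = 1  # stands in for a distributed, collision-free ID source
        self._urls = {}  # stands in for the key-value store (short key -> long URL)

    def shorten(self, long_url: str) -> str:
        key = encode_base62(self._next_id)
        self._next_id += 1
        self._urls[key] = long_url
        return key

    def resolve(self, key: str):
        # Redirect path: read-heavy, so hot keys are good cache candidates.
        return self._urls.get(key)
```

Sequential keys are guessable; hashing or shuffling the ID before encoding is a common variation if that matters.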
When do you introduce a message queue?
To decouple producers from consumers, smooth spikes, and process work asynchronously and reliably with retries.
Does slow work synchronously in the request path.
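A minimal in-process sketch of the decoupling, assuming Python's `queue.Queue` and a worker thread stand in for a real broker and consumer fleet. The request path only enqueues; `send_welcome_email` is a hypothetical slow task. A real broker adds what this omits: durability, acknowledgements, retries and dead-letter queues.

```python
import queue
import threading


def send_welcome_email(email):
    # Hypothetical slow side effect; here it just returns a record of the work.
    return f"welcome email sent to {email}"


def run_demo(emails):
    jobs = queue.Queue()
    done = []

    def worker():
        while True:
            email = jobs.get()
            try:
                if email is None:  # sentinel: no more work
                    return
                done.append(send_welcome_email(email))
            finally:
                jobs.task_done()

    consumer = threading.Thread(target=worker)
    consumer.start()
    for email in emails:
        # Request path: enqueue and respond immediately instead of sending inline.
        jobs.put(email)
    jobs.put(None)
    consumer.join()
    return done
```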
What is the difference between strong and eventual consistency?
Strong consistency guarantees every read sees the latest write; eventual consistency allows temporary divergence for availability and scale.
Assumes every system must be strongly consistent.
How do you scale reads on a database?
Read replicas, caching, and denormalised read models, accepting replication lag and its consistency implications.
Points all reads at the primary and wonders why it’s slow.
What is idempotency and why does it matter?
An operation that has the same effect whether it is applied once or many times, which makes retries safe in distributed systems (e.g. payment requests).
Retries non-idempotent operations and double-charges.
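A sketch of idempotency keys, assuming the client sends a unique key per logical payment and the server stores the first result under it. The dict and `charges` list stand in for a database and payment provider; in production the check-and-store must be atomic (for example, a unique constraint on the key) so two concurrent retries cannot both charge.

```python
class PaymentService:
    def __init__(self):
        self._results = {}  # idempotency_key -> result of the first attempt
        self.charges = []  # stands in for calls to the real payment provider

    def charge(self, idempotency_key, account, amount_cents):
        if idempotency_key in self._results:
            # A retry of a request we already processed: replay, don't re-charge.
            return self._results[idempotency_key]
        self.charges.append((account, amount_cents))
        result = {"status": "charged", "account": account, "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result
```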
How do you handle a service that depends on a slow downstream?
Timeouts, retries with backoff, circuit breakers, caching and graceful degradation so one slow dependency doesn’t cascade.
Lets a slow dependency block threads until everything falls over.
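A compact sketch of two of these defences, assuming a consecutive-failure circuit breaker and exponential backoff with full jitter; thresholds and delays are illustrative. Timeouts belong on the downstream call itself (most HTTP and RPC clients take a timeout argument), so a hung call surfaces as an exception these wrappers can count.

```python
import random
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial call after `reset_after` s."""

    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("failing fast; downstream marked unhealthy")
            self.opened_at = None  # half-open: one failure re-opens, success closes
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def retry_with_backoff(fn, attempts=4, base_delay=0.1, sleep=time.sleep):
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # don't keep hammering a dependency the breaker has given up on
        except Exception:
            if attempt == attempts - 1:
                raise
            # Full jitter: wait a random time in [0, base_delay * 2^attempt].
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Only retry operations that are idempotent; otherwise backoff just spreads out the duplicates.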
What is sharding and what does it cost?
Splitting data across nodes by a key to scale writes and storage; the cost is cross-shard queries, rebalancing and hotspots.
Shards on a key that creates hotspots.
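A sketch of hash-based shard routing, assuming SHA-256 of the shard key modulo the shard count. It also measures one rebalancing cost: with plain modulo, growing from 4 to 5 shards moves most keys (about 80% for a uniform hash), which is why schemes such as consistent hashing are often used instead.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    # Stable hash: Python's built-in hash() is salted per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_n, new_n):
    # Share of keys whose shard changes when the shard count changes.
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)
```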
How do you design an API for pagination of large results?
Cursor (keyset) pagination, which resumes after the last key returned rather than skipping rows with OFFSET, so pages stay fast deep into a dataset and stable when rows are inserted or deleted between requests.
Uses large OFFSET values that scan and skip millions of rows.
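A runnable keyset pagination sketch using Python's built-in `sqlite3` with an illustrative `items` table. The cursor is simply the last `id` returned; `WHERE id > ?` lets the database seek via the primary-key index instead of counting past skipped rows.

```python
import sqlite3


def open_db(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        ((i, f"item-{i}") for i in range(1, n + 1)),
    )
    return conn


def fetch_page(conn, after_id, limit):
    """Return one page of rows after the cursor, plus the cursor for the next page."""
    rows = conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    ).fetchall()
    next_cursor = rows[-1][0] if rows else None  # None means no more pages
    return rows, next_cursor
```

An API would typically hand the cursor back to the client as an opaque token rather than a raw ID.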
Senior System Design interview questions
5+ years
Tradeoffs, reliability and scale.
Walk me through designing a news feed / timeline.
Discuss fan-out on write vs read, caching, ranking, storage, and the tradeoffs for celebrity accounts and hot partitions.
Gives one design with no tradeoff discussion.
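A toy contrast of the two fan-out strategies, assuming in-memory dicts in place of timeline and post stores. Push makes reads cheap but a post costs one write per follower, which is what breaks for celebrity accounts; pull makes posting cheap but every read merges posts from all followed authors. Many real designs mix the two.

```python
from collections import defaultdict


class FanOutOnWrite:
    """Push model: each post is copied into every follower's precomputed timeline."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> followers
        self.timelines = defaultdict(list)  # user -> post ids, oldest first

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, post_id):
        for follower in self.followers[author]:  # write cost grows with follower count
            self.timelines[follower].append(post_id)

    def feed(self, user):
        return list(reversed(self.timelines[user]))  # cheap read, newest first


class FanOutOnRead:
    """Pull model: the feed is assembled at read time from followed authors' posts."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> authors
        self.posts = defaultdict(list)  # author -> [(seq, post_id)]
        self._seq = 0

    def follow(self, follower, author):
        self.following[follower].add(author)

    def post(self, author, post_id):
        self._seq += 1
        self.posts[author].append((self._seq, post_id))  # cheap write

    def feed(self, user):
        # Read cost grows with the number of followed authors and their posts.
        merged = [p for author in self.following[user] for p in self.posts[author]]
        return [post_id for _, post_id in sorted(merged, reverse=True)]
```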
How do you explain the CAP theorem in practice?
Under a network partition you must choose availability or consistency; in normal operation you tune latency against consistency. Strong candidates apply this to real design choices.
Recites CAP but can’t apply it to a decision.
How do you design for high availability?
Redundancy across zones, no single points of failure, health checks and failover, graceful degradation, and tested recovery.
Assumes a single region and instance is fine.
How do you approach observability?
Metrics, structured logs and distributed tracing tied to SLOs, so you can detect, locate and diagnose problems quickly.
Relies on users to report outages.
How do you handle a hot partition or celebrity problem?
Special-case heavy keys, add caching, split or replicate the hotspot, and consider a different fan-out strategy for them.
Ignores skew and lets one key overwhelm a shard.
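One way to split a hotspot, sketched under assumptions: a write-heavy counter for a single key is spread across `splits` sub-keys (for example `celebrity:likes#0` to `#7`), which a sharded store would place on different nodes. A `Counter` stands in for that store; reads pay by summing all sub-keys, so this suits write-hot, read-tolerant data.

```python
import random
from collections import Counter


class SplitCounter:
    """Spreads increments for one hot key across `splits` sub-keys; reads sum them."""

    def __init__(self, splits):
        self.splits = splits
        self.store = Counter()  # stands in for a sharded key-value store

    def incr(self, key):
        # A random suffix spreads writes so no single shard takes all of them.
        sub_key = f"{key}#{random.randrange(self.splits)}"
        self.store[sub_key] += 1

    def total(self, key):
        return sum(self.store[f"{key}#{i}"] for i in range(self.splits))
```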
How do you make a distributed operation reliable?
Idempotency keys, retries with backoff, outbox/saga patterns for consistency across services, and dead-letter queues.
Assumes network calls always succeed.
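A minimal transactional outbox sketch using `sqlite3`, with illustrative `orders` and `outbox` tables and a caller-supplied `publish` function standing in for a message broker. The order and its event commit in one local transaction, so neither exists without the other; a separate relay publishes pending events. A crash between publishing and marking causes a re-send, so consumers must be idempotent.

```python
import json
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT, published INTEGER DEFAULT 0)"
    )
    return conn


def place_order(conn, order_id, total_cents):
    # One local transaction: the order row and its event commit or roll back together.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)", (order_id, total_cents)
        )
        event = {"type": "OrderPlaced", "order_id": order_id}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_outbox(conn, publish):
    # Publish pending events, then mark each one (at-least-once delivery).
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```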
How do you decide between monolith and microservices?
By team size, deployment needs and domain boundaries; microservices add operational complexity that pays off only when scale or organisational needs justify it.
Reaches for microservices reflexively on day one.
How do you estimate capacity for a new system?
Back-of-the-envelope from expected QPS, data size, read/write ratio and growth, then validate with load testing.
Provisions by guesswork with no numbers.
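A back-of-the-envelope sketch with illustrative inputs: 10 million daily users, 20 reads and 1 write each per day, 1 KB per write, and an assumed 3x peak-to-average ratio. The point is the method (turn daily volumes into per-second rates and yearly storage), not the specific numbers, which load testing should then validate.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_users, reads_per_user, writes_per_user, bytes_per_write,
             peak_factor=3, years=1):
    # peak_factor is an assumption: traffic is rarely flat across the day.
    read_qps = daily_users * reads_per_user / SECONDS_PER_DAY
    write_qps = daily_users * writes_per_user / SECONDS_PER_DAY
    storage_bytes = daily_users * writes_per_user * bytes_per_write * 365 * years
    return {
        "avg_read_qps": round(read_qps),
        "peak_read_qps": round(read_qps * peak_factor),
        "avg_write_qps": round(write_qps),
        "storage_tb": round(storage_bytes / 1e12, 2),
    }
```

With those inputs this gives roughly 2,315 average read QPS, 6,944 peak read QPS, 116 write QPS and 3.65 TB of new data per year, before replication and indexes.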