System Design Interview Questions 2026: 20 Common Problems with Approaches & Frameworks

Q: What level of detail is expected in a system design interview?

You should demonstrate breadth in the high-level design (showing you understand how all the pieces fit together) and depth in 1-2 specific areas the interviewer cares about. You do not need to design every component in detail; the interviewer will guide you toward what they want to explore. Common deep-dive areas include database schema design, caching strategy, data partitioning, consistency models, and failure handling. Being able to articulate trade-offs is more important than having a perfect design.

Q: What framework should I use for system design interviews?

Use a 5-step framework: (1) Requirements clarification — ask questions to scope the problem, distinguish functional vs. non-functional requirements. (2) Back-of-envelope estimation — calculate QPS, storage, bandwidth to understand scale. (3) High-level design — draw the core components and data flow. (4) Detailed design — dive deep into 1-2 components that matter most. (5) Trade-offs and bottlenecks — discuss what could fail, how to scale, and alternative approaches. Spend about 5 minutes on requirements, 3 minutes on estimation, 15 minutes on high-level design, 15 minutes on deep dives, and 5 minutes on trade-offs.

Q: How long should a system design interview last?

Most system design interviews are 45-60 minutes. The first 5-10 minutes should be spent clarifying requirements and scope. The remaining 35-50 minutes are for designing the system, discussing trade-offs, and handling follow-up questions. Senior and staff-level interviews may extend to 90 minutes with deeper architectural discussions.

Q: What is the difference between system design and object-oriented design interviews?

System design interviews focus on distributed systems architecture — how to design scalable, reliable systems using components like databases, caches, message queues, and load balancers. Object-oriented design (OOD) interviews focus on class hierarchies, design patterns, and code-level abstractions for a single application. System design is about the big picture; OOD is about the details within one component. Most senior interviews include system design; OOD is more common at mid-level.

Q: What are the most common system design interview questions?

The most frequently asked system design questions include: Design a URL shortener, Design a chat/messaging system, Design a news feed, Design a rate limiter, Design a notification system, Design a search autocomplete, and Design a file storage system. These cover core distributed systems concepts — hashing, pub/sub, fan-out, caching, and distributed storage — that apply broadly across system design problems.

Q: How do I prepare for system design interviews?

Preparation strategy: (1) Study the fundamental building blocks — databases (SQL vs. NoSQL), caching (Redis, Memcached), message queues (Kafka, RabbitMQ), load balancers, CDNs, and consistent hashing. (2) Practice 10-15 common problems using a structured framework. (3) Do mock interviews with a partner who can ask follow-up questions. (4) Read engineering blogs from companies like Netflix, Uber, and Stripe to understand real-world architecture decisions. (5) Focus on trade-offs rather than memorizing solutions — interviewers want to see how you think, not what you have memorized.

System design interviews test whether you can architect scalable, reliable distributed systems. Unlike coding interviews where there is a correct answer, system design is about demonstrating structured thinking, making defensible trade-offs, and communicating your reasoning clearly. The interviewer is not looking for the "right" architecture — they are evaluating how you break down ambiguous problems and navigate complexity.

This guide covers the framework you should use for every system design question, followed by 20 of the most commonly asked problems with structured approach outlines. For each problem, we provide the key components, data model considerations, and the scaling trade-offs that distinguish strong candidates from average ones. For additional company-specific preparation, check our Anthropic interview prep guide or browse all interview prep articles.

At a glance: 20 problems covered, 5-step framework, 45–60 minutes per interview.

The System Design Interview Framework

Use this framework for every system design question. It keeps you structured and ensures you cover what interviewers are evaluating. Spend your time roughly as shown.

Requirements Clarification (5 min)

Ask questions to scope the problem. Distinguish functional requirements (what the system does) from non-functional requirements (scale, latency, availability, consistency). Clarify: How many users? Read-heavy or write-heavy? What latency is acceptable? What consistency model? This step sets the constraints that drive every subsequent decision.

Back-of-Envelope Estimation (3 min)

Calculate QPS (queries per second), storage requirements, and bandwidth. Example: 100M users with 10% daily active gives 10M daily active users. At 5 reads and 1 write per active user per day, that is 50M reads/day (~580 QPS) and 10M writes/day (~116 QPS). These are daily averages, so peak load will be higher. This determines whether you need sharding, caching, CDNs, etc.
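The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch: the helper name `average_qps` is an illustrative assumption, and the result is a daily average, not a peak figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(total_users, daily_active_ratio, actions_per_user_per_day):
    """Average requests per second produced by one kind of action."""
    daily_active_users = total_users * daily_active_ratio
    return daily_active_users * actions_per_user_per_day / SECONDS_PER_DAY


# Numbers from the example: 100M users, 10% daily active, 5 reads and 1 write each.
read_qps = average_qps(100_000_000, 0.10, 5)
write_qps = average_qps(100_000_000, 0.10, 1)
print(f"reads: ~{read_qps:.0f} QPS, writes: ~{write_qps:.0f} QPS")
# prints "reads: ~579 QPS, writes: ~116 QPS"
```

In an interview you would round these to "about 600" and "about 100" and move on; the point is the order of magnitude, not the exact value.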

High-Level Design (15 min)

Draw the core components and data flow: clients, load balancers, web servers, application servers, databases, caches, message queues, CDNs. Show how data flows from the user action to storage and back. Keep it simple — 5-8 boxes connected with arrows is usually the right level.

Detailed Design (15 min)

Dive deep into 1-2 components the interviewer cares about. Common deep-dives: database schema, caching strategy, data partitioning, consistency model, failure handling. The interviewer will guide you — watch for signals about what they want to explore.

Trade-offs & Bottlenecks (5 min)

Identify what could fail, how to handle it, and alternative approaches you considered. Discuss: single points of failure, data consistency vs. availability trade-offs, scaling bottlenecks, monitoring and observability. This is where senior candidates distinguish themselves.

20 Common System Design Problems

Below is each problem with the key components, approach, and scaling considerations. These are starting points for your design, not complete solutions — real interviews require you to explore each one in depth based on the interviewer's questions.

1. Design a URL Shortener

Key components: URL generation service, key-value store, redirect service, analytics service

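Generate a unique short key using base62 encoding of an auto-incrementing ID or a hash of the URL (MD5 or SHA-256, truncated to 7 characters). Seven base62 characters give 62^7, about 3.5 trillion, possible keys. Truncated hashes can collide, so check for an existing key before inserting. Store the mapping in a key-value store (Redis for hot data, database for persistence). The redirect service looks up the short key and returns a 301 or 302 redirect. Storage for 100M URLs is modest: about 5GB if each mapping averages roughly 50 bytes, and proportionally more for long URLs or extra metadata. Key decision: 301 (cached by browser, fewer server hits, less analytics) vs. 302 (every click hits your server, better analytics).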

Scale: Cache popular URLs in Redis. Shard by key prefix. Pre-generate key ranges to avoid contention.

Topics: Hashing, Key-Value Store, Caching
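The base62 option for the URL shortener can be sketched as follows. The alphabet ordering and the function names are assumptions made for illustration; any fixed ordering of the 62 symbols works as long as encoder and decoder agree.

```python
import string

# 0-9, a-z, A-Z: 62 symbols (the ordering is an arbitrary choice for this sketch).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n):
    """Turn a non-negative integer ID into a short base62 key."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base62(key):
    """Inverse of encode_base62: recover the integer ID from a key."""
    n = 0
    for ch in key:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Sequential IDs produce guessable keys, so a real design might offset or shuffle the ID space first; that trade-off is worth mentioning to the interviewer.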

2. Design a Chat System (WhatsApp)

Key components: WebSocket servers, message queue, user presence service, message store, push notification service

Use WebSockets for real-time bidirectional communication. When User A sends a message to User B: the message goes to User A's WebSocket server, gets routed to a message queue, then delivered to User B's WebSocket server (or stored for offline delivery). Message store: Cassandra or HBase for write-heavy, time-ordered data. User presence: heartbeat-based detection with Redis. Group chats: fan-out on write (store per-user) vs. fan-out on read (store per-group) — depends on group sizes.

Scale: Shard message storage by user ID. Partition WebSocket servers by geography. Use Kafka for reliable message routing.

Topics: WebSockets, Message Queue, Fan-out

3. Design a News Feed (Twitter/X)

Key components: Post service, fan-out service, feed cache, timeline service, ranking service

Two approaches: Fan-out on write (push) — when a user posts, immediately write to all followers' feeds. Fast reads but expensive writes for users with millions of followers. Fan-out on read (pull) — when a user opens their feed, fetch and merge posts from everyone they follow. Slow reads but cheap writes. Hybrid approach: fan-out on write for users with <10K followers, fan-out on read for celebrities. Feed cache: Redis sorted sets ordered by timestamp. Ranking: apply ML-based relevance scoring before returning results.

Scale: Shard feed cache by user ID. Use Kafka for async fan-out. Cache celebrity feeds at CDN level.

Topics: Fan-out, Redis, Ranking
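The hybrid fan-out idea for the news feed can be illustrated with an in-memory sketch. Everything here is hypothetical scaffolding (the `FeedService` class, plain dicts standing in for the feed cache and post store); it only shows where the push and pull paths diverge.

```python
import heapq
from collections import defaultdict


class FeedService:
    def __init__(self, celebrity_threshold=10_000):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)    # author -> users following them
        self.following = defaultdict(set)    # user -> authors they follow
        self.posts = defaultdict(list)       # author -> [(timestamp, post_id)]
        self.feed_cache = defaultdict(list)  # user -> [(timestamp, post_id)] pushed at write time

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def publish(self, author, post_id, ts):
        self.posts[author].append((ts, post_id))
        # Fan-out on write only for authors below the follower threshold.
        if not self._is_celebrity(author):
            for follower in self.followers[author]:
                self.feed_cache[follower].append((ts, post_id))

    def read_feed(self, user, limit=10):
        # Start from the pushed entries, then pull celebrity posts at read time.
        candidates = set(self.feed_cache[user])
        for author in self.following[user]:
            if self._is_celebrity(author):
                candidates.update(self.posts[author])
        newest_first = heapq.nlargest(limit, candidates)
        return [post_id for _, post_id in newest_first]
```

Using a set for the candidates avoids duplicates when an author crosses the threshold after some posts were already pushed.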

4. Design a Rate Limiter

Key components: Rate limiting middleware, Redis for counters, configuration service

Algorithms: Token Bucket (best for allowing bursts), Sliding Window Log (precise but memory-intensive), Sliding Window Counter (good balance of precision and efficiency). Use Redis for distributed counting across multiple API servers. Key decisions: rate limit by user ID, API key, or IP address. Return 429 with Retry-After header. Store rate limit rules in a configuration service for easy updates without redeployment.

Scale: Redis cluster for high throughput. Local in-memory counters with periodic sync to reduce Redis calls.

Topics: Token Bucket, Redis, Middleware
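A single-process token bucket, the first algorithm listed for the rate limiter, might look like the sketch below. The class and parameter names are assumptions; a distributed version would keep the token count and timestamp in a shared store such as Redis instead of instance attributes.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start full
        self.last_refill = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller that receives `False` would typically respond with HTTP 429 and a Retry-After header, as described above.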

5. Design a Notification System

Key components: Notification service, user preference store, template engine, delivery services (push, email, SMS), message queue

Event-driven architecture: services publish events to a message queue (Kafka). The notification service consumes events, checks user preferences (opt-in/opt-out, channel preference, frequency caps), renders templates, and routes to the appropriate delivery service. Each delivery channel (push, email, SMS) has its own queue and retry logic. Store notification history for deduplication and analytics. Key design decisions: real-time vs. batched delivery, priority levels (urgent vs. informational), rate limiting per user to prevent notification fatigue.

Scale: Separate queues per channel. Batch non-urgent notifications. Use dead-letter queues for failed deliveries.

Topics: Event-Driven, Message Queue, Pub/Sub

6. Design a Search Autocomplete

Key components: Trie data structure, query log aggregator, ranking service, caching layer

Build a trie (prefix tree) where each node stores the top-K suggestions for that prefix. Periodically rebuild the trie from aggregated query logs (daily/hourly batch job). Serve suggestions from an in-memory trie with a caching layer for the most common prefixes. Ranking: combine query frequency with recency, personalization, and trending signals. Response time target: <100ms. Key trade-off: freshness vs. cost — rebuilding the trie more frequently captures trending queries but costs more compute.

Scale: Shard trie by first character. Cache top prefixes at CDN. Use offline MapReduce for aggregation.

Topics: Trie, Caching, Ranking
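The "top-K suggestions per node" trie for autocomplete can be sketched like this. It assumes ranking by raw query count only (no recency or personalization), and the names `build_trie` and `suggest` are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.top = []       # up to K best queries that pass through this node


def build_trie(query_counts, k=3):
    """Build a trie from {query: count}, storing the top-k queries at each prefix."""
    root = TrieNode()
    # Insert the most frequent queries first, so each node's list fills in rank order.
    ranked = sorted(query_counts.items(), key=lambda item: (-item[1], item[0]))
    for query, _count in ranked:
        node = root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            if len(node.top) < k:
                node.top.append(query)
    return root


def suggest(root, prefix):
    """Return the precomputed suggestions for a prefix, or [] if none match."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    return list(node.top)
```

Because the suggestions are stored at build time, a lookup costs one step per prefix character, which is what makes the sub-100ms target realistic. In this sketch the empty prefix returns no suggestions because nothing is stored at the root.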

7. Design a Web Crawler

Key components: URL frontier (queue), fetcher workers, content parser, URL deduplication, robots.txt handler, storage

BFS-based crawling with a priority queue (URL frontier). Fetcher workers pull URLs, download pages, extract links, and add new URLs to the frontier. URL deduplication using bloom filters to avoid re-crawling. Respect robots.txt and implement politeness delays per domain. Store raw HTML in blob storage, parsed content in a database. Key challenges: handling dynamic JavaScript-rendered pages (headless browser), detecting content traps (infinite calendars, session IDs in URLs), and managing crawl budget across millions of domains.

Scale: Distribute workers across regions. Partition URL frontier by domain. Use consistent hashing for domain assignment.

Topics: BFS, Bloom Filter, Distributed Queue
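For the crawler's URL deduplication step, a Bloom filter can be sketched in a few lines. The sizing defaults and the double-hashing scheme built on SHA-256 are assumptions chosen for brevity, not tuned values.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        """False means definitely not seen; True means probably seen."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

A false positive here means the crawler skips a URL it has not actually fetched, which is usually an acceptable price for the memory savings; a false negative cannot happen.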

8. Design a Video Streaming Service (YouTube)

Key components: Upload service, transcoding pipeline, CDN, metadata service, recommendation engine

Upload: chunked upload to blob storage (S3). Transcoding: async pipeline converts raw video to multiple resolutions and formats (H.264, VP9, AV1) using a job queue. Store transcoded segments for adaptive bitrate streaming (HLS/DASH). Serve through CDN with geographic distribution. Metadata (title, description, views) in a relational database. View counting: eventual consistency is fine — batch count updates from a stream processor. Recommendations: collaborative filtering + content-based signals.

Scale: CDN edge caching for popular videos. Shard metadata by video ID. Use Kafka for view count streaming.

Topics: CDN, Transcoding, Adaptive Streaming

9. Design an E-Commerce System

Key components: Product catalog, cart service, order service, payment service, inventory service, search service

Microservices architecture: each domain (catalog, cart, orders, payments, inventory) is a separate service. Product catalog: read-heavy, use Elasticsearch for search with a relational DB as source of truth. Cart: Redis for session-based carts, database for persistent carts. Orders: event-sourced for auditability. Inventory: requires strong consistency — use optimistic locking to prevent overselling. Payment: idempotency keys to handle retries safely. Key challenge: distributed transactions across inventory and payment — use the Saga pattern with compensating transactions.

Scale: Cache product pages at CDN. Shard orders by user ID. Use event-driven communication between services.

Topics: Microservices, Saga Pattern, Event Sourcing
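Optimistic locking for inventory can be sketched with a version number per row. The `Inventory` class below is a hypothetical in-memory stand-in for a table with a version column; `compare_and_set` plays the role of an UPDATE that only succeeds when the version still matches.

```python
class Inventory:
    def __init__(self):
        self._rows = {}  # sku -> (quantity, version)

    def set_stock(self, sku, quantity):
        self._rows[sku] = (quantity, 0)

    def read(self, sku):
        return self._rows[sku]

    def compare_and_set(self, sku, new_quantity, expected_version):
        """Write only if nobody else has updated the row since it was read."""
        _quantity, version = self._rows[sku]
        if version != expected_version:
            return False
        self._rows[sku] = (new_quantity, version + 1)
        return True


def reserve(inventory, sku, amount, max_attempts=3):
    """Try to reserve stock, retrying when a concurrent writer wins the race."""
    for _ in range(max_attempts):
        quantity, version = inventory.read(sku)
        if quantity < amount:
            return False  # not enough stock: refuse rather than oversell
        if inventory.compare_and_set(sku, quantity - amount, version):
            return True
    return False
```

The stale-version check is what prevents two buyers from both taking the last unit: the second writer's update is rejected and it re-reads the reduced quantity.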

10. Design a Ride-Sharing Service (Uber)

Key components: Location service, matching service, trip service, pricing engine, ETA calculator

Location tracking: drivers send GPS updates every 3–5 seconds to a geospatial index (geohash-based). Matching: when a rider requests a ride, query nearby available drivers using geospatial search, rank by ETA and rating, and offer the trip sequentially. Trip state machine: REQUESTED → MATCHED → DRIVER_EN_ROUTE → IN_PROGRESS → COMPLETED. Pricing: surge pricing based on real-time supply/demand ratio in a geohash region. ETA: pre-computed routing with real-time traffic adjustments.

Scale: Partition geospatial index by region. Use WebSockets for real-time driver location updates. Kafka for trip event streaming.

Topics: Geospatial, Real-time, State Machine
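The trip state machine can be made explicit with a transition table. This sketch encodes only the linear flow listed above; cancellations and other branches are left out, and the `Trip` class is an illustrative assumption.

```python
from enum import Enum


class TripState(Enum):
    REQUESTED = "REQUESTED"
    MATCHED = "MATCHED"
    DRIVER_EN_ROUTE = "DRIVER_EN_ROUTE"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"


# Each state maps to the set of states it may move to next.
ALLOWED_TRANSITIONS = {
    TripState.REQUESTED: {TripState.MATCHED},
    TripState.MATCHED: {TripState.DRIVER_EN_ROUTE},
    TripState.DRIVER_EN_ROUTE: {TripState.IN_PROGRESS},
    TripState.IN_PROGRESS: {TripState.COMPLETED},
    TripState.COMPLETED: set(),
}


class Trip:
    def __init__(self):
        self.state = TripState.REQUESTED

    def advance(self, new_state):
        """Move to new_state, rejecting any transition not in the table."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
```

Keeping the table as data makes it easy to discuss extensions (for example, a cancelled state) without rewriting the transition logic.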

11. Design a File Storage System (Dropbox)

Key components: File chunking service, metadata service, sync service, block storage, deduplication

Split files into fixed-size chunks (4MB). Hash each chunk (SHA256) for deduplication — identical chunks are stored once. Metadata service tracks file-to-chunk mappings, versions, and permissions. Sync: client maintains a local hash tree, compares with server to determine which chunks need uploading/downloading. Use long polling or WebSockets for real-time sync notifications. Block storage: S3 or GCS with replication across regions. Conflict resolution: last-write-wins with versioning, or create a conflict copy.

Scale: Shard metadata by user. CDN for frequently accessed public files. Delta sync (only upload changed chunks).

Topics: Chunking, Deduplication, Delta Sync
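Chunking, content-hash deduplication, and delta sync fit together as in the sketch below. `ChunkStore` is a hypothetical in-memory stand-in for block storage, and a file's "manifest" here is simply its ordered list of chunk hashes.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB, as in the description above


class ChunkStore:
    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.blocks = {}  # SHA-256 hex digest -> chunk bytes

    def put_file(self, data):
        """Store a file's chunks and return its manifest (ordered chunk hashes)."""
        manifest = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # identical chunks are stored once
            manifest.append(digest)
        return manifest

    def get_file(self, manifest):
        return b"".join(self.blocks[digest] for digest in manifest)


def chunks_to_upload(old_manifest, new_manifest):
    """Delta sync: only chunks absent from the previous version need uploading."""
    known = set(old_manifest)
    return [digest for digest in new_manifest if digest not in known]
```

One limitation of fixed-size chunks is that inserting a byte near the start of a file shifts every later chunk boundary, so all of those chunks look new.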

12. Design a Social Network

Key components: User graph service, post service, feed service, media storage, privacy service

User graph: store relationships (follow, friend, block) in a graph database (Neo4j) or adjacency lists in a relational DB. Feed generation: hybrid fan-out (see News Feed above). Media: upload to blob storage, serve via CDN. Privacy: per-post visibility controls (public, friends-only, custom lists) evaluated at read time. Key challenge: "mutual friends" and "people you may know" queries are expensive on large graphs — pre-compute with batch jobs and cache results.

Scale: Shard user data by user ID. Graph partitioning for social graph queries. Cache friend lists in Redis.

Topics: Graph Database, Fan-out, Privacy

13. Design an Ad Click Aggregator

Key components: Click ingestion service, stream processor, aggregation store, query API, fraud detection

High-throughput event ingestion: accept click events via a lightweight HTTP endpoint, write to Kafka. Stream processing (Flink or Spark Streaming): aggregate clicks by ad_id, campaign_id, and time window (1-minute, 1-hour buckets). Store aggregates in an analytics-oriented store, such as a time-series or columnar OLAP database (TimescaleDB, ClickHouse). Fraud detection: real-time rules (click frequency per IP, device fingerprint anomalies) and batch ML models for sophisticated detection. Key requirement: exactly-once processing to avoid counting duplicates, which is especially important for billing.

Scale: Kafka partitioning by ad_id. Pre-aggregate at the edge. Use lambda architecture for real-time + batch accuracy.

Topics: Stream Processing, Time-Series, Exactly-Once
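Windowed aggregation with duplicate suppression for ad clicks can be sketched in plain Python. This is a single-process illustration, not a stream-processing job: it assumes each event carries a unique `click_id` (an assumption, not something a given ad platform necessarily provides) and keeps the seen-ID set in memory.

```python
from collections import Counter


def aggregate_clicks(events, window_seconds=60):
    """Count clicks per (ad_id, window_start), ignoring repeated click_ids.

    events: iterable of (click_id, ad_id, unix_timestamp) tuples.
    """
    seen_click_ids = set()
    counts = Counter()
    for click_id, ad_id, ts in events:
        if click_id in seen_click_ids:
            continue  # a retried or replayed event must not be billed twice
        seen_click_ids.add(click_id)
        window_start = ts - ts % window_seconds  # tumbling window boundary
        counts[(ad_id, window_start)] += 1
    return counts
```

In a real pipeline the deduplication state would live in the stream processor's checkpointed state or an external store, which is where the exactly-once discussion becomes interesting.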

14. Design a Key-Value Store

Key components: Hash ring for partitioning, replication manager, conflict resolution, read/write coordinator

Use consistent hashing to distribute keys across nodes. Replicate each key to N nodes (typically 3) for durability. Tunable consistency: choose W (write quorum) and R (read quorum) so that W + R > N for strong consistency, since every read quorum then overlaps the most recent write quorum; or use W=1, R=1 for high availability and low latency. Conflict resolution: vector clocks for versioning, last-write-wins for simplicity. Anti-entropy: Merkle trees to detect and repair inconsistencies between replicas. Failure detection: gossip protocol. This is essentially the architecture described in Amazon's Dynamo paper, which strongly influenced Cassandra (DynamoDB is a related but distinct managed service).

Scale: Add nodes to the hash ring with virtual nodes for balance. Hinted handoff for temporary failures.

Topics: Consistent Hashing, Quorum, Vector Clocks
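A hash ring with virtual nodes, plus the quorum condition, can be sketched as follows. The class name, the `node#i` virtual-node labels, and the use of SHA-256 for ring positions are assumptions made for this illustration.

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(label.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns many small arcs of the ring for better balance.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def replicas_for(self, key, n=3):
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        if not self._ring:
            return []
        start = bisect.bisect(self._ring, (ring_position(key), ""))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == n:
                    break
        return found


def is_quorum_consistent(n, w, r):
    """True when every read quorum must overlap every write quorum."""
    return w + r > n
```

Removing a node only affects keys whose replica walk passed through that node's arcs; all other keys keep their placement, which is the property that makes resizing cheap.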

15. Design a Job Scheduler

Key components: Job submission API, job queue, scheduler, worker pool, job state store

Jobs are submitted via API with execution parameters (schedule, priority, retries). Scheduler: for cron-style recurring jobs, use a priority queue sorted by next execution time. For one-time jobs, use a distributed queue (SQS, RabbitMQ). Worker pool: auto-scaling based on queue depth. Job state machine: PENDING → RUNNING → SUCCEEDED/FAILED/RETRYING. Idempotency: jobs must be safe to retry. Dead-letter queue for jobs that exceed max retries. Key challenge: exactly-once execution — use distributed locks or lease-based assignment to prevent duplicate execution.

Scale: Partition job queue by priority. Separate pools for long-running vs. quick jobs. Use leader election for scheduler HA.

Topics: Priority Queue, Distributed Locks, Auto-scaling
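The "priority queue sorted by next execution time" can be sketched with a binary heap. This is a single-process illustration with hypothetical names; it ignores persistence, locking, and worker assignment.

```python
import heapq
import itertools


class Scheduler:
    def __init__(self):
        self._heap = []                    # entries: (run_at, sequence, job_name, interval)
        self._sequence = itertools.count() # tie-breaker for jobs due at the same time

    def schedule(self, job_name, run_at, interval=None):
        """Queue a job; a positive interval makes it recurring."""
        if interval is not None and interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._heap, (run_at, next(self._sequence), job_name, interval))

    def pop_due(self, now):
        """Return the names of all jobs due at or before `now`, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            run_at, _seq, job_name, interval = heapq.heappop(self._heap)
            due.append(job_name)
            if interval is not None:
                # Re-queue recurring jobs relative to their scheduled time to avoid drift.
                self.schedule(job_name, run_at + interval, interval)
        return due
```

Note that a recurring job that missed several intervals is returned once per missed run in this sketch; whether to catch up or skip is a design decision worth raising.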

16. Design a Payment System

Key components: Payment gateway, ledger service, fraud detection, reconciliation service, notification service

Double-entry ledger: every transaction creates two entries (debit and credit) that must balance. Idempotency keys on every payment request to handle retries safely. Payment flow: authorize → capture → settle. Separating authorization from capture gives better error handling, because failures can be dealt with before funds are captured. Fraud detection: real-time rules engine + ML model scoring before authorization. Reconciliation: batch job compares ledger with bank statements daily. Key requirements: ACID transactions for the ledger (PostgreSQL), audit trail for every state change, PCI-DSS compliance for card data handling.

Scale: Shard ledger by account ID. Async processing for non-critical steps. Event sourcing for complete audit trail.

Topics: Double-Entry Ledger, Idempotency, ACID
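A double-entry ledger with idempotency keys can be sketched in memory as below. The `Ledger` class is hypothetical, amounts are integer cents to avoid floating-point rounding, and a real system would wrap each transfer in a database transaction.

```python
from collections import defaultdict


class Ledger:
    def __init__(self):
        self.entries = []                 # (transfer_id, account, signed_amount_cents)
        self.balances = defaultdict(int)  # account -> balance in cents
        self._transfers_by_key = {}       # idempotency key -> transfer_id

    def transfer(self, idempotency_key, from_account, to_account, amount_cents):
        """Record a debit and a matching credit; replays of the same key are no-ops."""
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if idempotency_key in self._transfers_by_key:
            return self._transfers_by_key[idempotency_key]
        transfer_id = len(self._transfers_by_key) + 1
        for account, delta in ((from_account, -amount_cents), (to_account, amount_cents)):
            self.entries.append((transfer_id, account, delta))
            self.balances[account] += delta
        self._transfers_by_key[idempotency_key] = transfer_id
        return transfer_id

    def is_balanced(self):
        """Debits and credits across the whole ledger must sum to zero."""
        return sum(delta for _id, _account, delta in self.entries) == 0
```

Because a retried request carries the same key, a client that times out and resends cannot move the money twice.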

17. Design a Recommendation Engine

Key components: Feature store, model serving, candidate generation, ranking service, A/B testing platform

Two-stage pipeline: (1) Candidate generation — narrow millions of items to hundreds using collaborative filtering (users who liked X also liked Y) and content-based filtering (items similar to what the user has engaged with). (2) Ranking — score candidates using a more complex model that considers user features, item features, and context (time, device, location). Feature store provides real-time features for scoring. Serve recommendations from a cache with periodic re-computation. Cold start: use popularity-based recommendations for new users, content features for new items.

Scale: Pre-compute candidate sets offline. Cache ranked recommendations. A/B test model changes.

Topics: Collaborative Filtering, Feature Store, Two-Stage

18. Design a Distributed Cache

Key components: Cache nodes, consistent hashing, eviction policy, cache client, health monitoring

Distribute cache across nodes using consistent hashing. Eviction: LRU (least recently used) is the most common, but LFU (least frequently used) is better for workloads with stable hot keys. Cache patterns: Cache-aside (application manages cache reads/writes), Write-through (writes go to cache and DB simultaneously), Write-behind (writes go to cache, asynchronously flushed to DB). Key challenges: cache invalidation (one of the two hard problems in CS), thundering herd (many cache misses simultaneously when a popular key expires), and cache warming on cold starts.

Scale: Add nodes with consistent hashing. Replicate hot keys across multiple nodes. Use local in-process cache + distributed cache.

Topics: Consistent Hashing, LRU/LFU, Cache-Aside
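LRU eviction and the cache-aside pattern can be shown together in a short sketch. `LRUCache` and `get_or_load` are illustrative names, and `load_from_db` stands for whatever slow lookup sits behind the cache.

```python
from collections import OrderedDict

_MISSING = object()  # sentinel so that cached None values are not mistaken for misses


class LRUCache:
    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, most recently used last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def get_or_load(cache, key, load_from_db):
    """Cache-aside: the application checks the cache, then falls back to the source."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = load_from_db(key)
        cache.put(key, value)
    return value
```

This sketch does nothing about the thundering-herd problem: if many callers miss on the same key at once, each one calls `load_from_db`. Coalescing those loads is a natural follow-up topic.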

19. Design a Metrics Monitoring System

Key components: Metric collection agents, ingestion pipeline, time-series database, query engine, alerting service, dashboard

Agents on each server collect metrics (CPU, memory, disk, custom app metrics) and push to an ingestion service. Some systems, such as Prometheus, pull instead by scraping each target on a schedule. Ingestion writes to a time-series database (InfluxDB, Prometheus, or custom). Data model: metric name + tags + timestamp + value. Downsample old data to reduce storage (1-second resolution for last hour, 1-minute for last week, 1-hour for last year). Query engine supports aggregations (avg, p99, sum) across time windows and tag dimensions. Alerting: threshold-based rules and anomaly detection, with notification routing (PagerDuty, Slack).

Scale: Partition time-series by metric name and time range. Use bloom filters for tag indexing. Tiered storage for cost efficiency.

Topics: Time-Series DB, Downsampling, Alerting
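Downsampling amounts to grouping points into fixed-width time buckets and keeping one aggregate per bucket. The sketch below averages values; the function name and the tuple layout are assumptions for illustration.

```python
from collections import defaultdict


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width buckets.

    Returns a list of (bucket_start, average) sorted by bucket_start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % bucket_seconds
        buckets[bucket_start].append(value)
    return [
        (bucket_start, sum(values) / len(values))
        for bucket_start, values in sorted(buckets.items())
    ]


# 1-second-style samples rolled up into 1-minute buckets.
raw = [(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0), (120, 9.0)]
print(downsample(raw, 60))
# prints [(0, 2.0), (60, 6.0), (120, 9.0)]
```

Averages cannot be used to recover percentiles later, so systems that need p99 over old data often store min, max, count, and sum (or a sketch structure) per bucket instead.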

20. Design an API Rate Limiter (Distributed)

Key components: Rate limiter service, distributed counter store (Redis Cluster), configuration service, analytics

Extend the basic rate limiter (problem 4) to work across multiple API gateway instances. Use Redis Cluster for distributed counters with Lua scripts for atomic increment-and-check operations. Sliding window algorithm: maintain a sorted set of request timestamps per client, count requests within the window, reject if over limit. Multi-tier limits: per-second burst limit + per-minute sustained limit + per-day quota. Configuration service stores per-client rate limit rules. Analytics: log rate limit events for capacity planning and abuse detection. Graceful degradation: if Redis is unavailable, fall back to local in-memory rate limiting with reduced accuracy.

Scale: Redis Cluster sharded by client ID. Local caching of rate limit rules. Edge rate limiting at CDN level for DDoS protection.

Topics: Sliding Window, Redis Cluster, Lua Scripts
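The sliding window log can be sketched in memory, with a deque of timestamps per client standing in for the Redis sorted set. This is an assumption-laden illustration (hypothetical class name, caller-supplied clock), and it is also roughly what the local fallback might look like when Redis is unavailable.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._log = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id, now):
        """Accept the request if fewer than `limit` were accepted in the last window."""
        log = self._log[client_id]
        # Drop timestamps that have aged out of the window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False  # rejected requests are not recorded
        log.append(now)
        return True
```

In the distributed version, the trim, count, and append steps must run atomically per client, which is the role the Lua script plays in the design above.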

Recommended Study Resources

The best way to prepare for system design interviews is a combination of reading and practice. Here are the most valuable resources:

Designing Data-Intensive Applications by Martin Kleppmann — the single best book for understanding distributed systems fundamentals. Read chapters on replication, partitioning, and consistency models.
System Design Interview by Alex Xu (Vol 1 and Vol 2) — structured walkthroughs of common problems. Good for pattern recognition.
Engineering blogs from Netflix, Uber, Stripe, Meta, and Airbnb — real-world architecture decisions with detailed trade-off analysis.
Mock interviews with a partner — the single most effective preparation method. Practice explaining your thinking out loud while drawing diagrams.

Interview Tip: "The biggest mistake in system design interviews is jumping into the solution without understanding the requirements. Spend the first 5 minutes asking questions. It shows maturity and ensures you are solving the right problem."

For more interview preparation, see our REST API interview questions, data analyst interview questions, the evolution of technical interviews, and company-specific guides for Anthropic, Stripe, and Databricks.

Frequently Asked Questions

What framework should I use for system design interviews?

Use a 5-step framework: (1) Requirements clarification, (2) Back-of-envelope estimation, (3) High-level design, (4) Detailed design of 1-2 components, (5) Trade-offs and bottlenecks. Spend about 5 minutes on requirements, 3 on estimation, 15 on high-level design, 15 on deep dives, and 5 on trade-offs.

How long should a system design interview last?

Most system design interviews are 45-60 minutes. Spend the first 5-10 minutes clarifying requirements. The remaining time is for designing, discussing trade-offs, and handling follow-up questions. Senior interviews may extend to 90 minutes.

What is the difference between system design and object-oriented design interviews?

System design focuses on distributed systems architecture — databases, caches, message queues, load balancers. OOD focuses on class hierarchies and design patterns within a single application. System design is about the big picture; OOD is about details within one component.

What are the most common system design interview questions?

The most frequently asked: URL shortener, chat/messaging system, news feed, rate limiter, notification system, search autocomplete, and file storage system. These cover core distributed systems concepts that apply broadly.

How do I prepare for system design interviews?

Study building blocks (databases, caching, message queues, consistent hashing). Practice 10-15 problems using a structured framework. Do mock interviews. Read engineering blogs from Netflix, Uber, and Stripe. Focus on trade-offs rather than memorizing solutions.

What level of detail is expected?

Demonstrate breadth in the high-level design and depth in 1-2 areas the interviewer wants to explore. You do not need to design every component in detail. Being able to articulate trade-offs is more important than having a perfect design.
