Design A URL Shortener

A complete system design walkthrough β€” hash functions, base62 encoding, redirect flows, bloom filters & scalable architecture for 100M URLs/day.

⚑ 100M URLs / day πŸ“¦ 36.5 TB Storage πŸ”— Base62 Encoding πŸ›‘οΈ Bloom Filters

In this chapter, we will tackle an interesting and classic system design interview question: designing a URL shortening service like TinyURL.

Step 1 β€” Understand the Problem & Design Scope

System design interview questions are intentionally open-ended. To design a well-crafted system, it is critical to ask clarification questions.

Candidate: Can you give an example of how a URL shortener works?
Interviewer: Assume URL https://www.systeminterview.com/q=chatsystem&c=loggedin&v=v3&l=long is the original URL. Your service creates a shorter alias: https://tinyurl.com/y7keocwj. Clicking the alias redirects to the original URL.

Candidate: What is the traffic volume?
Interviewer: 100 million URLs are generated per day.

Candidate: How long is the shortened URL?
Interviewer: As short as possible.

Candidate: What characters are allowed?
Interviewer: A combination of numbers (0-9) and characters (a-z, A-Z).

Candidate: Can shortened URLs be deleted or updated?
Interviewer: For simplicity, assume they cannot.

Basic Use Cases

  1. URL shortening: given a long URL ⟹ return a much shorter URL
  2. URL redirecting: given a short URL ⟹ redirect to the original URL
  3. High availability, scalability, and fault tolerance

Back-of-the-Envelope Estimation

It is important to walk through these assumptions with your interviewer so that both of you are on the same page.

Step 2 β€” Propose High-Level Design & Get Buy-In

In this section we discuss the API endpoints, URL redirecting, and URL shortening flows.

API Endpoints

We will design REST-style APIs. A URL shortener primarily needs two endpoints:

  1. URL shortening. A client sends a POST request with the original long URL:

    POST api/v1/data/shorten
    
    - request body: { longUrl: string }
    - response:     shortURL
  2. URL redirecting. A client sends a GET request for the short URL:

    GET api/v1/:shortUrl
    
    - response: 301/302 redirect β†’ longURL

URL Redirecting

Figure 1 shows what happens when you enter a TinyURL in the browser. Once the server receives the request, it changes the short URL to the long URL with a redirect.

HTTP response headers showing a 301 redirect from tinyurl.com with a Location header pointing to the destination Amazon URL
Figure 1 β€” HTTP 301 redirect response headers
Sequence diagram showing client requesting short URL, tinyurl server responding with 301 and Location header, then client requesting the final Amazon URL
Figure 2 β€” Client–server URL redirect flow

301 vs 302 Redirect

301 redirect signals that the URL is permanently moved. The browser caches this response, so subsequent requests bypass the URL shortener and go directly to the destination. This reduces server load.

302 redirect signals a temporary move. Every request goes through the URL shortener first, allowing accurate click analytics.

If reducing server load is the priority, use 301. If analytics matter, use 302.

The simplest implementation uses a hash table storing <shortURL, longURL> pairs. To redirect:

URL Shortening

Assume the short URL looks like www.tinyurl.com/{hashValue}. We need a hash function f(x) that maps a long URL to a hash value, as shown in Figure 3.

Diagram showing longURL fed into hash function f(x) producing a short URL hash value
Figure 3 β€” Hash function mapping longURL to hashValue

The hash function must satisfy:

Step 3 β€” Design Deep Dive

We dive deep into: data model, hash function, URL shortening flow, and URL redirecting flow.

Data Model

While a hash table is a good starting point, it is not feasible at scale due to limited and expensive memory. A better approach is to store <shortURL, longURL> mappings in a relational database.

Database table schema with columns: id (primary key), shortURL, and longURL
Figure 4 β€” URL database schema

Hash Function

The hash function converts a long URL into a short URL (the hashValue).

Hash Value Length

The hashValue uses characters from [0-9, a-z, A-Z] β€” 62 possible characters. We need the smallest n such that 62ⁿ β‰₯ 365 billion.

n Maximal number of URLs
1 62
2 3,844
3 238,328
4 14,776,336
5 916,132,832
6 56,800,235,584
7 ~3.5 trillion βœ“
8 218,340,105,584,896

Table 1 β€” hashValue length vs. URL capacity (n=7 is sufficient for 365 billion URLs)

When n = 7, 62⁷ β‰ˆ 3.5 trillion β€” more than enough to hold 365 billion URLs. So the hashValue length is 7.

We will explore two types of hash functions: (1) hash + collision resolution, and (2) base 62 conversion.

Hash + Collision Resolution

Use well-known hash functions like CRC32, MD5, or SHA-1. Here are results for https://en.wikipedia.org/wiki/Systems_design:

Hash function Hash value (Hexadecimal)
CRC32 5cb54054
MD5 5a62509a84df9ee03fe1230b9df8b84e
SHA-1 0eeae7916c06853901d9ccbefbfcaf4de57ed85b

Table 2 β€” Hash function outputs (all exceed 7 characters)

Even the shortest (CRC32) exceeds 7 characters. The approach: collect the first 7 characters; if a collision exists, recursively append a predefined string and rehash until unique. See Figure 5.

Flowchart: longURL β†’ hash function β†’ shortURL β†’ check DB β†’ if exists append suffix and retry, if not save to DB
Figure 5 β€” Hash + collision resolution flowchart

This eliminates collisions but is expensive (DB query per request). A bloom filter can dramatically improve performance β€” it's a space-efficient probabilistic data structure to test set membership.

Base 62 Conversion

Base 62 uses 62 characters for encoding. Mappings: 0β†’0, …, 9β†’9, 10β†’a, …, 35β†’z, 36β†’A, …, 61β†’Z.

Example: convert 11157₁₀ to base 62:

Diagram showing repeated division of 11157 by 62, yielding remainders 59, 55, 2 which map to X, T, 2 in base 62
Figure 6 β€” Base 62 conversion of 11157₁₀ β†’ 2TX₆₂

Comparison of the Two Approaches

Hash + Collision Resolution Base 62 Conversion
Fixed short URL length URL length grows with ID
No unique ID generator needed Requires a unique ID generator
Collisions possible; must resolve No collisions β€” ID is unique
Next available URL is unpredictable (secure) Next URL is predictable (security concern)

Table 3 β€” Hash + Collision Resolution vs Base 62 Conversion

URL Shortening Deep Dive

We use base 62 conversion in our design. Figure 7 demonstrates the full shortening flow.

Flowchart: input longURL β†’ check DB β†’ if exists return shortURL, else generate unique ID β†’ convert to shortURL via base62 β†’ save to DB
Figure 7 β€” URL shortening deep-dive flow
  1. Input: the longURL is received.
  2. DB check: does the longURL already exist?
  3. If yes: fetch and return the existing shortURL.
  4. If no: generate a new unique ID (primary key) via the distributed ID generator.
  5. Convert the ID to a shortURL using base 62 conversion.
  6. Save (id, shortURL, longURL) to the database.

Concrete example:

id shortURL longURL
2009215674938 zn9edcu https://en.wikipedia.org/wiki/Systems_design

Table 4 β€” Example database entry

The distributed unique ID generator is central to this approach. Its primary function is generating globally unique IDs used to create shortURLs. See the "Design A Unique ID Generator in Distributed Systems" chapter for implementation details.

URL Redirecting Deep Dive

Figure 8 shows the detailed URL redirecting design. Since reads vastly outnumber writes, <shortURL, longURL> mappings are cached to improve performance.

Architecture diagram showing user request flowing to load balancer, then web servers, then checking cache before falling back to database, and returning the longURL
Figure 8 β€” URL redirecting architecture with cache layer

The redirect flow:

  1. User clicks: https://tinyurl.com/zn9edcu
  2. Load balancer forwards the request to web servers.
  3. Cache hit? Return the longURL immediately.
  4. Cache miss? Fetch the longURL from the database (invalid shortURL β†’ 404).
  5. Return the longURL to the user via HTTP redirect.

Step 4 β€” Wrap Up

We covered: API design, data model, hash function, URL shortening, and URL redirecting. Additional talking points for extra interview time:

πŸŽ‰ Congratulations on getting this far! Give yourself a pat on the back. Good job!

Reference Materials

[1] A RESTful Tutorial: https://www.restapitutorial.com/index.html

[2] Bloom filter: https://en.wikipedia.org/wiki/Bloom_filter