Real-Time Systems

Real-Time Location Tracking

Designing scalable architectures for ride-sharing (Uber/Grab): Geohashing, Quadtrees, Redis Geo, and low-latency socket delivery.

High-Level Architecture

The system must handle high-throughput writes (drivers moving every 3s) and heavy read queries (riders looking for nearby drivers).

📡 Ingestion Service

Protocol: WebSocket or MQTT over TCP.

Drivers maintain a persistent connection. Updates are lightweight (Lat, Long, DriverID). Load balanced via Consistent Hashing to keep a driver on the same node.

🗺️ Spatial Index (Redis)

Storage: In-memory Geospatial Index.

Stores ephemeral location data. TTL of 30 seconds. Used for "Find Nearby" queries. Persistent storage (Cassandra) acts as an archival log.

Spatial Indexing Strategies

How do we efficiently find " Drivers within 1km"? Using standard SQL WHERE clauses is too slow. we need spatial indexing.

Technique How it works Pros Cons
Geohash Encodes (Lat, Long) into a Base32 string. Longer string = more precision. Neighbors share prefixes. Easy to implement. Single string key. Edge cases (neighbors might have totally different hashes at boundaries).
Quadtree Recursively divides 2D space into 4 quadrants. Tree structure. Adaptable resolution (denser areas get deeper trees). Complex to rebuild dynamically if data changes fast.
Google S2 Maps sphere to 1D curve (Hilbert Curve). Handles Earth's curvature. Very fast region covering. Complex math library required.

Geohash Conceptual View

u4pru0
(Driver A)
u4pru1
(Driver B)
u4pru2
(Empty)
u4pru3
(User)

Nearby users share the prefix "u4pru"

Protocol: WebSocket vs MQTT

WebSocket
  • Full Duplex: Good for chat & location.
  • Heavy: Headers overhead can be significant.
  • Battery: Keeps radio active.
  • Use Case: Rider App (browsers support it natively).
MQTT
  • Lightweight: Binary protocol, min header 2 bytes.
  • Pub/Sub: Built-in topic routing.
  • Battery: Optimized for IoT/Mobile.
  • Use Case: Driver App (Native mobile implementation).

Implementation: Redis Geo

Redis has built-in geospatial support (using Geohashing internally). It's incredibly fast (O(log(N))).

Python Example
import redis

# Connect to Redis
r = redis.Redis(host='localhost', port=6379, db=0)

def update_driver_location(driver_id, lat, lon):
    """Add/Update driver location (GEOADD key lon lat member)"""
    r.geoadd("active_drivers", (lon, lat, driver_id))
    # Set expire to auto-remove inactive drivers
    # Note: GEO commands don't support EXPIRE directly on members, 
    # usually managed via separate ZSET or key logic in production.

def find_nearby_drivers(user_lat, user_lon, radius_km=1):
    """Find drivers within radius (GEORADIUS key lon lat radius unit)"""
    drivers = r.georadius(
        "active_drivers", 
        user_lon, 
        user_lat, 
        radius_km, 
        unit="km", 
        withdist=True, 
        withcoord=True, 
        sort="ASC"
    )
    return drivers

# Usage
update_driver_location("driver_101", 37.7749, -122.4194)
nearby = find_nearby_drivers(37.7750, -122.4180, 2)
print(nearby) 
# Output: [[b'driver_101', 0.1235, (37.7749, -122.4194)]]
⚠️ Scaling Redis: Single Redis instance can handle ~100k updates/sec. For global scale, Shard by City/Region (e.g., active_drivers:san_francisco). Do not put all world drivers in one key.

Summary

  • Ingestion: Use persistent connections (WebSocket/MQTT) to handle frequent updates.
  • Storage: Redis (Geospatial) for real-time reads. Cassandra/S3 for historical logs.
  • Indexing: Geohash/Google S2 allows efficient "K-Nearest Neighbors" queries.
  • Optimizations: Batch updates on the client side. Only send updates if movement > 10 meters.
  • Privacy: Fuzzy location techniques when precise accuracy isn't needed.