TikTok System Design

Never dive directly into the design phase; it may raise red flags during the interview!


Interviewer: Let's discuss the system design for a platform similar to TikTok.

TikTok is a social media app that allows users to create, share, and discover short videos.

1. Feature Expectations [5 mins]

You: Could you clarify the expected user interaction patterns we should optimize for? For example, are we focusing more on video consumption or on user-generated content and interactions?

Understanding the big picture will guide us in detailing how many users the system should handle, how fast it needs to perform, and what security measures are crucial.

Interviewer: That’s a good question. We aim to balance both aspects, with a strong emphasis on user-generated content and interactions. How would you approach managing user privacy and data security given the scale of the platform?

You: That makes sense. Regarding scalability, are we considering a cloud-native approach, or should we also explore hybrid or on-premises options for specific components?

Interviewer: Good point. We're primarily looking at a cloud-native approach, leveraging scalability and flexibility. Lastly, how would you ensure the system maintains high availability during peak traffic times or unexpected surges?

You: Understood. Could you clarify the expected uptime requirements and any specific disaster recovery strategies we should consider?

Interviewer: Absolutely. We aim for at least 99.99% uptime and would appreciate your insights on how to achieve that. These questions help ensure we're on the same page regarding the system's goals and constraints.

You: Thank you for the clarification. Based on our discussion, let's outline the functional requirements for the system.

Functional Requirements

  1. Video Upload and Processing
  2. User Interactions
  3. Personalized Feeds
  4. Content Discovery

These features are central to the user experience and represent the primary interactions on the platform.

To keep our discussion focused and manageable, I won't delve into secondary features like Trends, Continuous Engagement and Hashtags at this time. These aspects are important for enhancing discoverability and engagement but require a deeper exploration that goes beyond our current scope.

Understanding the diverse user base of TikTok is crucial for effective platform design. TikTok attracts individuals seeking creative expression, celebrities looking to engage with fans, organizations aiming to reach wider audiences, and advertisers seeking to promote products or services. Each user segment has unique needs and expectations, from seamless content creation tools to effective promotional opportunities.

By knowing our audience, whether they are creators, influencers, brands, or advertisers, we can tailor the platform's features and functionalities to enhance user engagement, promote content discoverability, and provide a satisfying experience that meets their specific goals.

Limit the number of features discussed to one or two, as covering more can be time-consuming and may detract from explaining the most critical aspects of the design.



2. Estimations [5 mins]

Understanding the system's data processing needs is critical not only for ensuring sufficient network and storage capacity but also for anticipating scalability requirements as user demands grow. This proactive approach helps in designing a resilient system that can handle peak loads efficiently, maintaining consistent performance and reliability over time.

  • Assuming Daily Active Users (DAU) of 500 million users.
  • Each user uploads 1 video per day on average, totaling 500 million video uploads daily.
  • Assuming 10 MB per video, resulting in a daily storage requirement of 5 PB (petabytes).
  • With 1 million requests per second (RPS) at peak times, we require about 1,000 servers (workers), assuming each handles roughly 1,000 RPS, and about 50 PB of daily bandwidth to handle data transfers.
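The arithmetic behind these figures can be checked with a short script. This is a back-of-envelope sketch only: the per-server capacity of 1,000 RPS and the 10x read multiplier are assumptions chosen to reproduce the 1,000-server and 50 PB figures, not values given by the interviewer.

```python
# Back-of-envelope capacity estimate (decimal units: 1 PB = 10**15 bytes).
DAU = 500_000_000                 # daily active users
UPLOADS_PER_USER_PER_DAY = 1
VIDEO_SIZE_BYTES = 10 * 10**6     # 10 MB per video
PEAK_RPS = 1_000_000

# Assumptions (hypothetical, chosen to match the figures in the text):
RPS_PER_SERVER = 1_000            # requests one server can handle per second
READ_MULTIPLIER = 10              # bytes served per byte uploaded

PB = 10**15

daily_uploads = DAU * UPLOADS_PER_USER_PER_DAY
daily_storage_pb = daily_uploads * VIDEO_SIZE_BYTES / PB
servers_needed = -(-PEAK_RPS // RPS_PER_SERVER)   # ceiling division
daily_bandwidth_pb = daily_storage_pb * READ_MULTIPLIER

print(f"uploads/day:   {daily_uploads:,}")
print(f"storage/day:   {daily_storage_pb:.0f} PB")
print(f"servers:       {servers_needed:,}")
print(f"bandwidth/day: {daily_bandwidth_pb:.0f} PB")
```

Changing any one input (for example, a 50 MB average video) shows immediately how sensitive storage and bandwidth are to that assumption, which is worth saying out loud in the interview.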

Clear estimations demonstrate planning and analytical skills crucial for system scalability and performance assessment.



3. Design Goals [5 mins]

Based on estimations and discussions, the non-functional requirements for the system include addressing critical aspects such as performance under high user loads, robust data security measures, and scalable infrastructure to accommodate rapid growth.

  1. High Availability
  2. Scalability
  3. Security
  4. Performance
  5. Usability

Specify latency and throughput targets, and decide on consistency and availability levels based on the estimations discussed, for a robust system design.



4. High-Level Design [5-8 mins]

To design a high-level system architecture for TikTok, we'll focus on the following aspects:

I. APIs for Read/Write Scenarios

Defining APIs for TikTok is crucial as it establishes structured communication channels between different system components. This organization simplifies future feature enhancements and modifications. APIs facilitate controlled access to TikTok's functionalities, enabling seamless integration with external programs or services. This capability supports the platform's growth and expansion, ensuring a cohesive and scalable ecosystem for users, developers, and partners alike.

Video Upload

  Endpoint:   POST /api/videos/upload
  Parameters: video_file, title, description
  Response:   {"message": "Video uploaded successfully."}

User Interactions

  Endpoint:   POST /api/videos/interactions
  Parameters: video_id, action
  Response:   {"message": "Action recorded successfully."}

Recommendation

  Endpoint:   GET /api/feed
  Parameters: user_id
  Response:   {"feed": [...array of videos data...]}

Follow User

  Endpoint:   POST /users/{user_id}/follow
  Parameters: user_id, target_user_id
  Response:   Follow relationship data (user_id, target_user_id)

Unfollow User

  Endpoint:   DELETE /users/{user_id}/follow
  Parameters: user_id, target_user_id
  Response:   Success or error message
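To make the request and response contract concrete, here is a minimal sketch of how the interactions endpoint might validate its parameters. The names ALLOWED_ACTIONS, ApiResponse, and handle_interaction, the error body shape, and the HTTP status codes are illustrative assumptions, not part of the API described above.

```python
from dataclasses import dataclass

# Hypothetical set of actions; the design mentions liking, commenting, and sharing.
ALLOWED_ACTIONS = {"like", "comment", "share"}


@dataclass(frozen=True)
class ApiResponse:
    status: int
    body: dict


def handle_interaction(params: dict) -> ApiResponse:
    """Validate a POST /api/videos/interactions request and build a response."""
    video_id = params.get("video_id")
    action = params.get("action")
    if not video_id:
        return ApiResponse(400, {"error": "video_id is required"})
    if action not in ALLOWED_ACTIONS:
        return ApiResponse(400, {"error": f"unsupported action: {action!r}"})
    # A real service would persist the interaction here.
    return ApiResponse(200, {"message": "Action recorded successfully."})


print(handle_interaction({"video_id": "abc", "action": "like"}))
```

Rejecting malformed requests at the edge keeps invalid data out of the interaction pipeline and gives clients a clear error to act on.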

II. Database Schema

In designing TikTok's backend architecture, NoSQL databases are chosen over traditional SQL databases due to their ability to handle large volumes of unstructured or semi-structured data, which suits TikTok's dynamic content creation and consumption model. With 500 million daily active users generating and interacting with extensive video content, NoSQL databases like MongoDB or Cassandra provide scalable storage and retrieval capabilities without the rigid schema constraints of SQL databases, supporting seamless performance during peak usage periods.

Users Table


A graph database is ideal for storing user information in TikTok due to its ability to efficiently represent complex relationships between users, such as followers, following, and shared interests, which are crucial for personalized content recommendations and social network analysis.

Examples of graph databases include Neo4j, Amazon Neptune, and JanusGraph, which excel in managing interconnected data structures and performing fast queries for relationship-based data models like social networks.
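The relationship queries a graph database answers can be illustrated with a small in-memory model. This is a sketch of the data model only, not of any particular graph database's API; the FollowGraph class and its method names are hypothetical.

```python
from collections import Counter, defaultdict


class FollowGraph:
    """Directed graph: an edge u -> v means user u follows user v."""

    def __init__(self) -> None:
        self._following = defaultdict(set)   # user -> accounts they follow
        self._followers = defaultdict(set)   # user -> accounts following them

    def follow(self, user_id: str, target_user_id: str) -> None:
        if user_id == target_user_id:
            raise ValueError("a user cannot follow themselves")
        self._following[user_id].add(target_user_id)
        self._followers[target_user_id].add(user_id)

    def unfollow(self, user_id: str, target_user_id: str) -> None:
        self._following[user_id].discard(target_user_id)
        self._followers[target_user_id].discard(user_id)

    def followers(self, user_id: str) -> set:
        return set(self._followers.get(user_id, ()))

    def suggestions(self, user_id: str) -> list:
        """Accounts followed by the people this user follows, most common first."""
        already = self._following.get(user_id, set())
        counts = Counter()
        for followed in already:
            for candidate in self._following.get(followed, set()):
                if candidate != user_id and candidate not in already:
                    counts[candidate] += 1
        # Sort by count descending, then by name for a stable order.
        return [c for c, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


graph = FollowGraph()
graph.follow("alice", "bob")
graph.follow("alice", "dave")
graph.follow("bob", "carol")
graph.follow("dave", "carol")
graph.follow("dave", "erin")
print(graph.suggestions("alice"))
```

The two-hop traversal in suggestions is the kind of query that is awkward as repeated joins in a relational store and natural in a graph store.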

[Figure: example of user data representation in a graph database]

Videos Table

    {
        "video_id": "UUID",
        "user_id": "UUID",
        "title": "string",
        "media_url": "string",
        "likes": "number",
        "views": "number",
        "comments": [
            {
                "user_id": "UUID",
                "comment_text": "string",
                "comment_time": "timestamp"
            },
            ...
        ],
        "created_at": "timestamp"
    }

Recommendation Table

    {
        "user_id": "UUID",
        "recommendations": [
            {
                "post_id": "UUID",
                "post_author_id": "UUID",
                "created_at": "timestamp"
            },
            ...
        ]
    }

Ensure clear and concise communication of design choices and their implications to demonstrate deep understanding and critical thinking.



5. Deep Dive [10-12 mins]

Designing a system step by step helps us solve difficult problems by breaking them into smaller, easier tasks. This way, each part of the system gets careful planning and fits together smoothly. It also lets us make changes and improvements as needed based on feedback and new needs, which helps create a strong and successful final product like TikTok.

1. Users

Users are individuals who engage with TikTok by creating, uploading, and interacting with videos. User data includes profiles, preferences, and social interactions, crucial for personalized content delivery.



2. Load Balancer

Load Balancer is critical for ensuring that the incoming traffic is evenly distributed across multiple servers. This helps in managing the load and ensuring high availability and reliability of the service. It prevents any single server from becoming a bottleneck.
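One common distribution strategy is round-robin with health checks. The sketch below is a simplified illustration under that assumption; the text does not say which algorithm TikTok uses, and the RoundRobinBalancer class and server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked unhealthy."""

    def __init__(self, servers) -> None:
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        self._healthy = set(self._servers)
        self._cycle = itertools.cycle(self._servers)

    def mark_down(self, server: str) -> None:
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self) -> str:
        # Try each server at most once per call before giving up.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.next_server() for _ in range(4)])
```

Production load balancers usually add weighting, connection counts, or latency awareness on top of this basic rotation.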


3. API Gateway

The API Gateway acts as a single point of entry for all API calls, consolidating the backend microservices behind a unified interface. It centralizes API requests from clients, handling authentication and rate limiting, and routes each request to the appropriate backend service, such as user data retrieval or video operations.

Key functionalities include:
  • /POST Video: Initiates a request to upload a new video to TikTok.
  • /GET Video: Retrieves details of a specific video based on its identifier.
  • /GET Feed: Retrieves personalized video feeds for a user based on their preferences and interactions.
  • /POST Follow: Establishes a request to follow another user on TikTok, creating a social connection.
  • /POST Unfollow: Sends a request to unfollow another user, terminating the social connection.
  • /POST Interact: Initiates a request to interact with a video, such as liking, commenting, or sharing.
The API Gateway abstracts the complexity of the backend services from the clients and provides a unified interface.
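Two of the gateway responsibilities, rate limiting and routing, can be sketched as follows. This is a minimal illustration that assumes a token-bucket limiter per client; the class names, the default limits, and the status codes returned are assumptions, and authentication is omitted for brevity.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at a steady rate."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Add tokens for the time elapsed since the last call, up to capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.refill_per_sec)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class ApiGateway:
    """Single entry point: rate-limits per client, then routes to a handler."""

    def __init__(self, routes: dict, capacity: float = 5,
                 refill_per_sec: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._routes = routes            # {(method, path): handler(payload)}
        self._capacity = capacity
        self._refill = refill_per_sec
        self._clock = clock
        self._buckets: dict = {}         # client_id -> TokenBucket

    def handle(self, client_id: str, method: str, path: str, payload: dict):
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, self._clock)
            self._buckets[client_id] = bucket
        if not bucket.allow():
            return 429, {"error": "rate limit exceeded"}
        handler = self._routes.get((method, path))
        if handler is None:
            return 404, {"error": "no such route"}
        return 200, handler(payload)


gateway = ApiGateway({("GET", "/api/feed"): lambda payload: {"feed": []}})
print(gateway.handle("client-1", "GET", "/api/feed", {"user_id": "u1"}))
```

The clock is injectable so the limiter can be tested deterministically; in a multi-node gateway the bucket state would need to live in a shared store rather than in process memory.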


4. User Service

User service handles user-related functionalities including user authentication, profile management, and social connections. It interfaces with the User Database and manages user sessions, permissions, and notifications.


5. User Database

User database stores user profiles, preferences, and social interactions (e.g., follows, likes). This data supports personalized content recommendations and social network functionalities within TikTok.


6. Media Service

Media Service manages media-related operations: it processes uploads, stores media files in the Object Store, and updates the Video Database with URLs linking to the stored media. It ensures efficient management and retrieval of video content across the platform.



7. Media Storage

Media storage is a dedicated Object Store that securely holds the media files uploaded by users. It manages large volumes of media data and integrates with the Media Service for upload, storage, and retrieval of video files.


8. Video Database

Video database stores metadata associated with each video, including unique identifiers, titles, descriptions, and timestamps. It also includes a reference to the media URL where the video content is stored.
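The split between media bytes and video metadata can be sketched with two in-memory dictionaries standing in for the Object Store and the Video Database. The MediaService class shown here, the object key layout, and the media.example.com base URL are hypothetical; transcoding and other processing steps are left out.

```python
import uuid
from datetime import datetime, timezone


class MediaService:
    """Stores video bytes in an object store and metadata in a video database."""

    def __init__(self, object_store: dict, video_db: dict,
                 base_url: str = "https://media.example.com") -> None:
        self._object_store = object_store   # stand-in for the Object Store
        self._video_db = video_db           # stand-in for the Video Database
        self._base_url = base_url

    def upload(self, user_id: str, title: str, description: str, data: bytes) -> dict:
        if not data:
            raise ValueError("video_file is empty")
        video_id = str(uuid.uuid4())
        object_key = f"videos/{video_id}"
        # 1. Write the raw bytes to the object store.
        self._object_store[object_key] = data
        # 2. Write metadata plus a URL pointing at the stored object.
        record = {
            "video_id": video_id,
            "user_id": user_id,
            "title": title,
            "description": description,
            "media_url": f"{self._base_url}/{object_key}",
            "created_at": datetime.now(timezone.utc).isoformat(),
        }
        self._video_db[video_id] = record
        return record


store, database = {}, {}
service = MediaService(store, database)
print(service.upload("user-1", "Sunshine in my mind", "demo clip", b"\x00\x01")["media_url"])
```

Keeping only a URL in the metadata record means the database rows stay small while the large binary payloads live in storage built for them.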



9. User Interaction Service

User Interaction Service manages user activities like following/unfollowing users, updating social graphs, and handling interactions such as liking and commenting on videos. It ensures real-time updates and supports critical engagement features for TikTok's social platform dynamics.
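One detail worth calling out is that likes should be idempotent, so a retried request does not inflate the count. The sketch below shows one way to do that with a set of user IDs per video; the InteractionService class and its in-memory storage are illustrative assumptions rather than the platform's actual implementation.

```python
from collections import defaultdict
from datetime import datetime, timezone


class InteractionService:
    """Records likes and comments; likes are idempotent per (user, video)."""

    def __init__(self) -> None:
        self._likes = defaultdict(set)       # video_id -> {user_id, ...}
        self._comments = defaultdict(list)   # video_id -> [comment, ...]

    def like(self, user_id: str, video_id: str) -> bool:
        """Return True if the like was new, False if it was a repeat."""
        likers = self._likes[video_id]
        if user_id in likers:
            return False
        likers.add(user_id)
        return True

    def unlike(self, user_id: str, video_id: str) -> bool:
        likers = self._likes.get(video_id, set())
        if user_id not in likers:
            return False
        likers.remove(user_id)
        return True

    def comment(self, user_id: str, video_id: str, text: str) -> dict:
        if not text.strip():
            raise ValueError("comment_text is empty")
        entry = {
            "user_id": user_id,
            "comment_text": text,
            "comment_time": datetime.now(timezone.utc).isoformat(),
        }
        self._comments[video_id].append(entry)
        return entry

    def like_count(self, video_id: str) -> int:
        return len(self._likes.get(video_id, ()))


interactions = InteractionService()
interactions.like("user-1", "video-9")
interactions.like("user-1", "video-9")   # retry: ignored
print(interactions.like_count("video-9"))
```

At real scale the counter would typically be maintained separately from the membership set, but the idempotency check stays the same.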



10. Recommendation Service

Recommendation service curates and delivers personalized video feeds tailored to each user's preferences and interactions. It utilizes algorithms to select and rank videos, ensuring content relevancy and engagement through both push (real-time updates) and pull (browsing and searching) mechanisms.


Recommendation Generation

1. Push Model
The push model proactively delivers new content to the user based on their preferences and interactions. This approach ensures that users receive timely and relevant updates without having to search for them.
I. User Interaction Tracking: Track user interactions such as likes, comments, shares, and follows in real-time.
II. Content Matching: Use algorithms (like collaborative filtering and content-based filtering) to identify new videos that match the user's preferences.
III. Real-Time Updates: Push new video recommendations to the user's feed as soon as they become relevant.

2. Pull Model
The pull model allows users to actively request and explore content by scrolling through their feed, searching for specific hashtags, or exploring trends.
I. User Request: The user requests new content by scrolling their feed or performing a search.
II. Content Retrieval: Fetch relevant videos from the database based on the user's request.
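The Content Matching step mentions collaborative filtering. A minimal user-based variant can be sketched as follows, assuming each user's history is reduced to the set of videos they liked and that similarity is measured with the Jaccard index. The function names and sample data are hypothetical, and a production recommender would be far more elaborate.

```python
def jaccard(a: set, b: set) -> float:
    """Similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user_id: str, likes: dict, top_n: int = 3) -> list:
    """Rank videos liked by similar users that this user has not liked yet."""
    mine = likes.get(user_id, set())
    scores: dict = {}
    for other, theirs in likes.items():
        if other == user_id:
            continue
        similarity = jaccard(mine, theirs)
        if similarity == 0:
            continue   # no overlap in taste, ignore this user
        for video in theirs - mine:
            scores[video] = scores.get(video, 0.0) + similarity
    # Highest score first; ties broken by video ID for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:top_n]]


likes_by_user = {
    "u1": {"v1", "v2"},
    "u2": {"v1", "v2", "v3"},
    "u3": {"v2", "v4"},
    "u4": {"v9"},
}
print(recommend("u1", likes_by_user))
```

In a push model this ranking would run when new interactions arrive and write results to the user's feed; in a pull model it would run, or be read from a precomputed store, when the user requests more content.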

11. Recommendation Database

Recommendation database stores data related to content recommendations for users. This includes recommended videos, user interaction history, and preference scores calculated by the recommendation algorithms. This database supports the Recommendation Service by providing personalized video suggestions based on user behavior and interactions.



12. Notification Service

Notification Service manages the delivery of notifications to users for various interactions such as likes, comments, follows, and mentions. It ensures users are promptly informed about activities related to their content and interactions on the platform.

High level architecture of TikTok



6. Further Optimizations [2-5 mins]

1. Caching

Caching in the TikTok architecture plays a crucial role in improving performance and user experience by storing frequently accessed data in high-speed memory. This reduces the strain on primary databases and services, leading to faster data retrieval and response times. User cache, video cache, and recommendation cache specifically streamline access to user profiles, popular videos, and personalized content suggestions, respectively, ensuring efficient service delivery and enhanced scalability for high traffic volumes on the platform.

Data Stored in Cache

  {
      "key": "string",
      "value": "string"
  }
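A common way to use such a key-value cache is the cache-aside pattern with least-recently-used (LRU) eviction. The sketch below assumes that pattern; the text does not specify an eviction policy, and LRUCache, get_video, and the loader callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Fixed-size key-value cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[dict]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: dict) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # evict least recently used


def get_video(video_id: str, cache: LRUCache,
              load_from_db: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Cache-aside read: try the cache, fall back to the database, then fill."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    value = load_from_db(video_id)
    if value is not None:
        cache.put(video_id, value)
    return value


video_db = {"v1": {"title": "Sunshine in my mind"}}
video_cache = LRUCache(capacity=2)
print(get_video("v1", video_cache, video_db.get))   # miss, loaded from the database
print(get_video("v1", video_cache, video_db.get))   # hit, served from the cache
```

Cache-aside keeps the database as the source of truth, so a cold or flushed cache degrades latency but not correctness.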

I. User Cache

The User Cache stores frequently accessed user data to reduce latency and improve performance. It enhances the efficiency of retrieving user information and interactions.

Example User Cache

  [
      {
          "key": "f5498eadb82c40e3209d0227e6b7fdba",
          "value": {
              "user_id": "f5498eadb82c40e3209d0227e6b7fdba",
              "username": "TikTokuser",
              "email": "user@example.com",
              "password_hash": "a40e1c2480f5489e6c6f4592674a4cf4",
              "created_at": "2020-05-22 07:26:27"
          }
      },
      {
           ...
      },
  ]

II. Video Cache

Caches frequently accessed video metadata and media URLs to reduce database load and improve response times for video retrieval requests.

Example Video Cache

  [
      {
          "key": "dd44dbe2728e1944d25ee8c35cd9cb01",
          "value": {
              "video_id": "dd44dbe2728e1944d25ee8c35cd9cb01",
              "user_id": "2d182f5c9395f587d3712c6c29fbdc42",
              "title": "Sunshine in my mind",
              "likes": 185,
              "views": 1478,
              "media_url": "https://example.com/some-url",
              "created_at": "2024-07-09 22:10:55"
          }
      },
      {
           ...
      },
  ]

III. Recommendation Cache

Caches recommendation data to quickly provide users with personalized content, reducing the need to repeatedly query the Recommendation Database.

Example Recommendation Cache

  [
      {
          "key": "684e9771fd442c1bb6b1c05cf17d14c8",
          "value": {
              "user_id": "1250b56d67f7b288c56d51975239e918",
              "recommendations": [
                  {
                      "post_id": "9fc9d913973592188cb15df207700a74",
                      "post_author_id": "9fc9d913973592188cb15df207700a74",
                      "created_at": "2023-09-01 22:03:55"
                  },
                  {
                      ...
                  }
              ]
          }
      },
      {
           ...
      },
  ]


2. CDN (Content Delivery Network)

CDN enhances video delivery performance globally by caching content at edge locations. This reduces latency and improves streaming quality for users worldwide.

I. Push CDN
Push CDNs proactively distribute pre-cached content from origin servers to edge servers globally, optimizing availability and reducing latency for static content like images and videos.

II. Pull CDN
Pull CDNs fetch content from origin servers on demand: the edge server checks its cache first and, on a miss, retrieves the content from the origin and caches it for later requests. This makes them well suited to dynamic sites that need frequent updates and flexibility in content delivery.
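The pull behavior can be sketched as an edge cache with a time-to-live (TTL). This is a simplified model: the PullCdnEdge class, the 60-second default TTL, and the HIT/MISS labels are assumptions for illustration and do not describe any specific CDN product.

```python
import time
from typing import Callable


class PullCdnEdge:
    """Edge cache that fetches from the origin on a miss or after expiry."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict = {}   # path -> (content, expires_at)

    def get(self, path: str):
        """Return (content, "HIT" or "MISS")."""
        now = self._clock()
        entry = self._cache.get(path)
        if entry is not None and entry[1] > now:
            return entry[0], "HIT"
        content = self._fetch(path)              # go back to the origin
        self._cache[path] = (content, now + self._ttl)
        return content, "MISS"


def origin(path: str) -> bytes:
    # Stand-in for the origin server.
    return b"video-bytes-for:" + path.encode()


edge = PullCdnEdge(origin)
print(edge.get("/videos/123.mp4")[1])   # first request goes to the origin
print(edge.get("/videos/123.mp4")[1])   # second request is served from the edge
```

A push CDN would instead populate the edge cache ahead of time, which trades extra storage and upload work for no first-request penalty.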

High level architecture of TikTok after Optimizations



7. Data Flow [5-8 mins]

It's essential to explain the end-to-end flow of the design, ensuring clarity on how data moves through the system.


User Uploads Video:

  1. The user uploads a video through the TikTok app.
  2. The request is received by the API Gateway.
  3. The API Gateway routes the request through the Load Balancer to the Media Service.
  4. Media Service processes and stores the video in the Object Store.
  5. Media Service updates the Video Database with video metadata and the media URL.
  6. Video Cache is updated with video metadata for quick access.

User Interacts with Video:

  1. The user interacts with a video (like, comment, share), and the request is received by the API Gateway.
  2. The API Gateway routes the request through the Load Balancer to the User Interaction Service.
  3. User Interaction Service updates the User Database with the interaction data.
  4. User Cache is updated with frequently accessed user data.
  5. Notification Service sends notifications to relevant users, and Recommendation Service updates the Recommendation Database and Recommendation Cache with the new interaction data.

Following/Unfollowing Users:

  1. A user follows or unfollows another user through the TikTok app.
  2. The request reaches the API Gateway via the Load Balancer.
  3. The API Gateway routes the /POST Follow or /POST Unfollow request to the User Interaction Service, which updates the User Database and User Cache.
  4. The Recommendation Service adjusts the follower's feed accordingly.
  5. The Notification Service sends out notifications if appropriate.

Don't forget to explain the end to end flow of your design!


This architecture is designed to be scalable, resilient, and efficient, ensuring that the platform can handle a high volume of user interactions and data processing with minimal latency and high availability.