Demystifying TStore: The Backbone of Billions of Transactions at PhonePe
Arnab Bir and Tushar Naik, 06 August, 2024
Chapter 1: Unveiling TStore
Introduction
You are standing in the queue at a grocery store's billing counter. When your turn comes, your items are billed and you pay by scanning a QR code with the PhonePe app. Typically, the green success screen appears instantly and the transaction shows up in your history feed. With a simple tap, you can also validate any past payment and show it to the shopkeeper if needed. It is also quite likely that the billing counter has a speaker that immediately announces that the payment of Rs. X was successful. Alternatively, the shopkeeper can check on their business app whether the transaction has gone through.
Now, imagine thousands of such transactions happening every second, involving complex data exchanges including debits, credits, settlements, offers, notifications, services, refunds, and more. To ensure a seamless and real-time payment experience for users, with instant access to their transaction history feed, PhonePe relies on the Transaction Store (TStore). Beyond just transaction history and regardless of the payment category, TStore also plays a vital role in ensuring a consistent post-payment experience by eliminating the need for the app to perform scattered reads across diverse backend systems for each transaction.
Capturing the User Journey
Consider a mobile recharge flow, where you select a plan and top up your phone through PhonePe. Let’s see how it works!
When you initiate a mobile recharge through the PhonePe app, the Checkout Service retrieves the available payment options from BillPay. Next, you select your preferred method (UPI, card, etc.) and submit your payment details. The Payments Service validates constraints, initiates the transaction with the provider, and triggers fulfillment via BillPay. Throughout, the app tracks the transaction status through TStore's APIs and shows it to you: “PENDING” (non-terminal) while processing, and “COMPLETED” or “ERRORED” (terminal) at the end, depending on the outcome.
While the diagram above outlines the interaction between services during a recharge, the actual process might involve additional steps or variations depending on specific scenarios and features. That’s where TStore steps in, capturing, consolidating, and disseminating payment flow information to the stakeholders.
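From the app's side, the status lifecycle described above amounts to polling until a terminal state is reached. The sketch below is a hypothetical illustration, not PhonePe's client code: `fetch_status` stands in for a TStore read API call, and `wait_for_terminal`, `interval_s` and `max_polls` are invented names and defaults.

```python
import time
from typing import Callable

# Terminal states from the recharge flow; "PENDING" is the non-terminal state.
TERMINAL = {"COMPLETED", "ERRORED"}


def wait_for_terminal(fetch_status: Callable[[str], str], unit_id: str,
                      interval_s: float = 1.0, max_polls: int = 30) -> str:
    """Poll until the transaction reaches a terminal state.

    `fetch_status` is a hypothetical stand-in for a TStore read API call.
    Returns the terminal status, or the last non-terminal status seen if
    `max_polls` is exhausted.
    """
    status = "PENDING"
    for attempt in range(max_polls):
        status = fetch_status(unit_id)
        if status in TERMINAL:
            return status
        if attempt < max_polls - 1:
            time.sleep(interval_s)  # back off between polls
    return status
```

Capping the number of polls keeps the client from spinning indefinitely if a transaction stays in a non-terminal state longer than expected.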
Capabilities: High-Velocity Ingestion, Reads and Beyond
High-Velocity Ingestion
PhonePe processes over 8,000 transactions per second during peak hours. For every transaction, multiple services, each handling a different aspect of it, send updates to TStore. TStore keeps pace with these ever-growing volumes, ingesting data with minimal latency (~45,000 RPS at < 10 ms).
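As a rough back-of-the-envelope check, and assuming the two peak figures above occur at the same time (the post does not say so explicitly), the ingestion rate implies several updates per transaction on average:

```latex
\frac{45{,}000\ \text{updates/s}}{8{,}000\ \text{transactions/s}} \approx 5.6\ \text{updates per transaction}
```

This is in line with multiple services each sending their own updates for the same transaction.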
Efficient Scrolling Through Transaction History
TStore powers pointer-based scrolling APIs for efficient transaction retrieval. The app retrieves the transaction history using a forward and a backward pointer, as follows.
Backward Scrolling for Initial Load
When the app is first installed, or when a user first accesses their transaction history, the client app uses TStore APIs to download transactions in descending order of update timestamp. This initial pre-load uses a backward pointer: it retrieves the most recent transactions first, then requests older ones by passing the previously received pointer as a reference. This lets TStore efficiently return the next set of records in reverse chronological order.
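The backward scroll can be sketched as keyset pagination over an in-memory list. This is a minimal illustration under stated assumptions: TStore's real pointers are opaque to the client, whereas here the hypothetical pointer is simply the `(updated_at, unit_id)` pair of the last item returned, with the unit ID breaking timestamp ties. `FeedItem` and `scroll_backward` are invented names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical pointer: (update timestamp in ms, unit_id) of the last item returned.
Pointer = Tuple[int, str]


@dataclass(frozen=True)
class FeedItem:
    unit_id: str
    updated_at: int  # update timestamp in milliseconds


def scroll_backward(items: List[FeedItem], pointer: Optional[Pointer],
                    limit: int) -> Tuple[List[FeedItem], Optional[Pointer]]:
    """Return up to `limit` items strictly older than `pointer`, newest first.

    A pointer of None means "start from the most recent item". The returned
    pointer is passed back on the next call to fetch the following page.
    """
    ordered = sorted(items, key=lambda i: (i.updated_at, i.unit_id), reverse=True)
    if pointer is not None:
        ordered = [i for i in ordered if (i.updated_at, i.unit_id) < pointer]
    page = ordered[:limit]
    next_pointer = (page[-1].updated_at, page[-1].unit_id) if page else pointer
    return page, next_pointer
```

Each call resumes exactly where the previous page ended, so no items are skipped or repeated even if timestamps collide; an empty page signals that the oldest transaction has been reached.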
Forward Polling for Updates
To ensure users receive the latest updates, already-installed apps use forward polling: they periodically call the APIs with the forward pointer obtained during the last successful polling cycle. This way, the app retrieves only the updates that occurred since the last poll, minimizing data transfer and improving response times.
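Forward polling is the mirror image of the backward scroll: it returns only items updated after the pointer, oldest first, so the pointer advances monotonically. The sketch assumes a hypothetical `(updated_at, unit_id)` pointer and a server that keeps only the latest version of each unit; `poll_forward` and `apply_updates` are invented names, not TStore APIs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Pointer = Tuple[int, str]  # hypothetical: (update timestamp in ms, unit_id)


@dataclass(frozen=True)
class FeedItem:
    unit_id: str
    status: str
    updated_at: int


def poll_forward(items: Iterable[FeedItem], pointer: Pointer,
                 limit: int) -> Tuple[List[FeedItem], Pointer]:
    """Return up to `limit` items updated after `pointer`, oldest first,
    plus the forward pointer to use in the next polling cycle."""
    newer = sorted((i for i in items if (i.updated_at, i.unit_id) > pointer),
                   key=lambda i: (i.updated_at, i.unit_id))
    page = newer[:limit]
    next_pointer = (page[-1].updated_at, page[-1].unit_id) if page else pointer
    return page, next_pointer


def apply_updates(local: Dict[str, str], page: List[FeedItem]) -> None:
    """Merge polled updates into the app's local cache (unit_id -> latest status)."""
    for item in page:
        local[item.unit_id] = item.status
```

Returning the oldest updates first matters when `limit` truncates a page: the next poll picks up exactly where this one stopped, instead of skipping updates that fell between the newest item and the old pointer.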
Real-Time Reads
Both internal applications and external services (e.g., devices owned by merchants and consumers) rely on TStore for real-time access to transaction data, enabling features like instant transaction status and update tracking (~100,000 RPS at < 20 ms).
Flexible Querying Capability
Diverse stakeholders require different insights from the data. Users need to download transaction statements. Merchants want insight into overall transaction volume, sales trends and customer behavior. Fraud and risk assessment systems need to monitor transaction metrics to identify anomalies. TStore lets them query and retrieve specific transaction details efficiently, with real-time aggregation, filtering and sorting.
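To make "filtering, sorting and aggregation" concrete, here is a minimal in-memory sketch of a merchant-style query. The `Txn` fields, `query` and `volume_by_status` are hypothetical; in TStore such queries run against indexed stores rather than Python lists.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Txn:
    unit_id: str
    merchant_id: str
    status: str        # e.g. "PENDING", "COMPLETED", "ERRORED"
    amount_paise: int  # integer paise avoids floating-point rounding
    created_at: int    # epoch milliseconds


def query(txns: Iterable[Txn], merchant_id: str,
          status: Optional[str] = None, newest_first: bool = True) -> List[Txn]:
    """Filter by merchant (and optionally status), then sort by creation time."""
    hits = [t for t in txns
            if t.merchant_id == merchant_id and (status is None or t.status == status)]
    return sorted(hits, key=lambda t: t.created_at, reverse=newest_first)


def volume_by_status(txns: Iterable[Txn]) -> Dict[str, Dict[str, int]]:
    """Aggregate transaction count and total amount per status."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"count": 0, "amount_paise": 0})
    for t in txns:
        totals[t.status]["count"] += 1
        totals[t.status]["amount_paise"] += t.amount_paise
    return dict(totals)
```

For example, `volume_by_status(query(txns, "M1"))` gives a merchant's per-status volume, the kind of summary a sales-trend view would be built on.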
Historical Archiving Beyond Real-Time
TStore doesn’t forget the past. It seamlessly integrates with the data lake, enabling long-term storage and sophisticated analytics. This empowers deeper understanding of user behavior, payment trends, compliance reporting, fraud detection and business insights.
Data Consistency
Imagine a scenario where a merchant issues a refund to a customer. In a traditional system, inconsistencies could arise if the updates to the customer’s and merchant’s accounts occur asynchronously and in a non-deterministic order. This could lead to temporary discrepancies between their respective transaction histories, causing confusion and trust issues. TStore therefore stores every transaction detail, from the smallest amount to the most intricate, consistently across the system, even though the system spans geographies.
High Availability
Unforeseen disruptions like network outages, hardware failures or natural disasters can severely impact critical infrastructure. Through redundancy, replication and disaster recovery mechanisms, TStore is designed to prevent service disruptions and ensure business continuity.
Terminologies
To establish a foundation, let’s first define some key terms before exploring TStore’s architecture in detail.
Entity
An entity is the most atomic element in TStore. It could be an action taken by the user on the system, or a supporting data element generated by the system itself: for example, a payment to a friend, a recharge for a mobile number or a cashback to the user. Different parts of an entity can be owned by different participating services, each handling one of the steps needed to take the action to completion (or failure). Entities encapsulate details like state, timestamp and data specific to that action.
Unit
The entities of a payment journey are tied together by a common ID called the Unit ID. This ID acts as the thread that weaves them into a cohesive narrative of the entire journey, consolidating their information, states and timestamps into a unified record called a Unit.
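The relationship between entities and units can be sketched as a group-by on the unit ID. This is a hypothetical data model for illustration only: the field names (`entity_type`, `owner_service`, `data`) and the `build_units` helper are assumptions, and TStore stores entities as Avro records rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Entity:
    unit_id: str        # shared ID that ties this entity to its unit
    entity_type: str    # hypothetical, e.g. "PAYMENT", "RECHARGE", "CASHBACK"
    owner_service: str  # service responsible for this part of the journey
    state: str
    timestamp: int
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Unit:
    unit_id: str
    entities: List[Entity] = field(default_factory=list)

    @property
    def updated_at(self) -> int:
        """A unit is as recent as its most recently updated entity."""
        return max(e.timestamp for e in self.entities)


def build_units(entities: Iterable[Entity]) -> Dict[str, Unit]:
    """Group entities by unit ID into Unit records, each ordered by timestamp."""
    units: Dict[str, Unit] = {}
    for e in sorted(entities, key=lambda e: e.timestamp):
        units.setdefault(e.unit_id, Unit(e.unit_id)).entities.append(e)
    return units
```

In a recharge, for instance, a payment entity owned by the payments service and a recharge entity owned by BillPay would share one unit ID and surface as a single unit.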
Views
TStore prioritizes data privacy and security by adhering to the principle of least privilege. This is enforced through a concept called Views. Imagine Views as customizable filters applied to Unit details. These filters ensure clients (merchants and consumers) see only the information relevant to them, not the entire transaction data. This protects privacy and minimizes the data transferred, improving network efficiency.
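A View can be thought of as a field projection applied to a unit. In the minimal sketch below, the `VIEWS` table, its field names and `apply_view` are all hypothetical; the point is the least-privilege default, where an unknown view sees nothing rather than everything.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical view definitions: which unit fields each audience may see.
VIEWS: Dict[str, FrozenSet[str]] = {
    "consumer": frozenset({"unit_id", "status", "amount", "merchant_name", "payment_instrument"}),
    "merchant": frozenset({"unit_id", "status", "amount", "settlement_status"}),
}


def apply_view(unit: Dict[str, Any], view: str) -> Dict[str, Any]:
    """Project a unit onto the fields allowed by `view`.

    Unknown views get an empty allow-list (deny by default), in keeping
    with the principle of least privilege.
    """
    allowed = VIEWS.get(view, frozenset())
    return {k: v for k, v in unit.items() if k in allowed}
```

Because the projection happens before the response is serialized, fields outside the view are never sent, which also accounts for the reduced data transfer.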
Architectural Deep Dive – A Closer Look at its Key Components
Let’s delve into the major components that orchestrate this real-time data symphony.
TStore Client Bundle
This library serves as the bridge between applications and TStore. The capabilities and benefits provided by the client are as follows.
1. Abstracts away the complexities of data interaction with intuitive methods for writing, reading and querying, so developers do not get bogged down in low-level details.
2. Interacts with the discovery service to determine the relevant data center and routes requests to it, ensuring serialized writes and maximizing performance.
3. Provides a secure, standardized mechanism for attaching JWTs to API calls, leveraging PhonePe’s authorization system and enforcing IAM rules on client applications.
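One way a client can ensure serialized writes is to send every request for a given unit to the same data center. The sketch below illustrates that idea under loud assumptions: `HypotheticalRouter` is not TStore's client, `endpoints_by_dc` stands in for what a discovery service might return, and TStore's actual placement logic is not described in this chapter.

```python
import hashlib
from typing import Dict, List


def _stable_hash(key: str) -> int:
    """Process-independent hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HypotheticalRouter:
    """Routes every request for a given unit to the same 'home' data center.

    Assumption: `endpoints_by_dc` stands in for the healthy endpoints per
    data center that a discovery service would return.
    """

    def __init__(self, endpoints_by_dc: Dict[str, List[str]]):
        self.endpoints = endpoints_by_dc
        self.dcs = sorted(endpoints_by_dc)  # fixed order keeps routing deterministic

    def home_dc(self, unit_id: str) -> str:
        return self.dcs[_stable_hash(unit_id) % len(self.dcs)]

    def endpoint_for(self, unit_id: str) -> str:
        hosts = self.endpoints[self.home_dc(unit_id)]
        # Any healthy host in the home data center can serve the request.
        return hosts[_stable_hash("host:" + unit_id) % len(hosts)]
```

Plain modulo hashing reassigns many units whenever the set of data centers changes; a production system might instead persist each unit's placement or use consistent hashing.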
Schema Registry
The Schema Registry plays a critical role in managing and governing the evolution of entity schemas within TStore, which is vital for keeping every version of the various client apps working smoothly.
Functionalities
Schema Cataloging
The Schema Registry catalogs multiple versions of transaction entity schemas, providing a centralized repository for managing schema definitions.
Data Validation
It acts as a gatekeeper, enforcing strict schema validation on all incoming data. This ensures adherence to defined structures, preventing inconsistencies and safeguarding data integrity. Both backward and forward compatibility are validated.
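The two compatibility checks can be illustrated with a deliberately simplified version of Avro's schema-resolution rule: a reader can decode a writer's data if every reader field either exists in the writer with the same type or has a default, while writer-only fields are ignored. The dict-based schema model and function names are hypothetical, and real Avro resolution also handles type promotion, unions and aliases, which this sketch ignores.

```python
from typing import Dict

# Simplified, hypothetical schema model: field name -> {"type": ..., optional "default": ...}
Schema = Dict[str, dict]


def can_read(reader: Schema, writer: Schema) -> bool:
    """Simplified Avro-style resolution: every reader field must exist in the
    writer with the same type, or carry a default. Writer-only fields are skipped."""
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            return False
    return True


def is_backward_compatible(new: Schema, old: Schema) -> bool:
    # Consumers on the new schema can read data written with the old one.
    return can_read(reader=new, writer=old)


def is_forward_compatible(new: Schema, old: Schema) -> bool:
    # Consumers still on the old schema can read data written with the new one.
    return can_read(reader=old, writer=new)
```

Adding an optional field with a default passes both checks, which is what allows older and newer app versions to coexist while a schema evolves.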
Seamless Schema Onboarding and Updates
The Schema Registry facilitates seamless schema evolution, allowing entities to adapt to changing requirements. Importantly, Avro’s capabilities enable on-the-fly onboarding of updated schemas without requiring code changes within TStore itself.
Why Avro?
TStore leverages Avro as its preferred serialization format for transaction entities. Extensive benchmarking against Thrift, JSON, BSON, and Protocol Buffers across various load scenarios revealed Avro’s clear dominance in the following key areas.
Entity Ingestor
The Entity Ingestor component is responsible for ingesting new entities and disseminating updates across systems.
- When a client calls TStore’s APIs to create an entity or send notifications, the Entity Ingestor first takes the raw Avro bytes and writes them to a Kafka topic that serves as a write-ahead log (WAL). This ensures that if an unexpected interruption occurs, the change is captured and can be replayed later if needed.
- Once the change is committed to the WAL, the Entity Ingestor writes the Avro bytes to HBase, the source of truth for transaction data.
- TStore also uses HBase, its primary storage layer, to build and update indices based on the ingested entity. These indices cover key fields like user ID, payment status and other relevant metadata, enabling fast, efficient scans when clients query for specific transactions.
- Only after both the WAL and HBase writes are confirmed does the Entity Ingestor proceed to produce notification details in dedicated Kafka topics. This crucial step ensures that notifications are only sent if the underlying data changes have been successfully persisted.
- Finally, the Entity Ingestor sends back an API response to the client, indicating the overall success or failure of the ingestion flow. This feedback loop keeps clients informed and facilitates error handling, resilience, reconciliation etc. if necessary.
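The ordering guarantees in the steps above can be sketched with in-memory stand-ins. Everything here is hypothetical (`InMemoryIngestor`, its fields and the `fail_store` switch): a list plays the Kafka WAL, dicts play the HBase data and index tables, and another list plays the notification topics. What the sketch preserves is the order: WAL first, then storage and index, then notification, then the response.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class InMemoryIngestor:
    """Hypothetical stand-ins: a list for the Kafka WAL, dicts for HBase, a list for notification topics."""
    wal: List[Tuple[str, bytes]] = field(default_factory=list)
    store: Dict[str, bytes] = field(default_factory=dict)      # unit_id -> Avro bytes
    index: Dict[str, List[str]] = field(default_factory=dict)  # user_id -> unit_ids
    notifications: List[str] = field(default_factory=list)
    fail_store: bool = False  # set True to simulate a storage outage

    def ingest(self, unit_id: str, user_id: str, avro_bytes: bytes) -> dict:
        # 1. Append to the WAL first so the change survives and can be replayed.
        self.wal.append((unit_id, avro_bytes))
        try:
            # 2. Write to the source of truth.
            if self.fail_store:
                raise OSError("store unavailable")
            self.store[unit_id] = avro_bytes
            # 3. Update the user -> unit index used by queries.
            units = self.index.setdefault(user_id, [])
            if unit_id not in units:
                units.append(unit_id)
        except OSError as exc:
            # No notification is emitted; the change waits in the WAL for replay.
            return {"ok": False, "error": str(exc)}
        # 4. Publish a notification only after the WAL and store writes succeeded.
        self.notifications.append(unit_id)
        # 5. Report the overall outcome to the caller.
        return {"ok": True}
```

On a storage failure, the caller gets an error response and downstream consumers see nothing, while the WAL still holds the change for a later replay.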
Feed Service
Feed Service is the primary gateway for read-only access to transaction data (units). From powering interactive feeds on PhonePe to enabling granular insights through dashboards, it empowers the following diverse scenarios with fast data retrieval.
Functionalities
Change Propagation
One of its key functions is letting clients poll for transaction status updates. After a transaction is initiated, clients can periodically check its status, which the Feed Service reads from HBase, keeping stakeholders such as users and merchants informed of the latest developments. To track updates efficiently, the Feed Service uses timed-order index tables (client managed) in HBase, which maintain inverted mappings from users to unit IDs and identify the units to fetch during subsequent polling cycles. This targeted approach optimizes data retrieval, minimizes unnecessary scans and ensures everyone receives timely updates.
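The chapter does not describe the row-key layout of the timed-order index tables. A common HBase pattern for "newest first per user" is a key made of the user ID, a reversed timestamp and the unit ID, since HBase returns rows in lexicographic key order. The sketch below assumes that pattern; `index_row_key`, `scan_newest` and the 13-digit `MAX_TS` bound are hypothetical, and a sorted Python list emulates an HBase prefix scan.

```python
import bisect
from typing import List

MAX_TS = 10**13 - 1  # hypothetical bound: millisecond timestamps fit in 13 digits


def index_row_key(user_id: str, updated_at_ms: int, unit_id: str) -> str:
    """Build a sortable row key: user prefix, reversed zero-padded timestamp, unit ID.

    Reversing the timestamp makes a lexicographic prefix scan return the
    newest updates first. Assumes IDs never contain the '#' separator.
    """
    return f"{user_id}#{MAX_TS - updated_at_ms:013d}#{unit_id}"


def scan_newest(sorted_keys: List[str], user_id: str, limit: int) -> List[str]:
    """Emulate an HBase prefix scan over sorted keys; return up to `limit` unit IDs."""
    prefix = f"{user_id}#"
    start = bisect.bisect_left(sorted_keys, prefix)
    unit_ids: List[str] = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix) or len(unit_ids) == limit:
            break
        unit_ids.append(key.rsplit("#", 1)[1])
    return unit_ids
```

The separator matters: because `#` sorts before digits, keys for user `u1` stay contiguous and are not interleaved with keys for user `u10`.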
Paginated Transaction History
Beyond individual status checks, the Feed Service powers the transaction history feeds in the PhonePe Consumer and Merchant apps through bi-directional pagination, giving users and merchants a clear view of their past activity. This fosters transparency and helps users manage their finances effectively.
Dashboards and Insights
The Feed Service is the backend for various dashboards, enabling detailed analysis and reporting. It uses Elasticsearch indexes to filter and aggregate data, while retrieving the underlying records from HBase for further processing. This enables sophisticated analytics tailored to specific merchant use cases.
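The split between the search index and the source of truth can be sketched as a two-step read: the index narrows the candidate set to matching unit IDs, and the full records are then fetched by ID. Both steps below are hypothetical in-memory stand-ins (`search_unit_ids` for an Elasticsearch filter query, `multi_get` for a batched HBase read), not the Feed Service's actual code.

```python
from typing import Dict, List


def search_unit_ids(es_docs: List[dict], merchant_id: str, status: str) -> List[str]:
    """Stand-in for an Elasticsearch filter query that returns only matching unit IDs."""
    return [d["unit_id"] for d in es_docs
            if d["merchant_id"] == merchant_id and d["status"] == status]


def multi_get(store: Dict[str, dict], unit_ids: List[str]) -> List[dict]:
    """Stand-in for a batched HBase read; IDs not present in the store are skipped."""
    return [store[u] for u in unit_ids if u in store]


def dashboard_rows(es_docs: List[dict], store: Dict[str, dict],
                   merchant_id: str, status: str) -> List[dict]:
    # Step 1: the search index narrows the candidate set cheaply.
    ids = search_unit_ids(es_docs, merchant_id, status)
    # Step 2: full records come from the source of truth.
    return multi_get(store, ids)
```

Keeping full records out of the search index keeps it lean, and reading them from HBase means dashboards show the authoritative data even if the index lags slightly behind.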
In essence, separating the read and write paths improves scalability, performance, availability and security within TStore. Scaling reads and writes independently lets TStore handle high transaction volumes efficiently, with each path's access patterns and resources (CPU, memory, etc.) optimized separately. The read-heavy Feed Service can be scaled without impacting the write performance of the Entity Ingestor.
Looking Ahead to Part 2: A Deeper Dive
In this chapter, we covered a high-level overview of TStore’s functionality and core concepts. In Chapter 2, we’ll delve deeper into its architectural intricacies. We’ll explore how TStore ensures data consistency through robust data replication strategies, and shed light on the specific data storage solutions it uses for performance and scalability. We’ll also discover how it leverages geographically distributed data centers in an Active-Active setup for continuous operation, even in the face of unforeseen disruptions. By the end of Chapter 2, you’ll have a comprehensive understanding of the workings of TStore, the backbone of billions of transactions at PhonePe.