PubSub at Scale: PhonePe’s “Bullhorn”

Prateek Grover, Software Engineer, Consumer Payments
02 July, 2025

Problem Statement

PhonePe, as a payments platform, has an essential feature that lets users check all their past transactions. For payments made to businesses and merchants, users get a historical view of all such transactions. For peer-to-peer transactions, however, users usually want to see a timeline of all transactions with a particular user. A chat-like interface works best here because, along with transactions, it opens up room for many more interactions between users.

This requirement led to the development of Bullhorn at PhonePe, a PubSub platform, which powers these interactions. 

In this blog, we delve into the architecture, design decisions, and various use cases of Bullhorn that make it an indispensable part of the PhonePe ecosystem.

Solution

When building a chat-like system that facilitates interaction between individuals, the first consideration is the creation of a shared space that is known to all participants. This shared space should be accessible only to the intended group, to keep the conversation private. The people involved are then subscribed to the shared space and receive messages. Therefore, to solve the above problem, we needed a system where one can create a shared space and read and post messages to it. It should also support paginated reading of messages, allowing users to view both the latest and older messages as they scroll through past dates.

Solving for a system with such capabilities opened up new avenues. One such use case was the PhonePe Inbox or Alerts, which sends promotional messages within the app to a single user, a group of users, or all users, in a persistent manner. This provides a more reliable alternative to traditional push notifications, which can be easily dismissed, have a low delivery rate, and are not persisted. With topics, messages and subscriptions already in place, this requirement could be superimposed on Bullhorn’s existing model, and it worked seamlessly.

Bullhorn was thus conceived as a PubSub platform where users can create topics, subscribe to them, post messages to them, and read messages from them.

Terminology

The key terms for the solution detailed above are as follows: 

  • Message – the unit of information being transmitted
  • Topic – a shared space that has information about participants and contains messages
  • Subscription – the entity that governs access to a topic
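The three terms above can be sketched as a small in-memory model. This is only an illustration, not Bullhorn's actual schema: the class names, fields, and the index-based `before` pointer are all assumptions made for the example, and a plain set of subscriber IDs stands in for subscriptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Message:
    message_id: str
    sender_id: str
    payload: dict  # free-form body, e.g. a text card or a payment card


@dataclass
class Topic:
    topic_id: str
    subscribers: Set[str] = field(default_factory=set)  # stand-in for subscriptions
    messages: List[Message] = field(default_factory=list)  # stored oldest first

    def subscribe(self, user_id: str) -> None:
        self.subscribers.add(user_id)

    def _check_access(self, user_id: str) -> None:
        # Only subscribed users may post to or read from the shared space.
        if user_id not in self.subscribers:
            raise PermissionError(f"{user_id} is not subscribed to {self.topic_id}")

    def publish(self, message: Message) -> None:
        self._check_access(message.sender_id)
        self.messages.append(message)

    def read(self, user_id: str, limit: int = 20,
             before: Optional[int] = None) -> List[Message]:
        """Return up to `limit` messages, newest first, older than index `before`."""
        self._check_access(user_id)
        end = len(self.messages) if before is None else before
        start = max(0, end - limit)
        return list(reversed(self.messages[start:end]))
```

A client would call `read` without `before` to get the latest page, then pass the index of the oldest message it has seen to scroll further back.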

Use cases

With the flexibility that Bullhorn’s model provided, it was able to support more use cases besides PhonePe Chat and Inbox. We are also constantly finding more areas where this model can fit and provide a ready-made solution.

App Instruction

One of the most important use cases of Bullhorn is App Instructions. This system sends system-originated instructional messages to users’ devices, where they are executed. A few examples where this is helpful:

  • Updating cached data on device
  • Getting response for an asynchronous request
  • Refreshing state of the application

In all these cases, the user’s device polls only App Instructions and executes the tasks it receives, instead of following the traditional approach of polling multiple systems every time. For updating cached data, for example, the task would be to sync the latest changes from the origin system.

For perspective, when message sync on the Chat Roster page was moved to App Instructions so that only topics with new messages are synced, total calls dropped by about 35% (from 230K to 150K).

Inbox/Alerts

The “Ghanti” or Inbox/Alerts screen on PhonePe shows the latest promotional and transactional notifications to the user. Although these messages are also sent via push notifications, the delivery rate of push notifications is very low and the messages are not persistent. Bullhorn’s polling-based system solves both problems. We’ll elaborate on this later in the blog.

PhonePe Chat and Customer Support Chat

Bullhorn also supports the Customer Support chat feature along with PhonePe chat since both these systems have similar high level semantics and requirements.

Pincode Feed

Bullhorn powers the home page feed for Pincode, a hyperlocal ecommerce app from PhonePe. The Pincode app’s homepage contains several dynamic widgets customized to the user’s profile and location. A system like Bullhorn keeps the page fresh and contextual through a feed of messages drawn from a mixture of Bullhorn topics.

Architecture and Design decisions

Bullhorn uses HBase as its underlying data store, primarily because of its scalability, its high read and write throughput, and its lexicographically sorted keys, which make range queries efficient.

Choice of Database

PhonePe currently processes millions of P2P transactions per day. All of these transactions require one write for pending transaction state and one write for terminal transaction state since it can take time for the transaction to go from pending to terminal state. Apart from this, as in the other problem statement, there were messages sent by systems to users which required additional writes. Users would also make read calls to fetch these messages.

Considering these metrics and proposed solution, we needed a data store that could support the following features:

  1. Range queries – Given a pointer, provide messages after or before
  2. Read and Write throughput – Should support high read and write throughput
  3. Get message – Given an ID, provide the message for it
  4. Flexible Schema – The first use case was a chat feature with multiple card types such as Text cards and Payment Cards, so a flexible schema was important to us.

Given these requirements, and the organization’s prior experience in setting up HBase clusters, HBase seemed to be a good choice of data store for Bullhorn.

Row keys

In a data store like HBase, row keys are very important. They define how the data is stored on disk across different regions. Because HBase keeps rows lexicographically sorted and assigns them to regions by row key, the chosen row keys need to be spread as evenly as possible across the key space. If they are not, some regions may end up storing a disproportionate number of keys and taking most of the traffic. Therefore, we add a 1-byte or 2-byte hash of components such as the topic ID and message ID as a prefix to the row key, so that data is distributed equitably across all regions. The HBase book has more on row keys and their importance.
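A minimal sketch of such a hash-prefixed ("salted") row key follows. The key layout, the `|` separator, the choice of SHA-256, and the function name `salted_row_key` are assumptions for illustration, not Bullhorn's actual format. The sketch hashes only the topic ID, so that all messages of one topic share a prefix and stay contiguous for range scans; which components Bullhorn hashes for which tables is not stated in this post.

```python
import hashlib


def salted_row_key(topic_id: str, message_id: str, salt_bytes: int = 2) -> bytes:
    """Build a row key of the form <hash prefix>|<topic id>|<message id>.

    The prefix is the first `salt_bytes` bytes of a hash of the topic ID
    (an assumed choice), which spreads topics across the key space while
    keeping each topic's messages adjacent and sorted by message ID.
    """
    digest = hashlib.sha256(topic_id.encode("utf-8")).digest()
    salt = digest[:salt_bytes]
    return salt + b"|" + topic_id.encode("utf-8") + b"|" + message_id.encode("utf-8")


# Keys for the same topic share a prefix and sort by message ID,
# assuming message IDs are zero-padded or otherwise lexicographically ordered.
first = salted_row_key("topic-1", "000001")
second = salted_row_key("topic-1", "000002")
```

Because the prefix is derived from the key's own components, a reader can recompute it and go straight to the right region without a lookup.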

Types of topics

Although built for chat use cases, Bullhorn quickly evolved to support another use case which is manifested as the Inbox/Alerts screen on PhonePe app, a brief introduction for which was given above. This screen contains promotional as well as transactional messages for the user. These messages are generated by our services in the Growth and Payments ecosystem and are intended either for one or more users or the whole user base.

When a message is intended for a small number of users, we can create a topic for each user’s Inbox screen and publish the message to each of those topics. If the message is intended for a large subset of the user base, however, the problem becomes trickier.

The first thought would be to write the message to each user’s topic, but this would generate a massive number of messages and put a lot of load on our data store. We came up with another solution by changing the user-topic relationship in the system’s design. Instead of having just one topic per user, we have two: one user-specific topic, and one common topic shared by all users or by a sufficiently large subset of users. Whenever a user fetches messages, the app fetches them from both topics and shows them on the screen. There are, however, certain restrictions on the common topic. For instance, one can’t fetch all subscribers of this topic, because that would create huge read traffic on the data store.
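Reading from both topics can be sketched as a merge of two already-sorted message lists. This is an illustrative sketch only: the `(timestamp, text)` tuple shape and the function name `merged_inbox` are assumptions, and whether the merge happens on the client or the server is not specified here.

```python
import heapq
from itertools import islice
from typing import List, Tuple

InboxMessage = Tuple[int, str]  # (timestamp, text); an assumed shape


def merged_inbox(user_topic: List[InboxMessage],
                 common_topic: List[InboxMessage],
                 limit: int = 20) -> List[InboxMessage]:
    """Merge two newest-first message lists into one newest-first page."""
    merged = heapq.merge(user_topic, common_topic,
                         key=lambda message: message[0], reverse=True)
    return list(islice(merged, limit))


# One user-specific topic and one common topic, both sorted newest first.
user_messages = [(30, "Payment received"), (10, "Welcome")]
common_messages = [(20, "Cashback offer for everyone")]
inbox = merged_inbox(user_messages, common_messages)
```

The common-topic message is stored once but appears in every subscriber's merged view.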

Another problem with common topics is in the read path. When users open the app, it initiates a message sync with Bullhorn for the common topics. If a considerable number of users open the app simultaneously, which is a real possibility with a user base as large as ours, the one region that stores the latest message can become a bottleneck and reads will suffer (the typical database hotspot). We handled this by duplicating the messages of these common topics across all regions of the HBase table. During read operations, we redirect the user to a randomly chosen region (by choosing a random prefix) and fetch the message from there. Because the common-topic design already reduces several million per-user messages to one, the cost of duplicating that one message across all regions is minimal.

With this, the number of writes per broadcast message equals the number of regions rather than the number of targeted users, and read performance is preserved.
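The write-to-every-prefix, read-from-a-random-prefix idea can be sketched with an in-memory dictionary standing in for the HBase table. Everything here is hypothetical: the class name, the two-digit prefix format, and the assumption of one prefix per region are chosen for the example only.

```python
import random
from typing import Dict, List, Optional


class CommonTopicStore:
    """Toy stand-in for an HBase table whose regions are split by key prefix."""

    def __init__(self, num_prefixes: int = 4,
                 rng: Optional[random.Random] = None) -> None:
        self.num_prefixes = num_prefixes  # assumed: one prefix per region
        self.rows: Dict[str, str] = {}    # row key -> message body
        self.rng = rng or random.Random()

    def _key(self, prefix: int, topic_id: str, message_id: str) -> str:
        return f"{prefix:02d}|{topic_id}|{message_id}"

    def publish(self, topic_id: str, message_id: str, body: str) -> int:
        """Write one copy of the message under every prefix; return copies written."""
        for prefix in range(self.num_prefixes):
            self.rows[self._key(prefix, topic_id, message_id)] = body
        return self.num_prefixes

    def read(self, topic_id: str) -> List[str]:
        """Pick a random prefix and scan only that prefix's copy of the topic."""
        prefix = self.rng.randrange(self.num_prefixes)
        start = f"{prefix:02d}|{topic_id}|"
        return [body for key, body in sorted(self.rows.items())
                if key.startswith(start)]
```

Each read touches a single prefix, so concurrent readers spread across regions while every reader still sees the same messages.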

Change propagation via Audits

Every message and topic operation is accompanied by an audit. Audits are the primary mechanism for propagating changes to clients. Whenever a message or topic operation happens, an audit object is created that points to the actual message or topic. When a client syncs the latest changes, it provides a pointer into these audits. Bullhorn fetches the audits after or before the pointer, fetches the message or topic objects they point to, and serves them to the client. The underlying implementation for retrieving audits uses a range scan on HBase.
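A minimal in-memory sketch of pointer-based sync over audits is shown below, covering only the "after the pointer" direction. The names (`Audit`, `AuditLog`, `sync_after`), the integer sequence used as the pointer, and the use of `bisect` in place of an HBase range scan are all assumptions for illustration.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Audit:
    sequence: int    # monotonically increasing; doubles as the sync pointer
    operation: str   # e.g. "CREATE" or "UPDATE" (hypothetical values)
    message_id: str  # points at the actual message


class AuditLog:
    def __init__(self) -> None:
        self._sequences: List[int] = []     # sorted, like row keys in a scan
        self._audits: List[Audit] = []
        self._messages: Dict[str, str] = {}

    def record(self, operation: str, message_id: str, body: str) -> int:
        """Apply a message operation and append the audit that points to it."""
        sequence = len(self._audits) + 1
        self._messages[message_id] = body
        self._sequences.append(sequence)
        self._audits.append(Audit(sequence, operation, message_id))
        return sequence

    def sync_after(self, pointer: int,
                   limit: int = 50) -> Tuple[List[Tuple[str, str, str]], int]:
        """Return changes recorded after `pointer`, plus the pointer to send next."""
        start = bisect.bisect_right(self._sequences, pointer)
        page = self._audits[start:start + limit]
        changes = [(a.operation, a.message_id, self._messages[a.message_id])
                   for a in page]
        next_pointer = page[-1].sequence if page else pointer
        return changes, next_pointer
```

A client starts with pointer 0 and keeps passing back the returned pointer; an empty page means it is caught up.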

Delivery Mechanism

Bullhorn is primarily a pull-based system: when a message is published, Bullhorn does not push it to subscribers; instead, subscribers pull from Bullhorn. This is true even for the chat feature of PhonePe, where the client polls with exponential backoff and resets the interval when a new message arrives.
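The polling schedule described above can be sketched as a small pure function. The base delay, the cap, and the doubling factor are illustrative assumptions; the post does not state the values the PhonePe app uses.

```python
from typing import Iterable, List


def next_poll_delay(current_delay: float, got_new_message: bool,
                    base_delay: float = 1.0, max_delay: float = 60.0,
                    factor: float = 2.0) -> float:
    """Reset to the base delay on a new message, otherwise back off up to a cap."""
    if got_new_message:
        return base_delay
    return min(current_delay * factor, max_delay)


def simulate(poll_results: Iterable[bool], base_delay: float = 1.0) -> List[float]:
    """Return the delay chosen after each poll, given whether it found a message."""
    delay = base_delay
    delays = []
    for got_new_message in poll_results:
        delay = next_poll_delay(delay, got_new_message, base_delay=base_delay)
        delays.append(delay)
    return delays


# Three empty polls, then a new message, then another empty poll.
schedule = simulate([False, False, False, True, False])
```

In this sketch an idle conversation settles at one poll per `max_delay` seconds, while an active one returns to the base interval as soon as a message shows up.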

Bullhorn does not support a WebSocket delivery mechanism, primarily because, with a huge active user base, maintaining millions of concurrent persistent connections is a big overhead in terms of infra cost. In addition, most of the use cases are not real-time in nature, so polling works. That said, we have identified certain use cases where real-time communication is beneficial and have started to roll out WebSocket communication for them.

Conclusion

Bullhorn has proved to be an important piece of software in the PhonePe infrastructure. Its simple and flexible model, along with the scalability of HBase, has enabled the PhonePe app to make fewer network calls, provided a pleasant experience in the form of chat, and solved consistency issues between server and app with little overhead.

At PhonePe, we are committed to providing excellent experiences to our customers in the safest and most transparent way by using technology to its full potential.
We hope that the journey of Bullhorn, its design and architecture inspires and informs the broader engineering community. If you’re excited by the problems we’re solving and the technology we’re building, we invite you to join us on this journey—explore career opportunities at PhonePe Careers.