Architecture

Technology Catch-22

Rahul Chari, 15 April 2020

Engineering the product on Day 1 to handle hyper-growth in the future

Every entrepreneur in the technology space starts his or her journey with the dream of building a product or providing a solution that addresses a strong need or solves a fairly pervasive problem, using technology as the enabler. Technology today allows us not only to provide an elegant solution to a problem, but also to address it at massive scale and at a fraction of the cost that the means and methods of the past demanded. Nowhere is this more true than in the consumer internet sector, where it has played out multiple times, with the often-quoted examples of Airbnb as the largest virtual hotel company and Uber as the largest virtual taxi fleet.

However, the trap that most enterprising teams fall into is what I call the technology catch-22: hitting the inflection point of hypergrowth with a product engineered to be a test-the-waters proof of concept. From there on, playing catch-up on tech debt while keeping up with the business becomes all-consuming for the engineering team. Open-source technology stacks, low-cost public clouds and SaaS solutions for various peripheral services mean that the barriers to entry for consumer internet businesses no longer exist. This fuels competition, which is a great thing, but it also drives a sense of urgency that tilts the balance in favor of being faster to market over being truly ready for market. At that point, the justification that most technology start-ups come up with is that they are following the lean startup methodology with an ‘iterative’ model of development: the now famous Build-Measure-Learn loop.

The Build-Measure-Learn cycle is a great framework for ensuring that teams are as focused on understanding consumer preferences and market realities, and on analyzing data, as they are on building the product that they believe will bring about a paradigm shift. However, it is extremely important to be nuanced about where you apply this framework and where you take a long-term view from day one of your startup. Every entrepreneur in the consumer internet space dreams of hitting that hockey-stick growth where users are doubling week over week, visits to the app or website are exploding, or transaction volumes are surging. And the optimism (rightfully so) is that the product will hit the right note from day one. If that is what you are aiming for, then why not engineer the product to meet the demands of hypergrowth from day one? The usual answers range from over-engineering being expensive and time-consuming to not being sure about the pivots the product or platform may take before finding the right market fit.

The balance lies in taking a more enterprise-grade approach to building some parts of your technology stack, while other parts can remain experimental: proofs of concept or MVPs (minimum viable products). The framework I use for this problem is to Identify the core platforms in the ecosystem, Ring Fence the design and development of these core platforms, and put a set of Design Principles in place to govern their evolution.

Identification

Identifying the core systems in your world is the first step towards being nuanced about your software design. The core platforms should always be built to scale for at least your 18-month projection, if not more. Core platforms are the subsystems that power the rest of your engineering. Some of the characteristics of core platforms are:

They are infrastructural in nature and power application development across the organization for various features, e.g. the payment orchestrator, the risk and fraud engine, the promotions engine, etc. They could also be the PaaS layer in your overall technology stack, e.g. the transactional data stores, transient data stores, compute clusters, service discovery libraries, etc.
They power primitive workflows and are generally not dependent on other subsystems to complete these operations. This basically means that they are the terminal nodes in a workflow.
They implement important support functionality such as log collection, visualization of log data, and anomaly detection platforms that require data aggregation and processing from across services in the ecosystem.
They serve high-throughput, low-latency requests from the front-end client that are expected to spike with higher user activity, such as a real-time personalization engine for the mobile app.
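
The "terminal node" characteristic can be made concrete with a small sketch. It assumes you can write down which service calls which; the service names, the dependency map and the fan-in threshold below are all hypothetical, and the heuristic is only one possible way to shortlist candidates, not a description of how any particular company does it.

```python
# Hypothetical dependency map: service -> subsystems it calls to do its work.
DEPENDENCIES = {
    "checkout_ui": ["payment_orchestrator", "promotions_engine"],
    "recharge_flow": ["payment_orchestrator", "promotions_engine"],
    "rewards_page": ["promotions_engine"],
    "referral_banner": [],
    "payment_orchestrator": [],
    "promotions_engine": [],
}


def terminal_nodes(dependencies):
    """Services that complete their work without calling another subsystem."""
    return sorted(name for name, calls in dependencies.items() if not calls)


def fan_in(dependencies):
    """Count how many services depend on each service."""
    counts = {name: 0 for name in dependencies}
    for calls in dependencies.values():
        for callee in calls:
            counts[callee] = counts.get(callee, 0) + 1
    return counts


def core_candidates(dependencies, min_fan_in=2):
    """Terminal nodes that at least `min_fan_in` other services rely on.

    The threshold is an assumed stand-in for "powers application
    development across the organization".
    """
    counts = fan_in(dependencies)
    return [
        name
        for name in terminal_nodes(dependencies)
        if counts.get(name, 0) >= min_fan_in
    ]
```

With this map, `core_candidates(DEPENDENCIES)` shortlists the payment orchestrator and the promotions engine. The referral banner is also a terminal node, but nothing depends on it, so it stays peripheral.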

Ring Fencing

Once the core platforms have been identified, the mindset you must set in place is that these systems should never be subject to growth hacking or indiscriminate, reactionary changes to support short-term initiatives. Some good practices to follow are:

Have a small team of engineers own the core systems over a long period of time so that they build in-depth knowledge and apply learnings from experiences of scale events.
Ensure that every change is examined through the lens of whether it brings a generic capability to the core systems and how it impacts the current performance benchmark. This will help answer the question of whether the change should be part of the core system or handled as a verticalized solution in a peripheral service.
Ensure that the Software Development Life Cycle (SDLC) practices are never compromised in the teams owning the core systems. Poor performance or functional failures in these services can bring down the entire product.
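
The second practice, judging each change by its generality and its effect on the benchmark, can be sketched as a simple gate. Everything here is an assumption made for illustration: the `ChangeProposal` fields, the use of p99 latency as the benchmark, and the thresholds (at least two consumers, at most a 5% regression).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeProposal:
    name: str
    consumers: int           # teams or features that would use the capability
    baseline_p99_ms: float   # current benchmark of the core system
    candidate_p99_ms: float  # benchmark with the change applied


def review(change, min_consumers=2, max_regression=0.05):
    """Decide where a proposed change belongs (hypothetical policy)."""
    regression = (
        change.candidate_p99_ms - change.baseline_p99_ms
    ) / change.baseline_p99_ms
    if change.consumers < min_consumers:
        # Not a generic capability: solve it vertically, outside the core.
        return "peripheral service"
    if regression > max_regression:
        # Generic, but it costs too much against the current benchmark.
        return "rework: benchmark regression"
    return "core system"


if __name__ == "__main__":
    print(review(ChangeProposal("festival cashback rule", 1, 40.0, 41.0)))
    print(review(ChangeProposal("idempotency keys", 5, 40.0, 41.0)))
    print(review(ChangeProposal("synchronous audit hook", 5, 40.0, 50.0)))
```

The three sample proposals come out as a peripheral-service change, a core-system change and a change sent back for rework, in that order. The value of such a gate lies less in the exact numbers than in forcing the two questions to be asked for every change.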

Over time, the entire organization, across various functions, starts to understand the nomenclature of core systems and respects the fact that change in core systems is critical and time-consuming. The teams pivot to growth hacking on the peripheral services for immediate gains while focusing on building new capabilities in the core systems. This maximizes the value of your core systems through scale and through forward provisioning of capabilities that otherwise tend to become an afterthought.

Design principles

It is important to anchor the design choices for core platforms as early as possible so that there is consistency in the approach to technology selection, design and development of these critical sub-systems. Some of these design choices may prove to be wrong and will be changed. However, not having them to start with will mean that the evolution of the core system is ad hoc, leading to a fairly unstable and/or vulnerable base for the product.

Some of the key design choices we have followed at PhonePe for our core systems are enforcing a shared-nothing architecture for all key services, ensuring that the databases are sharded to support high read-write throughput on large data sets, and choosing an asynchronous data exchange model for all service calls. These, coupled with a concerted effort to maintain a fairly homogeneous technology stack, ensure that the core systems are designed to scale for future demands and not just for the here and now.
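
To illustrate how these three choices fit together, here is a toy, single-process sketch. It is not PhonePe's implementation: the `ShardWorker` class, the account names and the use of in-memory queues are all assumptions made for the example. Each worker owns its partition of the data (shared nothing), keys are routed by a stable hash (sharding), and callers hand off messages without waiting for the write to complete (asynchronous exchange).

```python
import asyncio
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a cryptographic digest is used to keep routing stable.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardWorker:
    """Owns one partition of the data; no state is shared between workers."""

    def __init__(self, shard_id: int) -> None:
        self.shard_id = shard_id
        self.balances: dict[str, int] = {}
        self.inbox: asyncio.Queue = asyncio.Queue()

    async def run(self) -> None:
        while True:
            message = await self.inbox.get()
            if message is None:  # sentinel: stop the worker
                break
            account, amount = message
            self.balances[account] = self.balances.get(account, 0) + amount


async def demo(num_shards: int = 4) -> dict[str, int]:
    workers = [ShardWorker(i) for i in range(num_shards)]
    tasks = [asyncio.create_task(worker.run()) for worker in workers]

    # Callers enqueue a message for the owning shard and move on.
    events = [("alice", 100), ("bob", 50), ("alice", -30), ("carol", 75)]
    for account, amount in events:
        owner = workers[shard_for(account, num_shards)]
        owner.inbox.put_nowait((account, amount))

    # Ask every worker to stop once its inbox is drained.
    for worker in workers:
        worker.inbox.put_nowait(None)
    await asyncio.gather(*tasks)

    # Each account lives on exactly one shard, so merging is conflict-free.
    merged: dict[str, int] = {}
    for worker in workers:
        merged.update(worker.balances)
    return merged


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running the demo leaves alice with 70, bob with 50 and carol with 75. Because every update for an account goes through the single worker that owns it, no locks are needed. In a real system the queues would be network messages and the workers separate processes or machines, which is where the scaling benefit comes from.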

Depending on the life stage of a company, the above framework may not always be applicable. There are many reasons why taking this approach may not be viable: limited funding, funding that is contingent on meeting short-term targets through a POC, inexperience in the early team, or evolving clarity on the core value proposition. However, I do believe that there is a life stage at which thinking along these lines becomes important and urgent to ensure that the technology platform within the company stays ahead of the business demands. As a technology organization, that is the enviable position you want to get into!
