
Introducing the PhonePe App Platform

Gaurav Lochan · 13 September 2024

Intro

PhonePe’s apps have over 560m registered users and are built by 250+ mobile engineers.  The company ships 7 different apps (10, if you count Android and iOS separately).  Operating at this scale, we’ve had to invest in and build a solid app platform. This blog attempts to deconstruct what we’ve built and share it broadly.

What is an App Platform?

In the simplest terms, it’s the infrastructure and common code that isn’t driven by any single product feature / business requirement.  This includes the following:

  • System libraries / SDKs (e.g. Networking, Analytics, Config, Localization)
  • Performance and Reliability
  • Product / UI framework layers (e.g. React / Kotlin MultiPlatform)
  • Source code repo setup and Build scripts
  • CI / CD (release) systems
  • Dev Tools and Test automation
  • … and more

These pieces can come from open source, be bought as external solutions, or be built internally. It’s usually a mix of external and internal, and each company/app balances it differently: the Facebook app leans more internal, while WhatsApp leaned more external (at least as of 5 years ago). PhonePe’s app platform is roughly 70% internal, 30% external.

Why form an App Platform team?

When a company starts building a new app with a small team, each engineer usually works on product functionality, and the platform work is simpler and divided across engineers.

As the app grows (both in terms of users and engineers), these aspects become more important and specialized – an app might need to support multiple product teams working on it independently, Android and iOS apps might share code, or multiple apps might need to share code (and learnings). At this point, these become complex enough that it helps to have a team with dedicated focus. I’ve seen different incarnations of this team across companies with names like “Mobile Infra”, “Client Foundation”, “App Core”, “Mobile DevX” and so on.

What drives PhonePe’s choices?

Each company operates with different goals and constraints; here are a few things that factor into our decision-making:

  • Scale – We have ~560m registered users, and do ~280m transactions/day.
  • Quality – People’s businesses depend on our apps working reliably and quickly.
  • Privacy – Given the nature of our products, user data is very sensitive.
  • Regulation – Need to comply with regulatory requirements (e.g. data localization).
  • Cost – Payments is a low-margin business and so we want to keep overheads low.
  • Velocity – We need to keep building and releasing quickly.

These constraints are sometimes at odds with each other, so you can see why we have our work cut out for us.

What does PhonePe’s App Platform look like?

Here’s an overview of the main parts of the platform, with some basic context.  We’ll dig into a few of these in future blog posts (and don’t hesitate to let us know which parts interest you most):

Internal SDKs

  • Many of our 1st party SDKs were originally part of the payments app codebase, but have been extracted into standalone libraries that are shared by all the apps.
  • We have an SDK repository called PhonePe App Infrastructure (PAI).  It’s similar to Maven, but adds metadata for each SDK and defines the developer workflow for updates.
  • Network – An internal SDK that provides a standard abstraction, security, and advanced capabilities.
  • Configuration – A library to support complex configuration and targeting at runtime (see the sketch after this list).
  • We use foxtrot for analytics (with configurable sampling).
  • We have an on-device ML framework called Edge, which uses TFLite under the covers.
  • We have many other SDKs for capabilities like Localization, Task management, etc.
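
To make the Configuration SDK’s role a bit more concrete, here’s a minimal Kotlin sketch of runtime configuration with targeting rules. The names (ConfigStore, ConfigEntry, UserContext) and the rule shape are hypothetical simplifications for illustration, not the actual PAI API.

```kotlin
// Hypothetical sketch of a runtime configuration + targeting API.
// Names and shapes are illustrative, not the actual PhonePe SDK surface.

data class UserContext(val userId: String, val appVersion: Int, val region: String)

// A config value plus optional targeting overrides deciding who receives what.
data class ConfigEntry<T>(
    val default: T,
    val overrides: List<Pair<(UserContext) -> Boolean, T>> = emptyList()
)

class ConfigStore(private val entries: Map<String, ConfigEntry<*>>) {

    // Resolve a key for this user: the first matching override wins, else the default.
    @Suppress("UNCHECKED_CAST")
    fun <T> get(key: String, ctx: UserContext): T {
        val entry = entries.getValue(key) as ConfigEntry<T>
        return entry.overrides.firstOrNull { (rule, _) -> rule(ctx) }?.second ?: entry.default
    }
}

fun main() {
    val store = ConfigStore(
        mapOf(
            "chat.enabled" to ConfigEntry(
                default = false,
                overrides = listOf(
                    // Enable chat only for newer app versions in one region (made-up rule).
                    { ctx: UserContext -> ctx.appVersion >= 240 && ctx.region == "KA" } to true
                )
            )
        )
    )
    val ctx = UserContext(userId = "u-123", appVersion = 245, region = "KA")
    println("chat.enabled = " + store.get<Boolean>("chat.enabled", ctx))
}
```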

We also use many open source SDKs (these are listed in each app’s About page).

Internal Product Frameworks

While the apps are mostly built with standard Android/iOS frameworks, we have some additional product frameworks:

  • Two systems for Server-driven UI – LiquidUI and WidgetX (a simplified sketch of the idea follows this list)
  • Design systems for each app
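
To give a feel for what server-driven UI means in practice, here’s a simplified Kotlin sketch: the server describes a screen as a list of typed widgets, and the client maps each type it recognizes to a native renderer, skipping types it doesn’t know for forward compatibility. The types and payload below are illustrative; LiquidUI and WidgetX are considerably richer.

```kotlin
// Hypothetical sketch of the server-driven UI idea: the server sends typed widget
// descriptions, and the client dispatches each known type to a native renderer.

sealed interface Widget
data class TextWidget(val text: String) : Widget
data class BannerWidget(val imageUrl: String, val deeplink: String) : Widget
data class UnknownWidget(val type: String) : Widget

// In a real system this would be parsed from a JSON payload; hardcoded here.
fun fetchHomeScreen(): List<Widget> = listOf(
    TextWidget("Welcome back!"),
    BannerWidget(imageUrl = "https://example.com/offer.png", deeplink = "app://offers"),
    UnknownWidget(type = "lottie_animation")   // shipped by a newer server version
)

// The client-side "renderer" dispatch; a real app would inflate native views instead.
fun render(widgets: List<Widget>) {
    widgets.forEach { widget ->
        when (widget) {
            is TextWidget -> println("Text: ${widget.text}")
            is BannerWidget -> println("Banner: ${widget.imageUrl} -> ${widget.deeplink}")
            is UnknownWidget -> println("Skipping unsupported widget '${widget.type}'")
        }
    }
}

fun main() = render(fetchHomeScreen())
```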

Build & Release

  • We have an internal tool (Bazooka) to manage the release workflow and automate pushing the build to various stores. We’ve refined this over the last two years and are able to release lots of changes with pretty high predictability. 
    • The payments app ships every 2 weeks, with a ~3 day rollout
    • The payments app ships a beta every day (on Android) and every week (on iOS)
    • The other apps typically ship every week
  • We have built and integrated a “Build time recorder” to measure build times on dev machines and CI machines.
  • For the payments app, feature teams can use a Sandbox/Micro app for their codebase, so that they don’t need to build the whole app each time.  This brings build times down to less than a minute (see the sketch after this list).
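
The sandbox idea, roughly: a thin host app module depends only on the feature under active development plus the shared platform layer, so an incremental build skips the rest of the payments app. Here’s a hypothetical Gradle (Kotlin DSL) sketch; the module names, package, and SDK versions are made up for illustration.

```kotlin
// sandbox-app/build.gradle.kts (hypothetical): a tiny host application module.
// The corresponding settings.gradle.kts would include only :sandbox-app, the
// feature module being worked on, and the minimal shared platform modules.
plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
}

android {
    namespace = "com.example.sandbox"      // illustrative, not a real PhonePe package
    compileSdk = 34
    defaultConfig {
        applicationId = "com.example.sandbox"
        minSdk = 24
        targetSdk = 34
    }
}

dependencies {
    // Only the feature under development plus shared plumbing, so devs rebuild a
    // handful of modules instead of the whole payments app.
    implementation(project(":feature-rewards"))
    implementation(project(":platform-core"))
}
```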

CI

  • We use GitLab for source control and for triggering pipelines, with Dockerized instances for each runner.
  • Each merge/commit runs a series of unit tests, Sonar checks, APK/IPA size checks, and UI automation (a sketch of a size check follows this list).
  • We built a “Central Pipeline Library” to share code across pipelines, and to provide some functionality out of the box (e.g. pipeline metrics).
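
As one concrete example of the kind of check that runs on each merge, here’s a hypothetical Gradle (Kotlin DSL) task that enforces an APK size budget; the task name, output path, and the 60 MB threshold are illustrative, not our actual setup.

```kotlin
// Hypothetical APK size budget check, registered in an app module's build.gradle.kts.
// The pipeline would run something like `./gradlew assembleRelease checkApkSize`.
import java.io.File

val apkSizeBudgetBytes = 60L * 1024 * 1024   // illustrative 60 MB budget

tasks.register("checkApkSize") {
    doLast {
        // Assumes the standard AGP output location for the release APK.
        val apkDir = layout.buildDirectory.dir("outputs/apk/release").get().asFile
        val apk = apkDir.listFiles { f: File -> f.extension == "apk" }
            ?.maxByOrNull { it.length() }
            ?: error("No APK found in $apkDir; run assembleRelease first")
        val sizeBytes = apk.length()
        println("${apk.name} is ${sizeBytes / (1024 * 1024)} MB")
        check(sizeBytes <= apkSizeBudgetBytes) {
            "APK exceeds size budget: $sizeBytes bytes > $apkSizeBudgetBytes bytes"
        }
    }
}
```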

QA & Automation systems

  • We mostly use native test frameworks (Espresso on Android, XCUITest on iOS)
    • Some tests are written using Appium
  • We have an internal device lab using HeadSpin
  • We have an Emulator farm, built using OpenSTF
  • We use WireMock to simulate different request/response scenarios and edge cases (a small example follows this list)
    • This can be used by Automation, Developers, and QA
    • We’ve built a web UI to make this easier
  • We have a backend environment called “Prod Mirror”, an internal stable instance that we run automation against.
    • We have backend simulators to mock various external systems
  • We used to run UI automation on devices using GitLab runners, but ran into limitations.  So we built a “Test Execution System” that can run different test suites (e.g. beta, prod, or pod-specific tests), assign and lock devices (phones or emulators), start Docker images, and process results.
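
Here’s a small example of the kind of stubbing WireMock enables, using its standard Java API from Kotlin; the port, endpoint, and payload are made up for illustration.

```kotlin
// Stub a backend endpoint so automation can exercise both happy paths and edge cases.
import com.github.tomakehurst.wiremock.WireMockServer
import com.github.tomakehurst.wiremock.client.WireMock.aResponse
import com.github.tomakehurst.wiremock.client.WireMock.post
import com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo

fun main() {
    val server = WireMockServer(8089)
    server.start()

    // Happy path: the (illustrative) payment confirmation endpoint succeeds after a delay.
    server.stubFor(
        post(urlEqualTo("/v1/payments/confirm"))
            .willReturn(
                aResponse()
                    .withStatus(200)
                    .withHeader("Content-Type", "application/json")
                    .withFixedDelay(300)
                    .withBody("""{"status":"SUCCESS","txnId":"TXN-TEST-1"}""")
            )
    )

    // Edge case: the same endpoint can be re-stubbed with 5xx responses or long delays
    // to exercise retry and error-handling paths during UI automation.
    // ... point the app/tests at http://localhost:8089 and run the suite ...

    server.stop()
}
```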

Performance

  • We have an internal tool (Dash) that collects sampled performance metrics (latencies, resource usage, etc.) from its client SDK and visualizes results using Superset (a simplified sketch of the sampling idea follows this list).
  • We track some key metrics for each app (L1 metrics), monitor them to make sure they aren’t regressing, and identify ones to optimize (e.g. Chat performance).
  • We are building a system to catch large performance regressions caused by client changes.
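
To illustrate the sampling idea on the client, here’s a simplified Kotlin sketch of a recorder that times code blocks and uploads only for a sampled fraction of sessions. The names and shape are hypothetical, not the actual Dash client SDK.

```kotlin
// Hypothetical sketch of sampled client-side metric reporting: record latencies locally,
// but only upload for a sampled fraction of sessions to limit data volume and battery cost.
import kotlin.random.Random

data class Metric(val name: String, val valueMs: Long, val tags: Map<String, String>)

class MetricRecorder(
    private val sampleRate: Double,               // e.g. 0.01 = 1% of sessions upload
    private val upload: (List<Metric>) -> Unit    // a real SDK would batch over the network
) {
    private val sampledIn: Boolean = Random.nextDouble() < sampleRate
    private val buffer = mutableListOf<Metric>()

    // Time a block of code and record its latency under the given metric name.
    fun <T> timed(name: String, tags: Map<String, String> = emptyMap(), block: () -> T): T {
        val start = System.nanoTime()
        try {
            return block()
        } finally {
            val elapsedMs = (System.nanoTime() - start) / 1_000_000
            if (sampledIn) buffer.add(Metric(name, elapsedMs, tags))
        }
    }

    fun flush() {
        if (sampledIn && buffer.isNotEmpty()) {
            upload(buffer.toList())
            buffer.clear()
        }
    }
}

fun main() {
    val recorder = MetricRecorder(sampleRate = 1.0, upload = { println("uploading $it") })
    recorder.timed("home_screen_load", tags = mapOf("network" to "wifi")) {
        Thread.sleep(50)   // stand-in for real work
    }
    recorder.flush()
}
```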

What next?

  • Experimentation – While we already use some internal AB services for targeting and randomization, we’re building a first-class experimentation system with automated analysis, to be used across the company.  More on that when we get there; a generic sketch of hash-based variant bucketing follows this list.
  • We’re looking to move more tests to Emulators (and Simulators)
  • We’re finding ways to effectively leverage GenAI (who isn’t?!)
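
For context on the randomization piece, here’s a generic Kotlin sketch of deterministic hash-based bucketing, a common way AB services assign stable variants without a server round trip. This is a textbook illustration, not PhonePe’s actual AB or experimentation implementation.

```kotlin
// Deterministic bucketing: hashing (experiment, userId) maps each user to a stable
// bucket in [0, 100), so the same user always sees the same variant.
import java.security.MessageDigest

data class Variant(val name: String, val weightPercent: Int)

fun bucketOf(userId: String, experiment: String): Int {
    val digest = MessageDigest.getInstance("MD5").digest("$experiment:$userId".toByteArray())
    // Take the first 4 bytes as an int, force it non-negative, then map into [0, 100).
    val n = ((digest[0].toInt() and 0xFF) shl 24) or
            ((digest[1].toInt() and 0xFF) shl 16) or
            ((digest[2].toInt() and 0xFF) shl 8) or
            (digest[3].toInt() and 0xFF)
    return (n ushr 1) % 100
}

// Assign a variant by walking cumulative weights (assumed to sum to 100).
fun assign(userId: String, experiment: String, variants: List<Variant>): Variant {
    val bucket = bucketOf(userId, experiment)
    var cumulative = 0
    for (v in variants) {
        cumulative += v.weightPercent
        if (bucket < cumulative) return v
    }
    return variants.last()
}

fun main() {
    val variants = listOf(Variant("control", 50), Variant("new_checkout", 50))
    println(assign("user-42", "checkout_redesign_v2", variants))
}
```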

Conclusion

As you can see, there are a lot of moving parts involved in managing our apps at scale. Hopefully this is helpful context for other app teams.

We’d love to hear from you; don’t hesitate to reach us at [email protected], especially:

  • If you have feedback on our choices
  • If there are specific topics you would like covered in more detail
  • If there are systems we’ve built that you’d like to use
  • If you want to hear about alternatives to many of these systems
  • If you want to come and work on some of this 🙂