Nimbus: Flexible BareMetal Provisioning
Tools (SRE) and Infra (SRE) team
25 July, 2024
As PhonePe continues to grow rapidly, the scale and reliability of our infrastructure become more important. By developing our own infrastructure, we’re able to adapt quickly while keeping our services reliable.
In order to meet our deployment needs, we use specialized in-house tools to onboard servers from various manufacturers. This process involves more than just acquiring hardware; it’s about fine-tuning each server’s specifications to ensure they align with our unique requirements. From configuring the hardware to optimizing software settings, every step is carefully managed to ensure that our infrastructure is both robust and capable of supporting PhonePe’s evolving needs.
Background
The foundation of our infrastructure is what we call “BareMetals” – machines that are provisioned with a chosen operating system and equipped with everything needed for successful deployment. However, setting up these machines on a large scale comes with its own challenges. This involves the following tasks:
- Starting the machines over the network
- Installing the required operating system
- Configuring various settings, including network setup for either a Clos network (fabric) or a more traditional hierarchical network design
- Installing software packages and observability services, and applying security policies
Ensuring everything works reliably with so many parts involved is a complex task as there can be faults at every level.
One thing to note is that this is a finely orchestrated procedure when done at scale across thousands of servers. In a typical scenario, the following happens when we want to provision a machine:
- The machine boots via the management network to communicate with a control server that has all the necessary details.
- The machine then boots into an OS installer, which gathers the information needed for provisioning. This includes:
  - Network configuration
  - Required packages or programs
  - Observability/metrics services
While our system is effective, it’s not without its challenges:
- Hardware issues that are unaccounted for could cause crashes.
- The installer depends on multiple services to gather information at various stages of provisioning, which makes debugging difficult when any one of these stages fails.
- The system is best suited for a generalized set of machines that require similar setups.
To handle these challenges, we developed Nimbus. This helps us scale our infrastructure to meet our specific requirements while reducing the challenges associated with accommodating specialized setups.
Nimbus API (Metadata service)
In the traditional BareMetal OS installer process, the installer is configured to interact with a set of services to gather essential information like system configuration and required software packages. This is typically achieved through a metadata file, such as the auto-install file for Ubuntu-based systems, which is preloaded with the OS installer. While cloud-init files are usually set up to enable BareMetals to communicate with various services for configuration data, we’ve taken a step further by “pre-cooking” this configuration. This approach allows us to manage server communication on behalf of the BareMetals, enhancing security by limiting machine communication and enabling us to gracefully handle system errors.
At the heart of Nimbus Flow is the creation of an OTI (One-Time OS-Installer) file, such as the auto-install file for the jammy and focal distributions. This file is crucial to provisioning: it contains all the data from the control server (PPEC-API) needed for a successful setup. The process is supported by chain loading Nimbus, a custom build of the iPXE network boot firmware with an embedded .ipxe script that handles potential edge cases. A custom build also lets us enable or disable certain network card features at the firmware level, depending on the use case.
In essence, this is machine provisioning backed by precooked metadata, and we call it the Nimbus Flow.
Nimbus Flow incorporates data captured by several tools we developed, including ppec-agent (an agent that verifies the physical health of the machine), PIOUS (a custom operating system that checks firmware and network sanity), and PPEC-API (the control server, which holds all metadata about the machine to be provisioned), offering a comprehensive view throughout the server lifecycle.
The flow is designed to raise provisioning success rates and to improve consistency and predictability by identifying potential issues at the creation stage. This early detection is essential to streamlining the provisioning process.
Core Components of Nimbus Flow
Nimbus Flow comprises three key components:
Nimbus:
A Custom Build of iPXE – This network boot firmware is foundational to the Nimbus Flow, enabling efficient and reliable booting processes.
Nimbus-API:
Serving as the backbone for iPXE and installer script generation, Nimbus-API also functions as a LogSink, collecting and managing the logs needed for troubleshooting.
Provisioning Agent:
This component is responsible for verification, validation, and fixing provisioning states, ensuring the integrity and reliability of the provisioned server.
How Nimbus Flow Works
Nimbus Flow begins with the BareMetal “create” intent, which triggers the Nimbus-API to generate an installer file containing all necessary provisioning details from the control server. Upon network boot, the DHCP server chain loads our custom Nimbus Network Boot Firmware, initiating the Nimbus boot process.
Nimbus then executes an embedded script that reads the SMBIOS table (the SMBIOS specification defines data structures for accessing management information produced by the BIOS) and sends this data to Nimbus-API for further processing. The API validates these requests to eliminate duplicate or inconsistent chainloads and generates an OS-specific iPXE script based on the data provided.
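The validation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Nimbus-API implementation: the `ChainloadRegistry` class, its method names, and the choice of serial number and UUID as the SMBIOS fields to compare are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChainloadRegistry:
    """Tracks which machines are expected and which have already chain-loaded.

    Hypothetical structure; the real Nimbus-API state model is not shown here.
    """

    # serial -> attributes recorded when the "create" intent arrives
    expected: dict = field(default_factory=dict)
    # serials that have already been served an OS-specific script
    served: set = field(default_factory=set)

    def register_intent(self, serial: str, uuid: str, os_name: str) -> None:
        """Record a pending provisioning request for one machine."""
        self.expected[serial] = {"uuid": uuid, "os": os_name}

    def validate(self, smbios: dict) -> str:
        """Return the OS whose iPXE script should be served, or raise ValueError."""
        serial = smbios.get("serial", "")
        record = self.expected.get(serial)
        if record is None:
            raise ValueError(f"no create intent for serial {serial!r}")
        # Inconsistent request: the reported UUID does not match the metadata.
        if smbios.get("uuid", "").lower() != record["uuid"].lower():
            raise ValueError(f"inconsistent SMBIOS UUID for serial {serial!r}")
        # Duplicate request: this machine was already handed its script.
        if serial in self.served:
            raise ValueError(f"duplicate chainload for serial {serial!r}")
        self.served.add(serial)
        return record["os"]
```

In this sketch a machine is served at most once per create intent, so a stray second network boot is rejected rather than triggering a reinstall.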
Once the script is chain-loaded, control is transferred to the kernel and initrd, and the One-Time Installer file is requested from Nimbus-API. This file is pivotal in continuing the installation process, without further need for external dependencies. Post-reboot, the Provisioning Agent validates the system and network links, fixing and monitoring services as necessary and reporting any failure states back to Nimbus-API.
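As a rough illustration of the post-reboot stage, the sketch below shows one way a validate, fix, and report loop could be structured. The `Check` type, the `run_checks` function, and the report layout are assumptions made for this example; the actual Provisioning Agent is not described at this level of detail.

```python
from typing import Callable, NamedTuple, Optional


class Check(NamedTuple):
    """One validation step (hypothetical shape)."""

    name: str
    probe: Callable[[], bool]                  # returns True when healthy
    fix: Optional[Callable[[], None]] = None   # optional remediation


def run_checks(checks: list[Check]) -> dict:
    """Probe each check, attempt its fix once on failure, and build a report."""
    results = {}
    for check in checks:
        if check.probe():
            results[check.name] = "ok"
        elif check.fix is not None:
            check.fix()
            # Re-probe so that only a verified repair counts as fixed.
            results[check.name] = "fixed" if check.probe() else "failed"
        else:
            results[check.name] = "failed"
    failed = sorted(name for name, state in results.items() if state == "failed")
    # In a real agent, this dictionary would be the payload reported upstream.
    return {"results": results, "failed": failed, "healthy": not failed}
```

Keeping the probes and fixes as plain callables means the same loop can cover network links, services, or any other state the agent is asked to verify.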
Flow Specifics
Currently, we support auto-install formats, with additional support for kickseed and preseed formats anticipated in future releases.
1. Setting the boot mode to UEFI
We need UEFI for a more customizable configuration setup, so the initial script first makes sure that the server boots in UEFI mode.
Note: BMC and BIOS consistency is taken care of by SENZU.
Identify boot mode and change to UEFI if on BIOS
Changing boot mode to UEFI
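A boot mode check of this kind could look roughly like the sketch below, which renders an iPXE script from Python. iPXE exposes the firmware type through its built-in `${platform}` setting (`efi` or `pcbios`), and SMBIOS values through settings such as `${serial}`, `${uuid}`, and `${product}`. The `/bootmode/uefi` and `/identify` endpoints and the function name are hypothetical, and how the boot mode is actually switched is not described here.

```python
def render_bootmode_ipxe(api_base: str) -> str:
    """Render an iPXE script that branches on the firmware platform."""
    lines = [
        "#!ipxe",
        # ${platform} is set by iPXE itself: "efi" or "pcbios".
        "iseq ${platform} efi && goto uefi_ok || goto needs_uefi",
        "",
        ":needs_uefi",
        "echo Legacy BIOS boot detected, switching to UEFI",
        # Hypothetical endpoint that would serve the mode-switching script;
        # the trailing "||" ignores a failure so the reboot still happens.
        f"chain {api_base}/bootmode/uefi ||",
        "reboot",
        "",
        ":uefi_ok",
        # Hand the SMBIOS identity to the API for the next chainload.
        f"chain {api_base}/identify?serial=${{serial}}&uuid=${{uuid}}&product=${{product}}",
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_bootmode_ipxe("http://nimbus.example")` yields a script in which a machine that booted in legacy BIOS mode never reaches the identification step until it comes back up in UEFI mode.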
2. Identifying the server using SMBIOS Data
After the reboot, another iPXE script is chain loaded that sends SMBIOS data to Nimbus-API. The API validates this data and, based on it, serves the required OS-specific iPXE script to be chain loaded next.
Chain OS specific iPXE script.
Nimbus-API response for chain loading the iPXE script for Jammy. This loads the kernel and initrd and sets the kernel args, where we specify the URL from which to fetch the auto-install file.
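For readers without the original listing, a response of this shape might be generated as in the sketch below. The kernel arguments follow the commonly documented Ubuntu live-server netboot pattern (`autoinstall` plus a `ds=nocloud-net;s=...` seed URL); the function name, file names, and URLs are assumptions, and the arguments Nimbus actually uses may differ.

```python
def render_jammy_ipxe(boot_base: str, seed_url: str) -> str:
    """Render an iPXE script that boots the installer kernel and initrd."""
    # The NoCloud datasource fetches user-data and meta-data relative to
    # this URL, so make sure it ends with a slash.
    if not seed_url.endswith("/"):
        seed_url += "/"
    kernel_args = " ".join([
        "initrd=initrd",                    # lets a UEFI-booted kernel find the initrd image
        "ip=dhcp",
        f"url={boot_base}/installer.iso",   # hypothetical installer image location
        "autoinstall",
        f"ds=nocloud-net;s={seed_url}",     # where the installer fetches the OTI file
    ])
    return "\n".join([
        "#!ipxe",
        f"kernel {boot_base}/vmlinuz {kernel_args}",
        f"initrd {boot_base}/initrd",
        "boot",
    ]) + "\n"
```

Because the seed URL is baked into the kernel arguments, the installer needs only this one endpoint to obtain its configuration.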
Response template for the auto-install data. Here we set up reporting, the provisioning agent, user data, and other specifics based on the use case defined for this provisioning request.
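A pre-cooked auto-install document along these lines could be assembled as in the following sketch. The top-level keys (`version`, `reporting`, `packages`, `late-commands`, `user-data`) are standard Ubuntu autoinstall keys, but the machine fields, the agent package name, and the log endpoint are hypothetical. The document is emitted as JSON, which YAML parsers generally accept, purely to keep the example free of third-party dependencies.

```python
import json


def render_oti(machine: dict, nimbus_api: str) -> str:
    """Build a one-time installer document from pre-fetched machine metadata."""
    autoinstall = {
        "version": 1,
        # Installer progress events are posted to the log sink (assumed path).
        "reporting": {
            "nimbus": {
                "type": "webhook",
                "endpoint": f"{nimbus_api}/logs/{machine['serial']}",
            },
        },
        "packages": sorted(machine.get("packages", [])),
        # Install the provisioning agent into the target system.
        "late-commands": [
            "curtin in-target --target=/target -- "
            f"apt-get install -y {machine['agent_package']}",
        ],
        # cloud-init data applied on first boot of the installed system.
        "user-data": {"hostname": machine["hostname"]},
    }
    return "#cloud-config\n" + json.dumps({"autoinstall": autoinstall}, indent=2) + "\n"
```

Everything the installer needs is resolved when the file is rendered, so the machine itself never has to query the control server during installation.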
Summary
The Nimbus flow simplifies provisioning of BareMetals while offering enhanced monitoring, reliability, and efficiency. By leveraging a precooked-metadata backed process, Nimbus Flow streamlines server setup, reducing dependencies and potential points of failure.
Nimbus, Nimbus-API, and the Provisioning Agent ensure a seamless and controlled provisioning process, from initial boot to final validation. This approach not only improves the consistency and predictability of deployments but also allows for early detection of issues, minimizing disruptions and ensuring a smooth operational flow.
With Nimbus Flow, we are better equipped to meet the demands of our growing infrastructure, ensuring PhonePe’s services remain reliable and performant.
Authors: Nandan Herekar, Raj Sharma, Vishnu Naini, Surya Murugan