Background / Problem Statement:
As a fintech organization with payments as our core offering, we at PhonePe secure our network perimeter with very tight security measures. However, traditional approaches to implementing and securing the edge, like the one shown below (fig. 1), have a number of limitations.
Some of the limitations of traditional design:
- To achieve redundancy, firewalls are configured in cluster mode. In such setups, a version or firmware upgrade requires downtime.
- Stateful firewalls have upper limits on total active sessions and new sessions per second. These limits cap the total RPS that your edge / public-facing services can deliver.
- The only easy way to increase edge capacity is to upgrade the edge firewall hardware (which, ironically, is not so easy to do on live systems).
- The firewall cluster is a single point of failure.
- Typically, both firewalls in a cluster need to be exactly the same make and model and run the same firmware version.
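To make the session-limit point concrete, here is a rough sketch of how the two limits bound the connection rate of a single stateful firewall. It relies on Little's law (concurrent sessions = arrival rate × average session lifetime). The function name and all numbers are hypothetical and are not taken from any vendor datasheet.

```python
def max_new_sessions_per_sec(max_active_sessions: int,
                             cps_limit: int,
                             avg_session_seconds: float) -> float:
    """Rough upper bound on sustained new sessions/sec for one stateful firewall.

    Two ceilings apply:
      * the rated new-sessions-per-second limit (cps_limit), and
      * the session table size divided by how long an entry stays in the
        table (Little's law: concurrent = rate * lifetime).
    The lower of the two is the effective ceiling.
    """
    if avg_session_seconds <= 0:
        raise ValueError("avg_session_seconds must be positive")
    table_bound = max_active_sessions / avg_session_seconds
    return min(float(cps_limit), table_bound)


# Hypothetical firewall: 2M session table, 100k new sessions/sec rating.
# With 30 s average session lifetime the table, not the CPS rating, is the limit.
print(max_new_sessions_per_sec(2_000_000, 100_000, 30))
# With 10 s average session lifetime the CPS rating becomes the limit.
print(max_new_sessions_per_sec(2_000_000, 100_000, 10))
```

If each request opens a new connection, this ceiling is also the RPS ceiling of the edge. Connection reuse raises the achievable RPS but not the session limits themselves.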
The architecture that we use not only addresses the limitations listed above, it also adds flexibility and makes the setup highly redundant and linearly scalable.
Without further ado, let us get to it.
Solution that we implemented: Topology
- Across the fabric and the edge, we use BGP as the routing protocol. It gives us a lot of flexibility in terms of redundant paths, automatic failover, and quick convergence.
- ISP links are terminated at the edge routers, and BGP sessions are established between the ISP and the edge routers.
- ECMP / multipath BGP is enabled, and both edge routers advertise the public prefix and accept a default route from the ISP.
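As an illustration of what ECMP does with several equal-cost BGP paths, the sketch below picks a next hop by hashing the flow 5-tuple. This is a simplified model, not how any particular router implements it; the `Flow` type, the `pick_next_hop` function, and the next-hop names are assumptions made for the example.

```python
import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str = "tcp"


def pick_next_hop(flow: Flow, next_hops: list[str]) -> str:
    """Pick one of several equal-cost next hops by hashing the flow 5-tuple.

    Every packet of the same flow hashes to the same value, so a flow
    stays on one path for as long as the set of next hops is unchanged.
    """
    if not next_hops:
        raise ValueError("no next hop available")
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical next hops learned over BGP with multipath enabled.
hops = ["fw1", "fw2", "fw3"]
flow = Flow("203.0.113.7", "198.51.100.10", 51514, 443)

chosen = pick_next_hop(flow, hops)
# If the chosen path is withdrawn, the flow moves to a surviving next hop.
survivors = [h for h in hops if h != chosen]
print(chosen, "->", pick_next_hop(flow, survivors))
```

Note that with a plain modulo hash, as used here, removing a next hop can also move flows that were on the surviving paths; real devices may use more stable schemes.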
Things to Note:
- Each firewall is independent and has its own BGP peering with the edge router.
- The fabric is connected to all the firewalls and has a BGP peering session with each firewall.
- All BGP sessions are eBGP with the multipath option enabled for ECMP.
- As each firewall is an independent node, we get the following benefits:
  - There is no need to form firewall clusters.
  - Each firewall has its own capacity, and the total capacity of the network is the sum of the capacities of the individual firewalls, making this a linearly scalable system.
  - As there is no cluster, there is no requirement for homogeneity in firewall make / model / firmware version. If all of them can talk BGP, you can mix and match.
  - This opens up the possibility of running virtual firewalls like Juniper vSRX on KVM to easily add more capacity when needed.
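The capacity claim above can be sketched as simple arithmetic. The example below sums per-node limits for a mixed pool and shows what remains if one node is lost. The `Firewall` type, node names, and all figures are hypothetical, and the sum assumes traffic is spread so that no node saturates before the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Firewall:
    name: str
    max_active_sessions: int
    new_sessions_per_sec: int


def pool_capacity(firewalls: list[Firewall]) -> tuple[int, int]:
    """Return (total active sessions, total new sessions/sec) for the pool."""
    return (
        sum(fw.max_active_sessions for fw in firewalls),
        sum(fw.new_sessions_per_sec for fw in firewalls),
    )


def capacity_after_loss(firewalls: list[Firewall], failed_name: str) -> tuple[int, int]:
    """Pool capacity once the named firewall has dropped out of BGP."""
    return pool_capacity([fw for fw in firewalls if fw.name != failed_name])


# Hypothetical mixed pool: two hardware firewalls plus one virtual firewall.
pool = [
    Firewall("hw-fw-1", 2_000_000, 100_000),
    Firewall("hw-fw-2", 2_000_000, 100_000),
    Firewall("vfw-1", 500_000, 25_000),
]

print(pool_capacity(pool))                   # (4500000, 225000)
print(capacity_after_loss(pool, "hw-fw-1"))  # (2500000, 125000)
```

Sizing the pool against the capacity that remains after losing the largest node is one way to keep headroom for failures and upgrades.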
Challenges / Issues faced:
- Session breakage due to asymmetric routing through the firewalls
  - With multiple firewalls in active mode and ECMP in place, there is every possibility of asymmetric routing (the forward path and the reverse path of a flow are not the same). A stateful firewall that sees only the reply, without the session it belongs to, drops it.
  - There are multiple ways to address this. Some of them are:
    - Performing SNAT + DNAT at the firewall so that the forward and reply paths are deterministic.
    - Using PBR (policy / source-based routing) at the network fabric layer to route the reply packet back to the correct firewall.
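The asymmetric routing problem and the SNAT fix can be illustrated with a small simulation. It assumes the edge and the fabric each hash the packet's own 5-tuple independently, and that each firewall SNATs to an address that is routed only to itself. The firewall names, SNAT addresses, and function names are hypothetical, and DNAT is left out for brevity.

```python
import hashlib

FIREWALLS = ["fw1", "fw2", "fw3"]
# Hypothetical per-firewall SNAT addresses, each routed only to its owner.
SNAT_ADDRESS = {"fw1": "10.255.0.1", "fw2": "10.255.0.2", "fw3": "10.255.0.3"}
SNAT_OWNER = {addr: fw for fw, addr in SNAT_ADDRESS.items()}


def ecmp_pick(src: str, dst: str, sport: int, dport: int) -> str:
    """Choose a firewall by hashing the packet's addresses and ports."""
    key = f"{src}|{dst}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return FIREWALLS[int.from_bytes(digest[:8], "big") % len(FIREWALLS)]


def path_without_nat(client: str, service: str, cport: int, sport: int):
    """Forward and reply packets are hashed independently."""
    forward = ecmp_pick(client, service, cport, sport)
    reverse = ecmp_pick(service, client, sport, cport)
    return forward, reverse


def path_with_snat(client: str, service: str, cport: int, sport: int):
    """The firewall rewrites the source to its own SNAT address, so the
    backend replies to that address and the reply returns to the same firewall."""
    forward = ecmp_pick(client, service, cport, sport)
    reply_destination = SNAT_ADDRESS[forward]
    reverse = SNAT_OWNER[reply_destination]
    return forward, reverse


def count_asymmetric(path_fn, flows) -> int:
    """Number of flows whose reply crosses a different firewall."""
    return sum(1 for flow in flows if len(set(path_fn(*flow))) > 1)


flows = [("203.0.113.7", "198.51.100.10", port, 443) for port in range(40000, 40200)]
print("asymmetric without NAT:", count_asymmetric(path_without_nat, flows))
print("asymmetric with SNAT:  ", count_asymmetric(path_with_snat, flows))
```

PBR at the fabric achieves the same end from the other side: instead of changing where the reply is addressed, the fabric is told which firewall a given reply must go back through.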
The next step towards more stability and redundancy is to have multiple, different upstream ISPs and to use BGP along with the CDN's load-balancing capabilities to deterministically shape and route client traffic for maximum efficiency.