How to prevent load balancers from being a single point of failure
Today we are going to talk about load balancers. A lot has already been written on this topic, but we are not going to cover how load balancers work; instead, we will look at how to prevent a load balancer from becoming a single point of failure.
Load balancers are fantastic tools in your system design toolset that allow your systems to scale. When you need to serve a lot of traffic, you will inevitably reach a point where a single server runs out of memory and computing power. When this happens, you have two options:
- Make your server more powerful by adding processing power and memory, but there is a hard upper limit to how far this can go.
- Run multiple instances of your server, with each instance serving a portion of the traffic.
Load balancers allow you to scale by implementing the second approach. In simple words, a load balancer takes all your incoming traffic and intelligently distributes it across the running instances of your server. This allows you to scale almost indefinitely by simply adding more and more replicas of your server. This is called Horizontal Scaling in system design lingo.
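To make this concrete, here is a minimal sketch of round-robin distribution, one of the simplest strategies a load balancer can use. The backend addresses are made up for illustration.

```python
import itertools

# Hypothetical pool of backend replicas; in a real deployment these would
# come from configuration or a service registry.
BACKENDS = ["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"]

# Round-robin: hand each incoming request to the next backend in turn.
_pool = itertools.cycle(BACKENDS)

def pick_backend() -> str:
    """Return the backend that should serve the next request."""
    return next(_pool)

# Five incoming requests get spread evenly across the three replicas.
for request_id in range(5):
    print(f"request {request_id} -> {pick_backend()}")
```

Real load balancers offer smarter strategies as well, such as least connections or weighted distribution, but the principle is the same.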
While load balancers allow you to circumvent memory and compute limitations when scaling your applications, they do not, by themselves, solve the single point of failure problem. Consider the following diagram.
Here you can see that all requests go through the load balancer, which then forwards them to the backend replicas. Clearly, the load balancer is a single point of failure: if it shuts down or becomes unreachable for whatever reason, your users will not be able to use the application and you will have a complete outage.
So how can we prevent such a situation? How do we stop our load balancer from being a single point of failure? That is the problem we are going to solve next.
Deploy load balancers in high availability pairs
One solution to the problem is to deploy the load balancers in high availability pairs. That means we deploy two or more instances of our load balancer so that if one of them goes down, the others can pick up the load and continue serving our clients.
This sounds like a reasonable solution, but two questions immediately come to mind: how will our clients know which instance of the load balancer to send their requests to? And how will the other load balancers know that a particular instance has gone down?
Let's address these questions.
How traffic is managed in a local network
To answer them, we need to go a bit deeper and understand how traffic is routed within a local network. The following points are important:
- All traffic within a local network is forwarded by a device called a switch rather than a router.
- A switch delivers traffic to the host machines on the network using their MAC addresses.
- Whenever a device on the network (for example, the router acting as the gateway) wants to send an IP packet to a host, it needs to know the MAC address corresponding to the destination IP address so that the packet reaches the correct machine.
- To resolve the MAC address for a destination IP, the device uses the Address Resolution Protocol (ARP): it sends a broadcast to all machines connected to the local network, asking which of them is serving the given IP (a minimal sketch of such a lookup follows this list).
- The host that has the given IP responds with its MAC address, while the other hosts simply ignore the ARP request.
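To see what such a lookup looks like on the wire, here is a minimal sketch using the scapy packet library. The target IP is made up, and sending raw packets like this requires root privileges.

```python
from scapy.all import ARP, Ether, srp

TARGET_IP = "192.168.1.50"  # hypothetical IP address we want to resolve

# Broadcast a "who-has" ARP request to every machine on the local network.
request = Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=TARGET_IP)

# srp() sends at layer 2 and collects replies; only the host that actually
# owns TARGET_IP answers, everyone else ignores the request.
answered, _ = srp(request, timeout=2, verbose=False)
for _, reply in answered:
    print(f"{reply.psrc} is at {reply.hwsrc}")
```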
Now that we understand how traffic is routed within a local network, let us return to our original problem.
The first step is to deploy two load balancers in the network. One load balancer will be active, while the other will be in passive mode. This active-passive pattern is quite common in high availability systems.
Both of these load balancers agree to share the same virtual IP address (a virtual IP address is assigned to a software system, e.g. a load balancer, rather than to a physical device; for our purposes you can think of it as a floating IP address that can move between machines). They also send heartbeats (liveness pings) to each other.
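As an illustration of the heartbeat mechanism, here is a minimal sketch in which a load balancer sends a small UDP "alive" message to its peer every second. The peer address, port, and message format are made up for this example.

```python
import socket
import threading
import time

PEER = ("10.0.0.2", 9999)   # hypothetical address of the other load balancer
HEARTBEAT_INTERVAL = 1.0    # seconds between liveness pings

def send_heartbeats() -> None:
    """Periodically tell our peer that we are still alive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(b"alive", PEER)
        time.sleep(HEARTBEAT_INTERVAL)

# Run the sender in the background; the peer runs a matching receiver
# (sketched further below) that raises the alarm when the pings stop.
threading.Thread(target=send_heartbeats, daemon=True).start()
```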
Whenever a device on the network wants to forward a packet to the virtual IP address, it first sends an ARP broadcast to all hosts, asking which machine has the virtual IP address assigned to it. When the active (primary) load balancer receives this broadcast, it immediately responds with its MAC address; from then on, all packets addressed to the virtual IP address are forwarded to the active load balancer. The passive load balancer also receives the broadcast but does not respond with its MAC address, because it knows it is not the primary load balancer.
Both load balancers constantly exchange the heartbeat (liveness ping) messages mentioned above. If the active load balancer fails, the passive load balancer immediately starts advertising its own MAC address for the virtual IP address (using an unsolicited broadcast known as a gratuitous ARP), thus assuming the role of the active load balancer.
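In production you would rely on a battle-tested tool such as keepalived, which implements this failover via the VRRP protocol, rather than rolling your own. Purely to illustrate the mechanism, here is a minimal sketch of the passive load balancer's side: it listens for heartbeats, and when none arrive within a timeout it claims the virtual IP with a gratuitous ARP. All addresses are made up, and sending raw ARP packets with scapy requires root privileges.

```python
import socket

from scapy.all import ARP, Ether, sendp

VIRTUAL_IP = "192.168.1.100"   # hypothetical virtual IP shared by the pair
MY_MAC = "02:00:00:00:00:02"   # hypothetical MAC address of this machine
HEARTBEAT_TIMEOUT = 3.0        # seconds of silence before declaring the peer dead

def promote_to_active() -> None:
    """Broadcast a gratuitous ARP claiming the virtual IP for this machine."""
    garp = Ether(dst="ff:ff:ff:ff:ff:ff", src=MY_MAC) / ARP(
        op=2, psrc=VIRTUAL_IP, hwsrc=MY_MAC,
        pdst=VIRTUAL_IP, hwdst="ff:ff:ff:ff:ff:ff",
    )
    sendp(garp, verbose=False)

# Listen for the active load balancer's heartbeats.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9999))
sock.settimeout(HEARTBEAT_TIMEOUT)

while True:
    try:
        sock.recvfrom(1024)   # heartbeat arrived: the active LB is alive
    except socket.timeout:
        promote_to_active()   # peer is gone: take over the virtual IP
        break
```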
Since this whole failover is driven by the Address Resolution Protocol, and ARP resolution is quite fast, end users may not see any visible downtime when the primary (active) load balancer fails.
And this is how we ensure that our load balancers don't become a single point of failure. Note that there are other strategies we can use to solve this problem as well, such as Anycast, or monitoring the load balancers and automating the process of updating DNS to point to a healthy load balancer, but that's a topic for another blog post.
PS: If you find any technical mistakes in this article, please do let me know; I would be happy to learn more and correct myself.