Kind of right, kind of wrong.

* for client-side load balancing, it's entirely possible to move active healthchecking into a dedicated service and have its results be vended along with discovery. In fact, more managed server-side load balancers are also moving healthchecking out of band so they can scale the forwarding plane independently of probes.
* for server-side load balancing, it's entirely possible to shard forwarders to avoid SPOFs, typically by creating isolated increments and then using shuffle sharding by caller/callee to minimize overlap between workloads. I think Alibaba's canalmesh whitepaper covers such an approach.
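To make the shuffle-sharding idea concrete, here's a minimal Python sketch. Everything here (the function name, the fleet naming, the shard size) is made up for illustration; real systems like the one described would also handle shard recomputation as the fleet changes.

```python
import hashlib

def shuffle_shard(caller: str, forwarders: list[str], shard_size: int) -> list[str]:
    """Deterministically pick a small, caller-specific subset of forwarders.

    Two different callers get mostly non-overlapping shards, so a poison
    workload from one caller can only take down the forwarders in its
    own shard, not the whole fleet.
    """
    # Rank every forwarder by a hash seeded with the caller's identity;
    # each caller sees the fleet in its own pseudo-random order.
    ranked = sorted(
        forwarders,
        key=lambda f: hashlib.sha256(f"{caller}:{f}".encode()).hexdigest(),
    )
    return ranked[:shard_size]

fleet = [f"fwd-{i}" for i in range(100)]
shard_a = shuffle_shard("service-a", fleet, 5)
shard_b = shuffle_shard("service-b", fleet, 5)
# With 5-of-100 shards, the expected overlap between any two callers is small.
```

The key property is that the assignment is deterministic (no coordination needed) while overlap between any two callers stays low, which is what bounds the blast radius of a single bad workload.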
As for scale, I think for almost everybody it's completely overblown to go with a p2p model. A reasonable estimate for a centralized proxy fleet is about 1% of infrastructure costs. If you want to save that, you need a team that can build/maintain your centralized proxy's capabilities in all the languages/frameworks your company uses, and you likely need to build the proxy anyway for the long tail. Whereas you can fund a much smaller team to focus on e2e ownership of your forwarding plane.
Add on top that you need a safe deployment strategy for updating the critical logic in all of these combinations, and continuous deployment to ensure your fixes roll out to the fleet in a timely fashion. This is itself a hard scaling problem.
I've never quite understood why there couldn't be a standardised "reverse" HTTP connection, from server to load balancer, over which connections are balanced. Standardised so that some kind of health signalling could be present for easy/safe draining of connections.
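For what it's worth, the mechanics of a backend-initiated connection are easy to sketch with raw sockets. This is a toy loopback illustration of the idea, not any real protocol:

```python
import socket
import threading

# Toy "reverse" connection: the backend dials OUT to the balancer,
# and the balancer later pushes a client's request down that
# backend-initiated connection.

srv = socket.create_server(("127.0.0.1", 0))   # balancer listens for backends
port = srv.getsockname()[1]

def backend():
    # The backend "registers" by connecting out, then serves whatever
    # the balancer sends over its own outbound connection.
    sock = socket.create_connection(("127.0.0.1", port))
    sock.recv(4096)                             # the forwarded request
    sock.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    sock.close()

t = threading.Thread(target=backend)
t.start()

conn, _ = srv.accept()                          # a backend is now registered
conn.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")

response = b""
while b"\r\n\r\n" not in response:              # read until headers complete
    chunk = conn.recv(4096)
    if not chunk:
        break
    response += chunk

conn.close()
srv.close()
t.join()
```

A real standard would of course need multiplexing, reconnection, and the health/drain signalling mentioned above; HTTP/2 streams or a WebSocket would be a more natural substrate than a bare TCP stream.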
It seems like passive is the best option here, but can someone explain why one real request must fail? The load balancer is monitoring for failed requests; if it sees one, can it not forward the original request to another backend?
Not every request is idempotent, and it's not known when or why a request failed. GETs are OK (in theory), but you can't retry a POST without risking side effects.
For GET /, sure, and some mature load balancers can do this. For POST /upload_video, no. You'd have to buffer every in-flight request, either in memory or on disk, in case you need to replay the entire thing against a different backend. Not a very good tradeoff.
I wrote this after seeing cases where instances were technically “up” but clearly not serving traffic correctly.
The article explores how client-side and server-side load balancing differ in failure detection speed, consistency, and operational complexity.
I’d love input from people who’ve operated service meshes, Envoy/HAProxy setups, or large distributed fleets — particularly around edge cases and scaling tradeoffs.
Thanks for writing something that's accessible to someone who's only used Nginx server-side load balancing and didn't know client-side load balancing existed at higher scale.
Hi author, a tangent: please keep zooming enabled for those of us who need to zoom in on mobile devices.