OpenStack Neutron LBaaS – ensuring high availability for your apps

Neutron is the Network as a Service (NaaS) layer of OpenStack. Part of Neutron is the LBaaS (Load Balancer as a Service) plugin, which offers an abstraction layer that handles communication with load balancers. LBaaS is modular: it can be configured with different load balancer drivers.

The LBaaS agent normally runs on the same host as the L3 agent (the network host). This host can be seen as the gateway to your cloud. The network host must be highly available for production systems, because if it fails none of the instances running in the cloud are reachable. Such an outage is a level 1 incident, and every administrator or system architect should try to eliminate single points of failure (SPOFs) like this to guarantee maximum accessibility of the cloud.

Introducing an LBaaS workaround

LBaaS doesn’t provide automated, integrated failover for every load balancer driver. Since the Havana release, OpenStack has supported HA for some hardware load balancer drivers, such as Radware. The most common load balancer, and the default, is HAProxy. Unlike keepalived, this layer 7 load balancer doesn’t support the Virtual Router Redundancy Protocol (VRRP) out of the box, which is why the LBaaS driver module doesn’t support it either.

The Neutron and LBaaS core team is working on an HA LBaaS agent with VRRP support for most software load balancer plugins, but unfortunately this feature is not ready yet.

Another approach, with minor development overhead, is to change which LBaaS agent is responsible for the LB instances.

We will describe a basic workaround to make your LBaaS highly available in the meantime. It follows the same procedure as the new HA implementation of Neutron. The workaround moves the LB instances to another LBaaS agent that is still running, using Pacemaker and shared storage for the config files, because there is no function for creating a new LB instance on a specific LBaaS agent.

Because the OpenStack setup has two networking servers to guarantee high availability, two LBaaS agents should be installed, one per server. A shared storage volume must be mounted at /var/lib/neutron/lbaas/ on each networking server. This directory, with all its configs, must be available on both servers because the LBaaS agent starts all LB instances allocated to it from the stored, shared configs.

How does failover work?

To describe how the failover works, we will show example API calls made with OpenStack’s CLI tools. The LBaaS agent should run on only one network server, although it is possible to run it on both. If one host fails, Pacemaker controls the failover resources for the L3 agent and LBaaS. Both resources must be implemented for a working failover.

L3 agent failover

Failing over all running routers and their attached ports is easy. You only need to detach them from the failed node:

Get all routers on the failed agent

neutron router-list-on-l3-agent <failed-l3-agent-id>

There are two ways to get the agent ID. First, you can store the ID in the Pacemaker config file on each server. The second method, which I prefer, is to read the hostname of the “good” server and fetch the agent IDs of the “good” and “bad” servers via a Neutron API call.

neutron agent-list | grep L3

Now you have all routers on the failed L3 agent and can perform the failover with Pacemaker by detaching every router from the failed agent and attaching it to the running L3 agent.

This operation needs to be done before the LBaaS failover starts (colocate and order your Pacemaker resources accordingly).
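As a minimal sketch of the detach-and-attach step, the Python below parses agent-list output and builds the CLI calls for each router. It assumes the output was captured in CSV form (for example with the client's `-f csv` formatter) and has `id`, `agent_type` and `host` columns. The host names, the short IDs and both function names are hypothetical, and the commands are only built, not executed.

```python
import csv
import io


def l3_agent_ids(agent_list_csv, good_host):
    """Return (good_id, bad_ids) for all L3 agents in CSV agent-list output."""
    good_id, bad_ids = None, []
    for row in csv.DictReader(io.StringIO(agent_list_csv)):
        if row["agent_type"] != "L3 agent":
            continue  # skip DHCP, LBaaS and other agent types
        if row["host"] == good_host:
            good_id = row["id"]
        else:
            bad_ids.append(row["id"])
    return good_id, bad_ids


def failover_commands(bad_agent_id, good_agent_id, router_ids):
    """Build the detach/attach CLI calls for every router, in order."""
    commands = []
    for router_id in router_ids:
        # Detach from the failed agent first, then attach to the running one.
        commands.append(
            ["neutron", "l3-agent-router-remove", bad_agent_id, router_id])
        commands.append(
            ["neutron", "l3-agent-router-add", good_agent_id, router_id])
    return commands


if __name__ == "__main__":
    # Hypothetical sample data; real IDs are UUIDs.
    sample = (
        "id,agent_type,host\n"
        "agent-a,L3 agent,net1\n"
        "agent-b,L3 agent,net2\n"
    )
    good, bad = l3_agent_ids(sample, good_host="net2")
    for command in failover_commands(bad[0], good, ["router-1", "router-2"]):
        print(" ".join(command))
```

Returning the commands as argument lists keeps the planning step separate from execution, so a Pacemaker resource script could log them or pass each one to `subprocess.run` once the plan looks right.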

         Note! This function has been implemented since the Juno release, but it is not production ready yet. It will be stable in the Kilo release.

LBaaS agent failover

After following the L3 agent failover method above, you can perform the LBaaS failover, but not via the API, because it offers no function for this. Instead, we need to change the database entries directly. There are multiple ways to do so, for instance by using the database credentials. All the required information is present in the server and agent config files on each server. The best choice is to read it from the Neutron config files with the OpenStack Oslo Python library.

Once you have a working connection to your Neutron database, you need to update some entries in its ‘poolloadbalanceragentbindings’ table.
You need to change the agent ID of all affected rows from the “bad” one to the “good” one.
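To illustrate reading the credentials from a config file, here is a rough sketch that uses Python's standard `configparser` as a stand-in for the Oslo library. It assumes the connection URL is stored as the `connection` option in the `[database]` section of the Neutron config file; the function name and the example path are hypothetical.

```python
import configparser


def read_db_connection(conf_path):
    """Return the [database]/connection URL from a Neutron config file."""
    # interpolation=None: passwords in the URL may contain '%' characters.
    # strict=False: tolerate options that are repeated within a section.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    if not parser.read(conf_path):
        raise FileNotFoundError(conf_path)
    return parser.get("database", "connection")


if __name__ == "__main__":
    # Assumed location; adjust to wherever your Neutron config lives.
    print(read_db_connection("/etc/neutron/neutron.conf"))
```

The Oslo library remains the better choice in a real deployment because it also understands Neutron's option definitions and multiple config files; this sketch only covers the simple single-file case.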

For example:

bad agent id: c784b7fb-8094-4d3b-a8b1-804d90a80784
good agent id: 7e7700a3-02b2-4bd3-9c45-eca938c3f975

update poolloadbalanceragentbindings set agent_id='7e7700a3-02b2-4bd3-9c45-eca938c3f975' where agent_id='c784b7fb-8094-4d3b-a8b1-804d90a80784';

You can use the API call from the L3 agent failover example to get both agent IDs, with the filter adapted to the LBaaS agents instead of L3.
Once this is done, you can (re)start the LBaaS agent on the good server, and all LB instances will be spawned with all their VIP ports. Just remember that the L3 agent failover must have moved all routers to the running agent before you start the LBaaS agent.
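The rebinding query can also be issued from a small script with bound parameters instead of hand-edited SQL. The sketch below is only an illustration: it runs against an in-memory SQLite table as a stand-in for the real Neutron database, the `pool_id` column and the function name are assumptions, and the `?` placeholder is SQLite's style (MySQL drivers typically use `%s`).

```python
import sqlite3

BAD_AGENT = "c784b7fb-8094-4d3b-a8b1-804d90a80784"
GOOD_AGENT = "7e7700a3-02b2-4bd3-9c45-eca938c3f975"


def rebind_pools(conn, bad_agent_id, good_agent_id):
    """Move all pool bindings from the failed agent to the running one."""
    cur = conn.cursor()
    cur.execute(
        "UPDATE poolloadbalanceragentbindings "
        "SET agent_id = ? WHERE agent_id = ?",
        (good_agent_id, bad_agent_id),
    )
    conn.commit()
    return cur.rowcount  # number of bindings that were moved


if __name__ == "__main__":
    # Stand-in database with a simplified, assumed table layout.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE poolloadbalanceragentbindings "
        "(pool_id TEXT PRIMARY KEY, agent_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO poolloadbalanceragentbindings VALUES (?, ?)",
        [("pool-1", BAD_AGENT), ("pool-2", GOOD_AGENT)],
    )
    moved = rebind_pools(conn, BAD_AGENT, GOOD_AGENT)
    print("moved bindings:", moved)
```

Returning the row count gives the failover script a cheap sanity check: if no bindings were moved, the failed agent either had no pools or the wrong agent ID was passed in.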
