HA for Management Server

The CloudStack Management Server should be deployed in a multi-node configuration such that it is not susceptible to individual server failures. The Management Server itself (as distinct from the MySQL database) is stateless and may be placed behind a load balancer.

Normal operation of Hosts is not impacted by an outage of all Management Servers. All guest VMs will continue to work.

When the Management Server is down, no new VMs can be created, and the end user and admin UI, API, dynamic load distribution, and HA will cease to work.

Management Server Load Balancing

CloudStack can use a load balancer to provide a virtual IP for multiple Management Servers. The administrator is responsible for creating the load balancer rules for the Management Servers. The application requires persistence or stickiness across multiple sessions. The following chart lists the ports that should be load balanced and whether or not persistence is required.

Even if persistence is not required, enabling it is permitted.

Source Port   Destination Port            Protocol        Persistence Required?
80 or 443     8080 (or 20400 with AJP)    HTTP (or AJP)   Yes
8250          8250                        TCP             Yes
8096          8096                        HTTP            No

In addition to the above settings, the administrator is responsible for changing the ‘host’ global config value from the management server IP to the load balancer virtual IP address. If the ‘host’ value is not set to the VIP for port 8250 and one of your management servers crashes, the UI is still available but the system VMs will not be able to contact the management server.

Multiple Management Servers Support on agents

In a CloudStack environment with multiple management servers, an agent can be configured to choose which management server to connect to, based on an algorithm. This can be useful as an internal load balancer or for high availability. An administrator is responsible for setting the list of management servers and choosing a sorting algorithm using global settings. The management server is responsible for propagating the settings to the connected agents (running inside the Secondary Storage Virtual Machine, the Console Proxy Virtual Machine, or the KVM hosts).

The three global settings that need to be configured are the following:

  • host: a comma-separated list of management server IP addresses
  • indirect.agent.lb.algorithm: The algorithm for the indirect agent LB
  • indirect.agent.lb.check.interval: The preferred host check interval for the agent’s background task that checks and switches to an agent’s preferred host.

These settings can be configured from the global settings page in the UI or using the updateConfiguration API call.
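As an illustration of the API route, the following Python sketch builds updateConfiguration request URLs for the three settings. The endpoint address, the example values, and the use of the unauthenticated integration port 8096 are assumptions made for this sketch; many deployments sign API requests with an API key and secret instead. The sketch only constructs the URLs and does not send them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the integration API port (8096) on one management server.
API_ENDPOINT = "http://192.0.2.10:8096/client/api"


def build_update_configuration_url(name, value, endpoint=API_ENDPOINT):
    """Return the URL for an updateConfiguration call that sets one global setting."""
    query = urlencode({
        "command": "updateConfiguration",
        "name": name,
        "value": value,
        "response": "json",
    })
    return f"{endpoint}?{query}"


# Example values only; replace them with the addresses and interval for your environment.
settings = {
    "host": "192.0.2.10,192.0.2.11,192.0.2.12",
    "indirect.agent.lb.algorithm": "roundrobin",
    "indirect.agent.lb.check.interval": "60",
}

urls = [build_update_configuration_url(name, value) for name, value in settings.items()]
for url in urls:
    print(url)
```

Each URL sets exactly one setting, so three calls are needed to configure all three values.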

The indirect.agent.lb.algorithm setting supports the following algorithm options:

  • static: Use the list of management server IP addresses as provided.
  • roundrobin: Evenly spread hosts across management servers, based on the host’s id.
  • shuffle: Pseudo-randomly sort the list (this is not recommended for production).

Note

The ‘static’ and ‘roundrobin’ algorithms strictly check that the comma-separated management server host addresses are in the order they expect. The ‘shuffle’ algorithm checks only the content of the list, not its order.
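The difference between the two kinds of check can be sketched in Python as follows. The function name and signature are hypothetical and chosen for illustration; this is not the CloudStack implementation.

```python
def lists_match(expected, received, algorithm):
    """Return True if a received server list is acceptable for the given algorithm."""
    if algorithm in ("static", "roundrobin"):
        # Order matters: the lists must be identical element by element.
        return list(expected) == list(received)
    if algorithm == "shuffle":
        # Only the content matters: compare the lists while ignoring order.
        return sorted(expected) == sorted(received)
    raise ValueError(f"unknown algorithm: {algorithm}")


print(lists_match(["A", "B", "C"], ["B", "C", "A"], "static"))
print(lists_match(["A", "B", "C"], ["B", "C", "A"], "shuffle"))
```

With the same reordered list, the ‘static’ check fails while the ‘shuffle’ check passes.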

Changes to the global settings indirect.agent.lb.algorithm and host do not require restarting the management server(s) or the agents. A change in these global settings will be propagated to all connected agents.

The comma-separated management server list is propagated to agents in the following cases:

  • An addition of an agent (including ssvm, cpvm system VMs).
  • Connection or reconnection of an agent to a management server.
  • After an administrator changes the ‘host’ and/or the ‘indirect.agent.lb.algorithm’ global settings.

On the agent side, the ‘host’ setting is saved in its properties file as: host=<comma separated addresses>@<algorithm name>.
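A minimal Python sketch of splitting such a value into its address list and algorithm name is shown below. The function name is hypothetical, and treating a value without ‘@’ as having no algorithm is an assumption of this sketch.

```python
def parse_host_property(value):
    """Split 'addr1,addr2@algorithm' into (list of addresses, algorithm name or None)."""
    addresses_part, separator, algorithm = value.partition("@")
    addresses = [a.strip() for a in addresses_part.split(",") if a.strip()]
    # Assumption: a value without '@' carries no algorithm name.
    return addresses, (algorithm.strip() if separator else None)


addresses, algorithm = parse_host_property("192.0.2.10,192.0.2.11,192.0.2.12@roundrobin")
print(addresses, algorithm)
```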

From the agent’s perspective, the first address in the propagated list is considered the preferred host. A background task that checks for and switches to the preferred host can be activated by configuring indirect.agent.lb.check.interval, which is a cluster-level global setting in CloudStack. Administrators can also override it by configuring ‘host.lb.check.interval’ in the agent.properties file.

When an agent receives a host and algorithm combination, the host-specific background check interval is sent with it and is dynamically reconfigured in the background task, without the need to restart agents.

To make this clearer, consider an example: an environment with 3 management servers (A, B and C) and 3 KVM agents.

With ‘host’ = ‘A,B,C’, agents will receive lists depending on the ‘indirect.agent.lb.algorithm’ value:

  • ‘static’: Each agent will receive the list: ‘A,B,C’
  • ‘roundrobin’: First agent receives: ‘A,B,C’, second agent receives: ‘B,C,A’, third agent receives: ‘C,B,A’
  • ‘shuffle’: Each agent will receive a list in random order.

HA-Enabled Virtual Machines

The user can specify a virtual machine as HA-enabled. By default, all virtual router VMs and Elastic Load Balancing VMs are automatically configured as HA-enabled. When an HA-enabled VM crashes, CloudStack detects the crash and restarts the VM automatically within the same Availability Zone. HA is never performed across different Availability Zones. CloudStack has a conservative policy towards restarting VMs and ensures that there will never be two instances of the same VM running at the same time. The Management Server attempts to start the VM on another Host in the same cluster.

HA features work with iSCSI or NFS primary storage. HA with local storage is not supported.

Dedicated HA Hosts

One or more hosts can be designated for use only by HA-enabled VMs that are restarting due to a host failure. Setting up a pool of such dedicated HA hosts as the recovery destination for all HA-enabled VMs is useful to:

  • Make it easier to determine which VMs have been restarted as part of the CloudStack high-availability function. If a VM is running on a dedicated HA host, then it must be an HA-enabled VM whose original host failed. (With one exception: It is possible for an administrator to manually migrate any VM to a dedicated HA host.)
  • Keep HA-enabled VMs from restarting on hosts which may be reserved for other purposes.

The dedicated HA option is set through a special host tag when the host is created. To allow the administrator to dedicate hosts to only HA-enabled VMs, set the global configuration variable ha.tag to the desired tag (for example, “ha_host”), and restart the Management Server. Enter the value in the Host Tags field when adding the host(s) that you want to dedicate to HA-enabled VMs.

Note

If you set ha.tag, be sure to actually use that tag on at least one host in your cloud. If the tag specified in ha.tag is not set for any host in the cloud, the HA-enabled VMs will fail to restart after a crash.

HA-Enabled Hosts

The user can specify a host as HA-enabled. In the event of a host failure, attempts will be made to recover the failed host by first issuing some OOBM commands. If the host recovery fails, the host will be fenced and placed into maintenance mode. To restore the host to normal operation, manual intervention is then required.

Out-of-band management is a requirement for HA-enabled hosts and has to be configured on all intended participating hosts (see “Out of band management”).

Host-HA has granular configuration on a host/cluster/zone level. In a large environment, some hosts from a cluster can be HA-enabled and some not.

Host-HA uses a state machine design to manage the operations of recovering and fencing hosts. The current status of a host is reported when querying a specific host.

Timely health investigations are done on HA-enabled hosts to monitor for any failures. Thresholds can be set for failed investigations; only when a threshold is exceeded will the host transition to a different state.

Host-HA uses both health checks and activity checks to make decisions on recovering and fencing actions. Once it is determined that the host is in a faulty state (health checks failed), it runs activity checks to determine whether there is any disk activity on the VMs running on that host.

The HA Resource Management Service manages the check/recovery cycle including periodic execution, concurrency management, persistence, back pressure and clustering operations. Administrators associate a provider with a partition type (e.g. KVM HA Host provider to clusters) and may override the provider on a per-partition (i.e. zone, cluster, or pod) basis. The service operates on all resources of the type supported by the provider contained in a partition. Administrators can also enable or disable HA operations globally or on a per-partition basis.

Only one (1) HA provider per resource type may be specified for a partition. Nested HA providers by resource type are not supported (e.g. a pod specifying an HA resource provider for hosts and a containing cluster specifying an HA resource provider for hosts). The service is designed to be opt-in, whereby only resources with a defined provider and HA enabled will be managed.

For each resource in an HA partition, the HA Resource Management Service maintains and persists a finite state machine composed of the following states:

  • AVAILABLE - The feature is enabled and Host-HA is available.
  • SUSPECT - There are health checks failing with the host.
  • CHECKING - Activity checks are being performed.
  • DEGRADED - The host is passing the activity check ratio and still providing service to the end user, but it cannot be managed from the CloudStack management server.
  • RECOVERING - The Host-HA framework is trying to recover the host by issuing OOBM jobs.
  • RECOVERED - The Host-HA framework has recovered the host successfully.
  • FENCING - The Host-HA framework is trying to fence the host by issuing OOBM jobs.
  • FENCED - The Host-HA framework has fenced the host successfully.
  • DISABLED - The feature is disabled for the host.
  • INELIGIBLE - The feature is enabled, but it cannot be managed successfully by the Host-HA framework. (OOBM is possibly not configured properly)

When HA is enabled for a partition, the HA state of all contained resources will be transitioned from DISABLED to AVAILABLE. Based on the state models, the following failure scenarios and their responses will be handled by the HA resource management service:

  • Activity check operation fails on the resource: The activity check protocol provides a way to express that an error occurred while performing the activity check, along with a reason for the failure (e.g. unable to access the NFS mount). If the maximum number of activity check attempts has not been exceeded, the activity check will be retried.
  • Slow activity check operation: After a configurable timeout, the HA resource management service abandons the check. The response to this condition would be the same as a failure to recover the resource.
  • Traffic flood due to a large number of resource recoveries: The HA resource management service must limit the number of concurrent recovery operations permitted to avoid overwhelming the management server with resource status updates as recovery operations complete.
  • Processor/memory starvation due to large number of activity check operations: The HA resource management service must limit the number of concurrent activity check operations permitted per management server to prevent checks from starving other management server activities of scarce processor and/or memory resources.
  • A SUSPECT, CHECKING, or RECOVERING resource passes a health check before the state action completes: The HA resource management service refreshes the HA state of the resource before transition. If it does not match the expected current state, the result of state action is ignored.
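The state list and the last scenario above can be sketched in Python as follows. The class and function names, the dictionary used to hold per-resource state, and the transition back to AVAILABLE after a passed health check are all assumptions made for illustration; the sketch does not reproduce the Host-HA framework's actual code.

```python
from enum import Enum


class HAState(Enum):
    AVAILABLE = "Available"
    SUSPECT = "Suspect"
    CHECKING = "Checking"
    DEGRADED = "Degraded"
    RECOVERING = "Recovering"
    RECOVERED = "Recovered"
    FENCING = "Fencing"
    FENCED = "Fenced"
    DISABLED = "Disabled"
    INELIGIBLE = "Ineligible"


def enable_ha(partition):
    """Move every DISABLED resource in the partition to AVAILABLE."""
    for resource in list(partition):
        if partition[resource] is HAState.DISABLED:
            partition[resource] = HAState.AVAILABLE


def apply_action_result(partition, resource, expected_state, next_state):
    """Apply a state action's result only if the resource is still in the expected state."""
    if partition[resource] is not expected_state:
        return False  # State changed in the meantime: ignore the stale result.
    partition[resource] = next_state
    return True


partition = {"host-1": HAState.DISABLED, "host-2": HAState.DISABLED}
enable_ha(partition)

# host-1 starts a recovery, but a health check passes before the recovery completes.
partition["host-1"] = HAState.RECOVERING
partition["host-1"] = HAState.AVAILABLE  # hypothetical transition after the passed check
stale = apply_action_result(partition, "host-1", HAState.RECOVERING, HAState.RECOVERED)
print(stale, partition["host-1"])
```

Because the refreshed state no longer matches the state the recovery action started from, its result is discarded and the host stays in its current state.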

For further information about the inner workings of Host HA, refer to the design document at https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA

Primary Storage Outage and Data Loss

When a primary storage outage occurs, the hypervisor immediately stops all VMs stored on that storage device. Guests that are marked for HA will be restarted as soon as practical when the primary storage comes back online. With NFS, the hypervisor may allow the virtual machines to continue running depending on the nature of the issue. For example, an NFS hang will cause the guest VMs to be suspended until storage connectivity is restored.

Primary storage is not designed to be backed up. Individual volumes in primary storage can be backed up using snapshots.

Secondary Storage Outage and Data Loss

For a Zone that has only one secondary storage server, a secondary storage outage will have feature level impact to the system but will not impact running guest VMs. It may become impossible to create a VM with the selected template for a user. A user may also not be able to save snapshots or examine/restore saved snapshots. These features will automatically be available when the secondary storage comes back online.

Secondary storage data loss will impact recently added user data including templates, snapshots, and ISO images. Secondary storage should be backed up periodically. Multiple secondary storage servers can be provisioned within each zone to increase the scalability of the system.

Database High Availability

To help ensure high availability of the databases that store the internal data for CloudStack, you can set up database replication. This covers both the main CloudStack database and the Usage database. Replication is achieved using the MySQL connector parameters and two-way replication. This feature has been tested with MySQL 5.1 and 5.5.

How to Set Up Database Replication

Database replication in CloudStack is provided using the MySQL replication capabilities. The steps to set up replication can be found in the MySQL documentation (links are provided below). It is suggested that you set up two-way replication, which involves two database nodes. In this case, for example, you might have node1 and node2.

You can also set up chain replication, which involves more than two nodes. In this case, you would first set up two-way replication with node1 and node2. Next, set up one-way replication from node2 to node3. Then set up one-way replication from node3 to node4, and so on for all the additional nodes.
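The resulting topology can be sketched in Python as a list of (source, replica) links. The function name is hypothetical; the sketch only describes which links to configure and does not configure MySQL itself.

```python
def replication_links(nodes):
    """Return (source, replica) pairs: two-way between the first two nodes, then a one-way chain."""
    if len(nodes) < 2:
        raise ValueError("replication needs at least two nodes")
    # Two-way replication between node1 and node2.
    links = [(nodes[0], nodes[1]), (nodes[1], nodes[0])]
    # One-way replication from each further node's predecessor: node2 -> node3 -> node4 ...
    for source, replica in zip(nodes[1:], nodes[2:]):
        links.append((source, replica))
    return links


for source, replica in replication_links(["node1", "node2", "node3", "node4"]):
    print(f"{source} -> {replica}")
```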

References:

Configuring Database High Availability

To control the database high availability behavior, use the following configuration settings in the file /etc/cloudstack/management/db.properties.

Required Settings

Be sure you have set the following in db.properties:

  • db.ha.enabled: set to true if you want to use the replication feature.

    Example: db.ha.enabled=true

  • db.cloud.replicas: set to a comma-delimited set of replica hosts for the cloud database. This is the list of nodes set up with replication. The source node is not in the list, since it is already mentioned elsewhere in the properties file.

    Example: db.cloud.replicas=node2,node3,node4

  • db.usage.replicas: set to a comma-delimited set of replica hosts for the usage database. This is the list of nodes set up with replication. The source node is not in the list, since it is already mentioned elsewhere in the properties file.

    Example: db.usage.replicas=node2,node3,node4
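As a quick sanity check, the Python sketch below parses properties text and reports which of the three required settings are missing. The function names and the simplified key=value parsing are assumptions of this sketch; it is not a complete Java properties parser.

```python
REQUIRED_KEYS = ("db.ha.enabled", "db.cloud.replicas", "db.usage.replicas")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            props[key.strip()] = value.strip()
    return props


def check_db_ha_settings(props):
    """Return a list of problems; an empty list means the required settings are present."""
    problems = [f"missing setting: {key}" for key in REQUIRED_KEYS if not props.get(key)]
    enabled = props.get("db.ha.enabled", "")
    if enabled and enabled.lower() != "true":
        problems.append("db.ha.enabled is not set to true")
    return problems


sample = """
db.ha.enabled=true
db.cloud.replicas=node2,node3,node4
db.usage.replicas=node2,node3,node4
"""
print(check_db_ha_settings(parse_properties(sample)))
```

To check a real installation, the text passed to parse_properties would be read from /etc/cloudstack/management/db.properties.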

Optional Settings

The following settings must be present in db.properties, but you are not required to change the default values unless you wish to do so for tuning purposes:

  • db.cloud.secondsBeforeRetrySource: The number of seconds the MySQL connector should wait before trying again to connect to the source after the source went down. Default is 1 hour. The retry might happen sooner if db.cloud.queriesBeforeRetrySource is reached first.

    Example: db.cloud.secondsBeforeRetrySource=3600

  • db.cloud.queriesBeforeRetrySource: The minimum number of queries to be sent to the database before trying again to connect to the source after the source went down. Default is 5000. The retry might happen sooner if db.cloud.secondsBeforeRetrySource is reached first.

    Example: db.cloud.queriesBeforeRetrySource=5000

  • db.cloud.initialTimeout: Initial time the MySQL connector should wait before trying again to connect to the source. Default is 3600.

    Example: db.cloud.initialTimeout=3600
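The interaction between the two retry settings can be illustrated with a small Python model: a reconnection to the source is attempted as soon as either limit is reached. This is a simplified model of the behavior described above, with hypothetical names; it is not the MySQL connector's implementation.

```python
def should_retry_source(seconds_since_failure, queries_since_failure,
                        seconds_before_retry=3600, queries_before_retry=5000):
    """Return True once either the time limit or the query limit has been reached."""
    return (seconds_since_failure >= seconds_before_retry
            or queries_since_failure >= queries_before_retry)


# Ten minutes after the source went down, with few queries: keep using the replica.
print(should_retry_source(600, 1200))
# Ten minutes after the source went down, but 5000 queries already sent: retry early.
print(should_retry_source(600, 5000))
```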

Limitations on Database High Availability

The following limitations exist in the current implementation of this feature.

  • Replica hosts cannot be monitored through CloudStack. You will need to have a separate means of monitoring.
  • Events from the database side are not integrated with the CloudStack Management Server events system.
  • You must periodically perform manual clean-up of bin log files generated by replication on database nodes. If you do not clean up the log files, the disk can become full.
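For the bin log clean-up, one possible approach is the MySQL PURGE BINARY LOGS statement. The Python sketch below only builds such a statement for a given retention period; the function name and the seven-day retention are assumptions for illustration. Before purging, confirm that every replica has already applied the logs being removed.

```python
from datetime import datetime, timedelta


def purge_statement(now, keep_days):
    """Build a PURGE BINARY LOGS statement that removes logs older than keep_days."""
    cutoff = now - timedelta(days=keep_days)
    return f"PURGE BINARY LOGS BEFORE '{cutoff:%Y-%m-%d %H:%M:%S}';"


# Hypothetical retention of seven days, evaluated at the current time.
print(purge_statement(datetime.now(), keep_days=7))
```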