All Servers in a XenServer HA Pool Stuck in a Perpetual Reboot Cycle
Symptoms
Citrix XenServer hosts configured for HA enter a perpetual reboot loop. No host in the pool stays up, because each one shuts down again shortly after start-up.
HA monitors the heartbeat of configured HA peers in the pool and shares heartbeat information with all peers in the HA pool via a shared disk. If the heartbeat for a host in the pool fails, the other hosts restart the guests that were running on the failed host. If a host loses connectivity with all other peers in the pool, or is unable to communicate with the shared heartbeat disk, it assumes it has suffered a critical failure and shuts down so that its guests can be restarted on other, stable servers.
If all hosts lose connectivity with each other, every host detects a heartbeat failure with the rest of the HA pool and all of them shut down. This situation can arise when:
- There are only two servers in the pool and one fails.
- A single point of failure between the hosts in the HA pool and the shared heartbeat disk fails, for example a switch.
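The shutdown rule described above can be illustrated with a small model. This is a minimal sketch that encodes only the rule as stated in this article (a host shuts down when it sees no peers or cannot reach the heartbeat disk); it is not the actual XenServer HA algorithm, and the `Host` class, its fields, and the host names are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    # Hypothetical model of a pool member; not a XenServer API object.
    name: str
    sees_disk: bool = True      # can reach the shared heartbeat disk
    sees_network: bool = True   # can exchange heartbeats with peers


def must_self_fence(host, running):
    """Rule from the text: shut down when no peer is visible
    or when the shared heartbeat disk is unreachable."""
    peers = [
        p for p in running
        if p is not host and p.sees_network and host.sees_network
    ]
    return not peers or not host.sees_disk


def survivors(running):
    """Re-apply the rule until no further host shuts down, because each
    shutdown removes a peer that the remaining hosts were relying on."""
    running = list(running)
    while True:
        kept = [h for h in running if not must_self_fence(h, running)]
        if len(kept) == len(running):
            return [h.name for h in kept]
        running = kept


# Two-host pool after one host has failed: the remaining host sees no peers.
print(survivors([Host("xs1")]))                      # []

# A shared switch fails: no host can see any other host.
print(survivors([Host("xs1", sees_network=False),
                 Host("xs2", sees_network=False),
                 Host("xs3", sees_network=False)]))  # []

# Two hosts still running and still connected: both stay up.
print(survivors([Host("xs1"), Host("xs2")]))         # ['xs1', 'xs2']
```

In this model both failure scenarios from the list end with no surviving hosts, which corresponds to the reboot loop: every host that restarts re-evaluates the same rule and shuts down again.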
The only way to get the XenServer infrastructure back in service after this situation arises is to disable HA by running the following command on each host:
xe host-emergency-ha-disable --force
HA should only be re-enabled when there is adequate redundancy in the shared network components (NICs, switches, etc.) and in the path to the shared heartbeat disk. Any HA solution with only two hosts will experience this problem if one host fails. Therefore, any XenServer HA solution must include at least three hosts in the design.
Functionality Affected: OS - Boot Mechanisms