If you have played around with Citrix Provisioning Services, you have probably realized that the TFTP service can be a pretty big single point of failure. The TFTP service is typically installed on the Provisioning Servers and is responsible for sending down a bootstrap .BIN file during the PXE boot process, which lets the machines begin their streaming boot process. The .BIN file contains various configuration settings and a list of the Provisioning Servers in the environment that clients can retrieve their vDisks from.
Here is a quick overview of the flow: A client machine powers on and does a PXE boot. A defined DHCP scope responds with an IP address and option 66, which tells the machine the IP address of the TFTP server. The client machine contacts the TFTP server to download the .BIN file and proceeds to stream down its OS from one of the available Provisioning Servers.
The issue with the DHCP option is that it will only accept a SINGLE entry. If you point it towards one of the Provisioning Servers hosting the TFTP service, you’ve effectively BROKEN the Provisioning HA capabilities (at least if the one server you define goes down).
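To make the single-entry limitation concrete, here is a minimal Python sketch of how option 66 travels inside a DHCP reply: one option code, one length byte, and one string value. The address 10.0.0.11 is a made-up example, and the function name is only for illustration.

```python
def encode_option_66(tftp_server: str) -> bytes:
    """Encode DHCP option 66 (TFTP server name) as code, length, value."""
    value = tftp_server.encode("ascii")
    # The length field is a single byte, so the value is capped at 255 bytes.
    if not 1 <= len(value) <= 255:
        raise ValueError("option value must be 1 to 255 bytes")
    return bytes([66, len(value)]) + value


# Hypothetical address of one Provisioning Server running TFTP.
print(encode_option_66("10.0.0.11").hex())
```

The value is a single string, so whatever you put in it, an IP address, a VIP or a host name, is the only TFTP target the client learns about.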
So how do we ensure proper High Availability for the TFTP service?
If you have a Citrix NetScaler, you are COVERED, although any hardware-based load balancer will probably work just as well. Basically, you need to set up a VIP for your TFTP service.
When using the NetScaler, the TFTP service can be checked for availability and functionality at regular intervals on all available Provisioning Servers. If there is an issue or an outage on one of the servers or services, that server is automatically removed from the load balancing list maintained by the NetScaler. Rather than configuring the DHCP option to point directly to a TFTP server, you create a NetScaler vServer and use its Virtual IP address in option 66, which provides the level of redundancy we need for enterprise solutions. You can get more information on configuring the NetScaler in Citrix KB article CTX116337.
If you haven’t sprung for a NetScaler or similar hardware load balancer yet, you can use a DNS Round Robin solution. For this, a fully qualified domain name (FQDN) is configured in DHCP option 66 instead of an IP address. The DNS Round Robin entry contains a list of multiple IPs instead of a single IP. In this scenario, the systems behind the configured IPs are handed out in rotation, so if one system has an outage, a client that retries should get one of the other systems next time. Setting automatic reboots within the BIOSes of the target machines will ensure that they keep retrying the PXE boot until they reach a working TFTP server. To minimize the impact of an outage, short DNS time to live (TTL) values can be configured for the FQDN.
DNS Round Robin isn’t the greatest solution, but the price is good for the added level of protection you get, and it is better than nothing at all.
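To give a feel for what such an availability check involves, here is a rough Python sketch of a TFTP probe: it sends a read request for the bootstrap file and treats the first data block as a sign of life. This is not how the NetScaler monitor is implemented, only an illustration of the idea. The file name ARDBP32.BIN, the address and the function names are assumptions for the example.

```python
import socket
import struct

# TFTP opcodes from RFC 1350.
OP_RRQ, OP_DATA = 1, 3


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode, file name, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def is_first_data_block(reply: bytes) -> bool:
    """True if the reply is a DATA packet carrying block number 1."""
    if len(reply) < 4:
        return False
    opcode, block = struct.unpack("!HH", reply[:4])
    return opcode == OP_DATA and block == 1


def tftp_is_healthy(host: str, filename: str,
                    port: int = 69, timeout: float = 2.0) -> bool:
    """Ask for the file and report whether the server starts sending it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_rrq(filename), (host, port))
            # 4 header bytes plus up to 512 data bytes per block.
            reply, _ = sock.recvfrom(516)
        except OSError:
            # Covers timeouts, unreachable hosts and name lookup failures.
            return False
    # Simplification: the transfer is abandoned without an ACK or ERROR,
    # so the server will retransmit a few times before giving up.
    return is_first_data_block(reply)


if __name__ == "__main__":
    # Hypothetical server address and bootstrap file name.
    print(tftp_is_healthy("10.0.0.11", "ARDBP32.BIN"))
```

A load balancer runs a check along these lines against every server on a schedule and only sends clients to the ones that pass.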
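The rotation can be sketched in a few lines of Python. The sketch resolves a round robin name to all of its addresses and walks the list until one passes a liveness check. The host name tftp.example.local and the function names are made up, and a real PXE client will not walk the list like this, which is why the reboot setting above matters.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_all(fqdn: str, port: int = 69) -> List[str]:
    """Return every IPv4 address behind a name, in the order DNS gave them."""
    infos = socket.getaddrinfo(fqdn, port, socket.AF_INET, socket.SOCK_DGRAM)
    addresses: List[str] = []
    for info in infos:
        ip = info[4][0]  # info[4] is the (address, port) pair
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def first_working(addresses: Iterable[str],
                  is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first address that passes the check, or None if none do."""
    for ip in addresses:
        if is_alive(ip):
            return ip
    return None


if __name__ == "__main__":
    # Hypothetical round robin name; plug in any TFTP probe as is_alive.
    try:
        candidates = resolve_all("tftp.example.local")
    except socket.gaierror:
        candidates = []
    print(first_working(candidates, lambda ip: True))
```

Passing the check in as a function keeps the sketch independent of how you decide a server is alive, for example a TFTP read request or a simple ping.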