Andrew May, Cobweb Cloud Solutions Architect, discusses high availability…
Computers fail all the time! It could be the physical hardware burning out, a bad software update containing an undetected bug or an issue with supporting infrastructure such as mains electricity or air conditioning. At a larger scale, natural disasters can damage entire datacentre buildings.
IT solutions should be designed to be highly available, so that when components do fail, the solution as a whole remains up and running. With Azure, high availability is a shared responsibility between Microsoft and the customer. Microsoft provides underlying infrastructure which has resilience built in, such as clusters of compute nodes that may have multiple power supplies and network connections, and data stored three times across different storage hardware. Microsoft also provides information and capabilities within Azure to allow customers to control where their resources are deployed.
As customers, our responsibility is to use this knowledge and functionality to design our Azure solutions to meet our high availability requirements. High availability can be implemented to varying degrees, but generally the more resilient a solution is, the more complex it is and the more it costs. The level of high availability a solution has must balance the increased cost against the impact of the solution being unavailable.
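To make that balancing act concrete, here is a minimal sketch of how the comparison might be reasoned about. Everything in it is an assumption for illustration: the function names, the running costs, the availability percentages and the cost of an hour's downtime are invented, and none of them are Azure prices or SLA figures.

```python
HOURS_PER_YEAR = 24 * 365


def expected_annual_cost(run_cost, availability, downtime_cost_per_hour):
    """Yearly running cost plus the expected cost of being unavailable.

    availability is a fraction between 0 and 1 (0.99 means 99%).
    """
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    return run_cost + downtime_hours * downtime_cost_per_hour


def cheapest_option(options, downtime_cost_per_hour):
    """Return the name of the option with the lowest expected annual cost.

    options maps a name to a (run_cost, availability) tuple.
    """
    return min(
        options,
        key=lambda name: expected_annual_cost(
            *options[name], downtime_cost_per_hour
        ),
    )


# Hypothetical figures: a single VM versus a resilient pair at twice the cost.
options = {
    "single": (1000, 0.99),
    "pair": (2000, 0.9999),
}

print(cheapest_option(options, downtime_cost_per_hour=5))
print(cheapest_option(options, downtime_cost_per_hour=100))
```

With these made-up numbers, the single VM is the cheaper choice when an hour of downtime costs very little, and the resilient pair becomes the cheaper choice once downtime is expensive. The point is the shape of the trade-off, not the figures themselves.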
A common myth is that a single Virtual Machine (VM) running in Azure can never go down. If it needs rebooting or the OS crashes, the VM will be offline. Despite Microsoft’s best efforts, hardware still fails unexpectedly, and even if the VM is quickly restarted on a different node, it is unavailable in the meantime. At its most basic, high availability replaces that single VM with at least two VMs running on different hardware within the same datacentre building.
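The benefit of a second VM can be shown with some simple arithmetic. The sketch below assumes each VM fails independently of the others, which is a simplification (a shared fault, such as a bad software update, can take out both at once), and the 99% figure is illustrative rather than a published Azure number. The function name is my own.

```python
def combined_availability(single, copies):
    """Chance that at least one of `copies` independent instances is running.

    single is the availability of one instance as a fraction (0.99 = 99%).
    The solution is only down when every copy is down at the same time,
    so we subtract the chance of that from 1.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


one_vm = combined_availability(0.99, 1)
two_vms = combined_availability(0.99, 2)
print(f"One VM:  {one_vm:.4%}")
print(f"Two VMs: {two_vms:.4%}")
```

Under those assumptions, two VMs that are each available 99% of the time give a combined availability of roughly 99.99%, because both have to fail at the same moment for the solution to be offline.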
I’ve written previously that Microsoft groups their Azure datacentres into regions – named locations, such as UK South in London, UK. Regions are usually great distances from each other, so an issue in one region is unlikely to affect other regions. When an Azure resource is deployed, one of the properties that must be specified is which region it will be created in. Distributing highly available solutions across more than one region protects against a single region suffering an outage, but it requires duplicate infrastructure in each region, including virtual networks, load balancers, backups, etc. On top of this, there must be some way to distribute the workload across the regions and handle the situation should a region have an issue and become unavailable.
Multi-region solutions can be complex to build, costly to run or provide a level of high availability that is beyond what is required. In recent years Microsoft have enabled some Azure regions with Availability Zones and more are scheduled to be enabled throughout 2022. An Availability Zone is a distinct physical location within an Azure region that has its own power, networking and cooling. To simplify things, I often describe each Availability Zone as a separate datacentre building within the region. If you don’t choose to use Availability Zones or a zone-redundant tier of a resource, your Azure resources and redundant copies of Azure services may all be created in one building. If you do choose to use them, your resources will be distributed across multiple, independent datacentre buildings.
Some resources, such as VMs, must be pinned to a specific zone of your choice, which lets you ensure that solution components that must be kept apart are placed in different zones. Others, such as virtual networks and load balancers, span all of the Availability Zones in a region. This means pinned resources in different zones can be connected to the same networking, which reduces infrastructure duplication and therefore complexity and cost.
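As a rough illustration of pinning, the sketch below spreads a set of VMs evenly across zones and then checks which ones would still be running if one zone were lost. It is plain Python rather than a call to any Azure API: the VM names and helper functions are hypothetical, and the zone labels "1", "2" and "3" are assumed for a region with three Availability Zones.

```python
from itertools import cycle


def assign_zones(vm_names, zones=("1", "2", "3")):
    """Pin each VM to a zone in turn, so the VMs are spread evenly."""
    zone_cycle = cycle(zones)
    return {name: next(zone_cycle) for name in vm_names}


def survivors(placement, failed_zone):
    """List the VMs that are not in the failed zone."""
    return [name for name, zone in placement.items() if zone != failed_zone]


# Hypothetical web tier of four VMs.
placement = assign_zones(["web-1", "web-2", "web-3", "web-4"])
print(placement)
print(survivors(placement, failed_zone="1"))
```

Here web-1 and web-4 share zone 1, so losing that zone still leaves web-2 and web-3 running. Because the virtual network and load balancer span all zones, the surviving VMs would carry on receiving traffic without any duplicate networking.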
Availability Zones are a great, cost-effective way to increase resiliency in Azure, allowing solutions to be made highly available much more easily. Many Azure services now support Availability Zones and many more are being updated to support them all the time. If you want to add Availability Zones to an existing solution, or you’re looking to build a new solution with high availability, Cobweb can help guide you to an optimised solution.