Virtualisation turns up heat on data centre contractors
By Andrew Kirker, General Manager - Datacentres
Tuesday, 23 May, 2017
Squeezing energy efficiencies out of modern data centres can create unintended hot spot issues.
Virtualisation was seen as the great energy saver in the data centre, yet for electrical contractors it has thrown up an unexpected problem — electrical and cooling hot spots.
Hot spots occur when servers are installed and grouped in high-density configurations, and when an unexpected computation load is placed on specific servers.
While temperature management and physical server configuration are a core focus for data centre managers, minimising hot spots has profound implications for electrical contractors. Increased likelihood of branch circuit overload, unforeseen stresses on energy redundancy systems and increased power for cooling systems all need to be considered before rolling out a data centre refresh.
Because of these concerns, electrical contractors need to work very closely with data centre managers when undertaking a new install or significant refresh. They need to be aware of the problems, and of the mitigation strategies that are critical to dealing with hot spots in a virtualised server environment.
The rise of high density
While virtualisation may reduce overall power consumption in the room, virtualised servers tend to be installed and grouped in ways that create localised high-density areas that can lead to ‘hot spots’. This cooling challenge may come as a surprise to some, given the dramatic decrease in power consumption possible today. However, as a physical host is loaded up with more and more virtual machines, its CPU utilisation and power draw increase. Virtual machines (VMs) also require more processor and memory resources, which further increases power consumption.
The solution: If an existing cooling infrastructure is not sufficient for a high-density environment, there are a few approaches that can be applied. One of the most common is to simply ‘spread out’ the high-density equipment throughout the data centre floor rather than grouping it all together. This approach does have its drawbacks though, including increased floor space consumption and higher cabling costs.
A more efficient approach may be to isolate higher density equipment in a separate location from lower density equipment. This would involve consolidating all high-density systems down to a single rack or row(s) of racks. Dedicated cooling air distribution, row cooling and/or air containment could then be brought to these isolated high-density pods to ensure they receive the predictable cooling needed at any given time. This approach enables maximum density per rack and also offers a solution for organisations that require high-density equipment to remain co-located.
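As a rough illustration of the trade-off between spreading equipment out and consolidating it into a dedicated pod, the sketch below compares the rack counts each approach requires. All figures (server power draws, per-rack cooling limits) are hypothetical and chosen only to show the arithmetic.

```python
# Hypothetical figures: 8 high-density servers at 5 kW each, 40 kW total load.
total_load_kw = 8 * 5

# Option 1: spread out across the floor so each rack stays within the
# assumed 10 kW limit of ordinary room cooling.
room_cooling_per_rack_kw = 10
racks_needed_spread = -(-total_load_kw // room_cooling_per_rack_kw)  # ceiling division -> 4 racks

# Option 2: consolidate into an isolated pod where dedicated row cooling
# and containment support an assumed 30 kW per rack.
row_cooling_per_rack_kw = 30
racks_needed_pod = -(-total_load_kw // row_cooling_per_rack_kw)  # -> 2 racks

print(f"Spread out: {racks_needed_spread} racks at <= {room_cooling_per_rack_kw} kW each")
print(f"Isolated pod: {racks_needed_pod} racks with dedicated row cooling")
```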
The impact on power usage effectiveness (PUE)
A widely touted benefit of virtualisation has been reduced energy use and costs as a result of physical server consolidation. And, indeed, these savings are often not trivial. Fully virtualising an environment could produce savings upwards of 50% in energy consumption.
In this scenario, compute capacity often remains the same or is even increased while energy use drops sharply. So why is it then that the most commonly used metric for data centre efficiency, PUE, often worsens after server consolidation takes place? Some suggest that the metric itself is deficient, but we need to remember that PUE is designed to measure the efficiency of a data centre’s physical infrastructure (ie, power and cooling), and not the IT compute power efficiency.
The issue is that if power and cooling infrastructure is left exactly as it was before virtualisation was implemented, then there will be unused power and cooling capacity that still consumes energy, known as ‘fixed losses’. And as the IT load shrinks (eg, from consolidation) these fixed losses become a higher proportion of the total data centre energy use, worsening PUE.
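A back-of-the-envelope calculation makes the effect clear. The load and loss figures below are purely illustrative, not drawn from any particular facility.

```python
def pue(it_load_kw, infrastructure_kw):
    """PUE = total facility power / IT power (lower is better, 1.0 is ideal)."""
    return (it_load_kw + infrastructure_kw) / it_load_kw

# Illustrative figures: 500 kW of IT load supported by 300 kW of
# power and cooling overhead (the 'fixed losses').
fixed_losses_kw = 300
print(f"Before consolidation: PUE = {pue(500, fixed_losses_kw):.2f}")  # 1.60

# Virtualisation halves the IT load, but the infrastructure is untouched,
# so the same 300 kW of losses now supports only 250 kW of IT load.
print(f"After consolidation:  PUE = {pue(250, fixed_losses_kw):.2f}")  # 2.20
```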
The solution: The simple answer is that power and cooling infrastructure must be right-sized to the new overall load. This will not only improve efficiency but directly impact the electric bill by reducing the power consumed by unused power and cooling capacity.
This approach is admittedly difficult to implement for an existing data centre, which instead may benefit from actions such as orientating racks into separate hot and cold aisles or removing unneeded UPS power modules from scalable UPSs.
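Continuing the illustrative figures above, right-sizing shows up directly in the PUE arithmetic: if scaling power and cooling back to the smaller load cut the fixed losses from 300 kW to, say, 160 kW (an assumed figure), the metric recovers most of the lost ground.

```python
# Continuing the illustrative example: the same 250 kW IT load, but with
# power and cooling right-sized so fixed losses fall to an assumed 160 kW.
right_sized_losses_kw = 160
print(f"After right-sizing: PUE = {(250 + right_sized_losses_kw) / 250:.2f}")  # 1.64
```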
Dynamic IT loads
Virtualised IT loads, particularly in a highly virtualised, cloud data centre, can vary in both time and location. To ensure availability in such a system, it’s critical that rack-level power and cooling health be considered before changes are made. Failure to do so could undermine the software fault tolerance that virtualisation brings to cloud computing.
In some ways, the increasingly automated creation and movement of VMs helps make a virtualised data centre more fault-tolerant. If a software fault occurs within a given VM or a physical host server crashes, other machines can quickly recover the workload with a minimal amount of latency for the user. Ironically, however, this rapid and sudden movement of VMs can put these IT workloads at risk by exposing them to power and cooling problems that may exist.
The solution: Data centre infrastructure management (DCIM) software can ensure safer automated movement of VMs, but the risk introduced by manual human intervention must be removed. This can be achieved by automating both the monitoring of DCIM information (available rack space, power and cooling capacity, and health) and the implementation of suggested actions.
Also, it should not be forgotten that IT policies related to VM management need to be constructed so that power and cooling systems are considered. Policies should set thresholds and limits for what is acceptable for a given application or VM in terms of power and cooling capacity, health and redundancy.
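DCIM products expose this capacity and health data through their own interfaces; the vendor-neutral sketch below shows what such a pre-placement policy check might look like. Every name, threshold and data structure in it is hypothetical and does not reflect any specific DCIM product's API.

```python
# Hypothetical policy check run before a VM is placed on (or moved to) a rack.
POLICY = {
    "max_rack_power_utilisation": 0.80,   # keep 20% headroom on the branch circuit
    "max_inlet_temp_c": 27.0,             # upper limit of the ASHRAE recommended inlet range
    "require_redundant_power_feeds": True,
}

def rack_can_accept_vm(rack, vm_power_kw):
    """Return True only if power, cooling and redundancy thresholds all pass."""
    projected_utilisation = (rack["power_draw_kw"] + vm_power_kw) / rack["power_capacity_kw"]
    if projected_utilisation > POLICY["max_rack_power_utilisation"]:
        return False
    if rack["inlet_temp_c"] > POLICY["max_inlet_temp_c"]:
        return False
    if POLICY["require_redundant_power_feeds"] and not rack["redundant_feeds_ok"]:
        return False
    return True

rack = {"power_draw_kw": 7.5, "power_capacity_kw": 10.0,
        "inlet_temp_c": 24.0, "redundant_feeds_ok": True}
print(rack_can_accept_vm(rack, vm_power_kw=0.4))  # True: 7.9/10 = 79% utilisation
```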
Lower redundancy requirements
A highly virtualised data centre designed and operated with a high level of IT fault tolerance may reduce the necessity for redundancy in the physical infrastructure where multiple sites or zones are utilised. This effect could have a significantly positive impact on data centre planning and capital costs.
The solution: To take advantage of these benefits, those planning to build a new data centre using ‘2N or 2N+1’ redundant power and cooling systems could perhaps consider building with reduced redundancy levels and leveraging zone or site based redundancy instead. In this scenario, active-active or fail-over can occur at the software layer. This would significantly reduce capital costs and simplify the design of the infrastructure. Before making these types of decisions, IT management systems and policies should be reviewed to ensure they are capable of providing the level of service and fault tolerance that permits having less redundancy in the physical infrastructure.
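As a simple illustration of the capital saving at stake, compare the installed UPS capacity a given IT load demands under a fully duplicated 2N design with a single N+1 system that relies on zone- or site-level failover at the software layer. The load and module sizes below are assumed purely for the example.

```python
# Assumed figures for illustration only.
it_load_kw = 500
module_kw = 250  # capacity of a single UPS power module

n_modules = -(-it_load_kw // module_kw)       # modules needed to carry the load (N) -> 2

capacity_2n = 2 * n_modules * module_kw       # 2N: full duplicate system -> 1000 kW
capacity_n1 = (n_modules + 1) * module_kw     # N+1: one spare module     -> 750 kW

print(f"2N installed UPS capacity:  {capacity_2n} kW")
print(f"N+1 installed UPS capacity: {capacity_n1} kW")
# The 250 kW gap is capacity (and capital) that zone- or site-based
# redundancy at the software layer may make unnecessary.
```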