Improving efficiency for fast data centre operations
System better allocates time-sensitive data processing across cores to maintain quick user-response times.
Today’s data centres eat up and waste a good amount of energy responding to user requests as fast as possible, with only a few microseconds delay. A new system by MIT researchers improves the efficiency of high-speed operations by better assigning time-sensitive data processing across central processing unit (CPU) cores and ensuring hardware runs productively.
Data centres operate as distributed networks, with numerous web and mobile applications implemented on a single server. When users send requests to an app, bits of stored data are pulled from hundreds or thousands of services across as many servers. Before sending a response, the app must wait for the slowest service to process the data. This lag time is known as tail latency.
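To see why the slowest service dominates, consider a minimal sketch. The latency figures, the 1% slow-response rate and the function names below are illustrative assumptions, not measurements from the researchers.

```python
import random


def fan_out_latency(service_latencies_us):
    """A response that needs every service is ready only when the slowest one finishes."""
    return max(service_latencies_us)


def mean_response_latency(num_services, trials=2000, seed=0):
    """Average response latency when a request fans out to num_services services."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumed model: a service answers in 10 us, but 1% of the time takes 1000 us.
        latencies = [1000.0 if rng.random() < 0.01 else 10.0
                     for _ in range(num_services)]
        total += fan_out_latency(latencies)
    return total / trials


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, "services:", round(mean_response_latency(n), 1), "us on average")
```

Under this assumed model, a request that touches 100 services hits at least one slow response about 63% of the time (1 - 0.99^100), even though each individual service is slow only 1% of the time.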
Current methods to reduce tail latencies leave many CPU cores in a server free to handle incoming requests quickly. But this means that cores sit idle for much of the time, while servers continue using energy just to stay powered on. Data centres can contain hundreds of thousands of servers, so even small improvements in each server’s efficiency can save millions of dollars.
Alternatively, some systems reallocate cores across apps based on workload. But this occurs over milliseconds — around one-thousandth the desired speed for today’s fast-paced requests. Waiting too long can also degrade an app’s performance, because any information that’s not processed before an allotted time doesn’t get sent to the user.
In a paper presented at the USENIX Symposium on Networked Systems Design and Implementation (NSDI), the researchers describe a faster core-allocating system, called Shenango, that reduces tail latencies while achieving high efficiencies. First, a novel algorithm detects which apps are struggling to process data, then a software component allocates idle cores to handle the app’s workload.
“In data centres, there’s a tradeoff between efficiency and latency, and you really need to reallocate cores at much finer granularity than every millisecond,” said first author Amy Ousterhout, a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL). Shenango lets servers “manage operations that occur at really short time scales and do so efficiently”.
Energy and cost savings will vary by data centre, depending on size and workloads. But the overall aim is to improve data centre CPU utilisation, so that every core is put to good use. The best CPU utilisation rates today sit at about 60%, but the researchers say their system could potentially boost that figure to 100%.
“Data centre utilisation today is quite low,” said co-author Adam Belay, an Assistant Professor of Electrical Engineering and Computer Science and a CSAIL researcher. “This is a very serious problem [that can’t] be solved in a single place in the data centre. But this system is one critical piece in driving utilisation up higher.”
Efficient congestion-detection
In a real-world deployment, Shenango (both the algorithm and the software) would run on each server in a data centre. All the servers would be able to communicate with each other.
The system’s first innovation is a novel congestion-detection algorithm. Every five microseconds the algorithm checks each app’s queue of data packets awaiting processing. If a packet is still waiting from the last observation, the algorithm notes there’s at least a 5-microsecond delay. It also checks if any computation processes, called threads, are waiting to be executed. If either is the case, the system considers the app “congested”.
It seems simple enough. But the queue’s structure is important to achieving microsecond-scale congestion detection. The conventional approach would have the software check the timestamp of each queued data packet, which would take too much time.
The researchers implement the queues in efficient structures known as “ring buffers”. These structures can be visualised as different slots around a ring. The first inputted data packet goes into a starting slot. As new data arrive, they’re dropped into subsequent slots around the ring. Usually, these structures are used for first-in-first-out data processing, pulling data from the starting slot and working toward the ending slot.
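A ring buffer of the kind described above can be sketched in a few lines. This is a generic illustration, not Shenango’s code: the class name, the `head` and `tail` counter names and the fixed capacity are assumptions made for the example.

```python
class RingBuffer:
    """Fixed-size first-in-first-out queue laid out as slots around a ring."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0  # assumed name: total packets ever enqueued
        self.tail = 0  # assumed name: total packets ever dequeued

    def __len__(self):
        # The counters only ever grow, so their difference is the queue depth.
        return self.head - self.tail

    def push(self, packet):
        """Drop a packet into the next free slot; return False if the ring is full."""
        if len(self) == self.capacity:
            return False
        self.slots[self.head % self.capacity] = packet
        self.head += 1
        return True

    def pop(self):
        """Remove and return the oldest packet, or None if the ring is empty."""
        if len(self) == 0:
            return None
        index = self.tail % self.capacity
        packet = self.slots[index]
        self.slots[index] = None
        self.tail += 1
        return packet
```

Because slot positions wrap around with the modulo operation, the buffer reuses the same memory indefinitely, and the two counters alone describe how much has arrived and how much has been processed.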
The researchers’ system, however, only stores data packets briefly in the structures, until an app can process them. In the meantime, the stored packets can be used for congestion checks. The algorithm need only compare two points in the queue — the location of the first packet and where the last packet was five microseconds ago — to determine if packets are encountering a delay.
“You can look at these two points, and track their progress every five microseconds, to see how much data has been processed,” CSAIL PhD student Joshua Fried said. Because the structures are simple, “you only have to do this once per core. If you’re looking at 24 cores, you do 24 checks in five microseconds, which scales nicely.”
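The two-point comparison can be sketched as follows. This is a simplified illustration under stated assumptions, not the researchers’ implementation: the queue is reduced to two counters, and the names `PacketQueue`, `CongestionDetector`, `head`, `tail` and `prev_head` are hypothetical.

```python
class PacketQueue:
    """Ring-style queue reduced to the two counters the check needs."""

    def __init__(self):
        self.head = 0  # packets enqueued so far (where the newest packet sits)
        self.tail = 0  # packets dequeued so far (where the oldest packet sits)

    def enqueue(self, count=1):
        self.head += count

    def dequeue(self, count=1):
        # Never dequeue more than has been enqueued.
        self.tail = min(self.head, self.tail + count)


class CongestionDetector:
    """Call check() once per interval (five microseconds in the article)."""

    def __init__(self, queue):
        self.queue = queue
        self.prev_head = queue.head

    def check(self):
        # Everything enqueued before the previous check lies below prev_head.
        # If the consumer has not reached that point yet, at least one packet
        # has been waiting for a full interval.
        congested = self.queue.tail < self.prev_head
        self.prev_head = self.queue.head
        return congested
```

Each check is a single integer comparison, whatever the queue depth, which is why no per-packet timestamps are needed in this sketch.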
Smart allocation
The second innovation is called the IOKernel, the central software hub that steers data packets to appropriate apps. The IOKernel also uses the congestion-detection algorithm to allocate cores to congested apps orders of magnitude more quickly than traditional approaches.
For instance, the IOKernel may see an incoming data packet for a certain app that requires microsecond processing speeds. If the app is congested due to a lack of cores, the IOKernel immediately devotes an idle core to the app. If it also sees another app running cores with less time-sensitive data, it will grab some of those cores and reallocate them to the congested app. The apps themselves also help out: if an app isn’t processing data, it alerts the IOKernel that its cores can be reallocated. Processed data goes back to the IOKernel to send the response.
“The IOKernel is concentrating on which apps need cores that don’t have them,” said CSAIL PhD student Jonathan Behrens. “It’s trying to figure out who’s overloaded and needs more cores, and gives them cores as quickly as possible, so they don’t fall behind and have huge latencies.”
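The allocation steps described above might look roughly like the sketch below. It is a toy model, not the IOKernel itself: the `App` fields, the function names and the policy of taking a core from the first eligible donor are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    name: str
    latency_sensitive: bool
    congested: bool = False
    cores: set = field(default_factory=set)


def allocate_core(app, idle_cores, apps):
    """Grant one more core to a congested app; return the core id, or None."""
    if not app.congested:
        return None
    if idle_cores:
        # Preferred case: hand over a core that nobody is using.
        core = idle_cores.pop()
    else:
        # Otherwise take a core from an app running less time-sensitive work.
        donors = [a for a in apps
                  if a is not app and not a.latency_sensitive and a.cores]
        if not donors:
            return None
        core = donors[0].cores.pop()
    app.cores.add(core)
    return core


def yield_core(app, core, idle_cores):
    """An app with nothing to process hands a core back for reallocation."""
    if core in app.cores:
        app.cores.remove(core)
        idle_cores.add(core)
```

In this sketch a congested, latency-sensitive app first receives any idle core and, once none remain, a core from a batch-style app, mirroring the order of steps in the description.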
The tight communication between the IOKernel, algorithm, apps and server hardware is “unique in data centres” and allows Shenango to function seamlessly. Belay said: “The system has global visibility into what’s happening in each server. It sees the hardware providing the packets, what’s running where in each core, and how busy each of the apps are. And it does that at the microsecond scale.”
Next, the researchers are refining Shenango for real-world data centre implementation. To do so, they’re ensuring the software can handle a very high data throughput and has appropriate security features.
“Providing low-latency network services is critical to many internet applications. Unfortunately, reducing the latency is very challenging especially when multiple applications compete for shared computer resources,” said KyoungSoo Park, an Associate Professor of Electrical Engineering at the Korea Advanced Institute of Science and Technology. “Shenango breaks the conventional wisdom that it is impossible to sustain low latency at a very high request load with a variable response time, and it opens a new system design space that realises microsecond-scale tail latency with practical network applications.”
Reprinted with permission of MIT News.