GPUs Force CIOs to Rethink the Datacenter
A growing reliance on graphics processing units (GPUs) is changing datacenter dynamics.
Generative AI isn’t only changing the way organizations do business; it’s altering the way they consume computing resources. Large language models (LLMs) -- as well as other technologies such as digital twins, extended reality and the metaverse -- require huge numbers of GPUs to train models or handle graphics-intensive tasks.
There’s a catch, however. GPUs are expensive, they’re in short supply, and they devour enormous amounts of energy. As a result, CIOs and other business and IT leaders are increasingly faced with the proposition of how and where to use them. “It’s critical to understand the required task and balance the need for processing power with costs,” says Alan Priestley, a vice president and analyst for Gartner.
All of which leads directly to the datacenter. As AI goes mainstream, organizations must adapt, says Teresa Tung, Cloud First Chief Technologist for Accenture. It isn’t enough to understand where GPUs deliver strategic gains; CIOs must also make critical decisions about when to use GPUs or CPUs and whether to handle training, inferencing, and other tasks on premises or in the cloud.
Peak Performance
Despite all the recent hype about GenAI, GPUs have been trickling into the datacenter for more than a decade. Graphics processors play a key role in scientific research, deep learning, machine learning, and numerous other tasks, including machine vision, robotics, and automation. “They’ve become a valuable tool for handling complex simulations and massive data challenges,” Priestley says.
Yet things changed in a dramatic way in November 2022. Following the public release of ChatGPT -- and the subsequent emergence of GenAI frameworks like Microsoft Copilot and Google Gemini -- organizations began actively exploring ways to put LLMs to work. It soon became apparent that AI customization is critical for achieving specific tasks, including chatbots, content creation, design iteration, market research, cybersecurity, fraud detection, product prototyping, and various other use cases.
Today, demand for GPUs in the datacenter is skyrocketing. A 2024 study conducted by real estate investment firm JLL found that by 2027, average rack density in datacenters will reach 50kW per rack, surpassing the current average of 36kW. “The exponential progress of artificial intelligence and machine learning is fueling a wave of transformative shifts in data center design, site selection, and investment strategies,” it noted.
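To put the JLL figures in perspective, the short sketch below works out the projected increase in rack density and what it implies for a fixed power budget. The 36kW and 50kW values come from the study cited above; the 1MW hall budget is a hypothetical number chosen only for illustration.

```python
import math

CURRENT_KW_PER_RACK = 36.0    # current average cited in the JLL study
PROJECTED_KW_PER_RACK = 50.0  # JLL projection for 2027
HALL_BUDGET_KW = 1000.0       # hypothetical 1MW IT power budget (assumption)


def density_increase_pct(current_kw, projected_kw):
    """Percentage growth in average power draw per rack."""
    return (projected_kw - current_kw) / current_kw * 100.0


def racks_supported(power_budget_kw, kw_per_rack):
    """Whole racks that fit within a fixed power budget."""
    return math.floor(power_budget_kw / kw_per_rack)


increase = density_increase_pct(CURRENT_KW_PER_RACK, PROJECTED_KW_PER_RACK)
racks_now = racks_supported(HALL_BUDGET_KW, CURRENT_KW_PER_RACK)
racks_2027 = racks_supported(HALL_BUDGET_KW, PROJECTED_KW_PER_RACK)

print(f"Density increase: {increase:.1f}%")
print(f"Racks per {HALL_BUDGET_KW:.0f}kW: {racks_now} now, {racks_2027} at projected density")
```

Under these assumptions, the same power budget supports fewer racks as density rises, which is one reason power becomes a constraint alongside floor space.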
Meanwhile, GPUs are increasingly pricey. For example, the NVIDIA GeForce RTX 4090, a widely deployed top-of-the-line model introduced in 2022, starts at around $1,600 per unit. Less expensive GPUs with less video memory still run into the hundreds of dollars. But the upfront investment in hardware is only a starting point. GPUs typically burn through double or triple the electricity of CPUs, while requiring robust cooling and more elaborate cabling.
The bottom line? Many datacenters are running out of both space and power to operate GPUs. Consequently, CIOs must make some tough decisions about how to approach AI -- and when GPUs deliver clear advantages. For some massively parallel tasks like AI training workloads, GPUs can actually reduce the overall total cost of ownership (TCO) by performing computations much faster. However, for other workloads, such as AI inference, CPUs typically provide sufficient performance while drawing less power and costing less to operate.
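The tradeoff described above can be sketched as a simple per-job cost comparison. This is a minimal illustration, not a full TCO model: it counts only amortized hardware and electricity, and it ignores cooling, cabling, and staffing. The $1,600 GPU price comes from the article; the CPU price, power draws, job durations, hardware lifetime, and electricity rate are all hypothetical values.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    hardware_cost_usd: float  # upfront purchase price
    power_kw: float           # average draw under load (hypothetical)


def cost_per_job(proc, hours_per_job, lifetime_hours=20_000, usd_per_kwh=0.12):
    """Amortized hardware cost plus electricity for a single job.

    lifetime_hours and usd_per_kwh are illustrative assumptions.
    """
    amortized_hardware = proc.hardware_cost_usd * hours_per_job / lifetime_hours
    energy = proc.power_kw * hours_per_job * usd_per_kwh
    return amortized_hardware + energy


gpu = Processor("gpu", hardware_cost_usd=1600.0, power_kw=0.45)
cpu = Processor("cpu", hardware_cost_usd=500.0, power_kw=0.15)  # hypothetical CPU

# Highly parallel, training-like job: assume the GPU finishes 20x faster.
training = {"gpu": cost_per_job(gpu, 1.0), "cpu": cost_per_job(cpu, 20.0)}

# Small, inference-like job: assume the GPU is only 2x faster.
inference = {"gpu": cost_per_job(gpu, 0.01), "cpu": cost_per_job(cpu, 0.02)}

for label, costs in (("training", training), ("inference", inference)):
    cheapest = min(costs, key=costs.get)
    print(f"{label}: cheapest option is {cheapest}")
```

With these made-up inputs, the GPU wins on the training-like job because its speed advantage outweighs its higher price and power draw, while the CPU wins on the small inference-like job. Real figures would need to come from benchmarks of the actual workload.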
A starting point, Tung says, is to identify the specific use case and the level of performance and accuracy it requires. At that point, it’s possible to plug in factors like costs and carbon output and determine what hardware to use and whether processing should take place in the cloud or on premises. Foundation model training requires GPUs, but inferencing is a different story. In some cases, “you can even do the inferencing on a laptop or handheld device,” she says.
Power Plays
Benchmarking is critical for building out an AI-optimized IT infrastructure, Tung says. It provides side-by-side insight into how GPUs and CPUs will tackle specific tasks. By comparing metrics on model performance and architecture, a CIO can determine the best processor -- or mix of devices -- for a specific task. It’s also possible to identify essential upgrades and view cost, performance and carbon footprint data in the context of different approaches.
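One way to turn benchmark results into a decision is to filter out options that miss the required performance level and then pick the cheapest of what remains, optionally pricing in carbon. The sketch below illustrates that idea only; the `BenchmarkResult` fields, the sample numbers, and the carbon price are hypothetical and do not reflect any specific benchmark or vendor tool.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    option: str
    throughput: float          # requests per second measured in the benchmark
    cost_usd_per_hour: float   # operating cost
    carbon_kg_per_hour: float  # estimated emissions


def pick_option(results, min_throughput, carbon_price_usd_per_kg=0.0):
    """Return the cheapest option that meets the throughput requirement.

    Carbon is converted to dollars with an assumed internal carbon price.
    Returns None when no option is fast enough.
    """
    eligible = [r for r in results if r.throughput >= min_throughput]
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda r: r.cost_usd_per_hour
        + carbon_price_usd_per_kg * r.carbon_kg_per_hour,
    )


# Hypothetical benchmark results for the same model on two kinds of hardware.
results = [
    BenchmarkResult("cpu", throughput=40.0, cost_usd_per_hour=0.50, carbon_kg_per_hour=0.05),
    BenchmarkResult("gpu", throughput=400.0, cost_usd_per_hour=3.00, carbon_kg_per_hour=0.20),
]

light_load = pick_option(results, min_throughput=30.0, carbon_price_usd_per_kg=0.10)
heavy_load = pick_option(results, min_throughput=200.0, carbon_price_usd_per_kg=0.10)
too_heavy = pick_option(results, min_throughput=1000.0)
```

In this made-up example, the CPU is selected when the required throughput is low and the GPU only when the requirement exceeds what the CPU can deliver. A result of `None` signals that neither option is sufficient and that a different configuration should be benchmarked.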
For example, Stanford University maintains the HELM benchmark, which compares foundation models, including both managed and open-source models, against metrics like accuracy and performance. This benchmark helps companies determine the benefits of maintaining their own custom model and the tradeoffs related to model size. This is crucial because understanding model size helps determine the number of GPUs needed for training, and whether GPUs are needed for inferencing.
Other tools, such as Geekbench, Novabench, and Basemark GPU, can also help organizations drill down into GPU performance and costs.
Meanwhile, Accenture has developed a proprietary tool called Switchboard. Armed with information about tradeoffs between model performance, cost and sustainability, a CIO can make informed decisions about compute resources. This includes what model to use and whether it’s best to host it on premises or in the cloud.
In some cases, “it might make sense to use dedicated GPU instances on premises when base workloads and demands over time are well understood, or for reasons like sovereignty or data gravity,” Tung points out. However, she says that the cloud is the obvious choice for most workloads. It delivers a more dynamic, elastic and scalable architecture. It’s also the fastest and easiest way to gain access to scarce GPUs.
The Datacenter Evolves
All major cloud providers, including Microsoft Azure, Google Cloud Platform (GCP), and Amazon Web Services (AWS), now offer virtual machines with pre-attached GPUs or the option of renting GPU-enabled servers. As a result, there’s no need to invest in physical hardware. These clouds also don’t overlook manageability. They offer GPU passthrough tools that directly manage performance factors on GPU hardware.
This allows CIOs to provision and manage complex environments, including hybrid situations that involve both GPUs and CPUs. It includes tools for scaling and utilizing resources, configuring GPU memory and establishing instance types for specific tasks such as machine learning or video editing. In addition, third-party applications like SkyPilot allow users to manage and scale large language models running on GPUs in the cloud.
With a clear understanding of crucial factors -- including the size and breadth of the training dataset; who will be using the system; the projected volume of queries or hits on it; and how GPUs and CPUs stack up -- it’s possible to make informed decisions. For instance, “in some cases, different types of GPUs may be needed for inferencing and running a system, or a CPU with accelerators on board might be better equipped to handle smaller models,” Priestley notes.
It’s also possible to view GPUs and CPUs through the lens of sustainability and carbon-performance tradeoffs. Says Tung: “We’re going to see a growing demand for AI and the need for GPUs in the enterprise. But we’re also likely to see a greater mix of GPUs and CPUs because many tasks remain more efficient cost-wise and carbon-wise on CPUs.”