Management Strategies For The Cloud Revolution, Part II
In this second of a two-part book excerpt, author Charles Babcock, editor at large with InformationWeek, explains why the cloud is different from what's gone before.
Google, Amazon.com, and now Yahoo!, as it embarks on its own internal use of cloud data centers (a private cloud), have all adopted a design principle from the Internet itself. The Department of Defense, through its Defense Advanced Research Projects Agency (DARPA), wanted a network that would survive a nuclear attack; what it got was the birth of the Internet. Routers on the Internet detect when a router in the next network segment isn't working and automatically route around it. Likewise, when a server in a cloud data center fails, the managing software routes its workload elsewhere and doesn't send that server anything more to do until it's fixed.
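To make that route-around behavior concrete, here is a minimal sketch in Python of the idea: a dispatcher that hands work only to servers that passed their last health check and sends nothing to a failed machine until it is marked healthy again. The class, server names, and health-check calls are illustrative assumptions, not any provider's actual management software.

```python
# A minimal sketch (not any vendor's actual scheduler) of routing around failures:
# work is dispatched only to servers that passed their last health check, and a
# failed server receives nothing more until it reports healthy again.

import random

class Cluster:
    def __init__(self, servers):
        # All servers start out healthy; the names are illustrative.
        self.healthy = set(servers)
        self.failed = set()

    def mark_failed(self, server):
        # Called when a health check or heartbeat times out.
        self.healthy.discard(server)
        self.failed.add(server)

    def mark_repaired(self, server):
        # Called once the machine passes its health checks again.
        self.failed.discard(server)
        self.healthy.add(server)

    def dispatch(self, job):
        # Route around failures: pick any healthy server, never a failed one.
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        target = random.choice(sorted(self.healthy))
        return f"{job} -> {target}"

cluster = Cluster(["rack1-node1", "rack1-node2", "rack2-node1"])
cluster.mark_failed("rack1-node2")            # heartbeat missed
print(cluster.dispatch("render-thumbnails"))  # never lands on rack1-node2
```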
A fault-tolerant data center made from inexpensive parts used to be an oxymoron. At one time, Tandem Computers achieved fault tolerance, but only by running identical computers side by side doing the same work, so that one could fail without an impact on the business. Now fault tolerance is one of the secrets behind elasticity. If adding hardware is a simple, low-cost task, then hardware can be brought online as needed.
If failures occur, as they inevitably do, they can be managed routinely and the data center will continue functioning. This is a central principle of what Hölzle and Barroso called "The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines," their contribution to the Synthesis Lectures on Computer Architecture, a series edited at the University of Wisconsin, Madison.
One additional example: as noted before, the large cluster needs an interconnect that ties its machines together. Many have wondered how Google's warehouse-scale machine achieves the speeds that it does, and have assumed that it uses the highest-speed interconnects available. High-speed interconnects, however, are also the most expensive option, violating the principle that the cloud data center must be built from reliable but inexpensive parts. InfiniBand networks can transport data at 40 gigabits per second. High-end Ethernet transports it at 10 gigabits per second, and these were my candidates for Google's interconnect fabric.
Hölzle and Barroso say that this can't be so. InfiniBand costs $500 to $1,000 extra per port, they write. Large-scale Ethernet moves data at 10 gigabits per second, "but again, at a cost of hundreds of dollars per server. The alternative is low-cost fabrics from commodity Ethernet switches." I don't know the brand name or capacity of the Google cluster switching fabric. But a highly reliable 1-gigabit Ethernet switch, for example, costs $148.
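A quick back-of-envelope calculation shows how wide the gap is. The 10,000-server cluster size and the 48-port switch are hypothetical assumptions, and the sketch counts only edge ports (ignoring aggregation and uplink switches), but it makes the point.

```python
# Back-of-envelope comparison of interconnect cost, using the figures quoted
# above. The 10,000-server cluster and the 48-port commodity switch are
# hypothetical assumptions for illustration only; aggregation layers and
# uplinks are ignored.

servers = 10_000

infiniband_per_port = (500, 1_000)   # dollars extra per server port (quoted range)
commodity_switch_price = 148         # the 1-gigabit Ethernet switch cited above
commodity_ports = 48                 # assumed port count per commodity switch

ib_low = servers * infiniband_per_port[0]
ib_high = servers * infiniband_per_port[1]
commodity_total = (servers / commodity_ports) * commodity_switch_price

print(f"InfiniBand fabric:      ${ib_low:,.0f} - ${ib_high:,.0f}")
print(f"Commodity 1 GbE fabric: ${commodity_total:,.0f}")
print(f"Per server, commodity:  ${commodity_total / servers:,.2f}")
```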
The interconnect, like the servers themselves, is built out of the most proven mass-produced parts. Elasticity is related to this economy of scale. If you haven't mastered the art of building the big server cluster, you will find it hard to deliver "elastic" service. Remember, you've got to do so within the cloud business model, which demands low prices. How low? As low as your competitors can go. It's a tough trade-off. In the cloud, elasticity is inevitably tied to implementing massive economies of scale.
Many simple computers built of similar parts can be managed by fewer people. One management interface and a layer of system management software can scale up to many units, noting which ones are functioning properly and which ones are showing signs of heating up, slowing down, or experiencing malfunctioning parts.
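As a rough illustration of that single management layer, here is a sketch that scans a fleet of node reports and flags the machines that are running hot, overloaded, or showing disk errors. The metric names, thresholds, and sample nodes are invented for illustration; real fleet-management software is far more elaborate.

```python
# A sketch of the "one management layer over many identical units" idea.
# The metrics, thresholds, and node records are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class NodeStatus:
    name: str
    temperature_c: float   # reported by the node's sensors
    load: float            # 0.0 - 1.0 utilization
    disk_errors: int       # recent error count

def triage(fleet):
    """Split the fleet into healthy nodes and nodes needing attention."""
    healthy, suspect = [], []
    for node in fleet:
        if node.temperature_c > 80 or node.load > 0.95 or node.disk_errors > 0:
            suspect.append(node.name)   # heating up, slowing down, or failing parts
        else:
            healthy.append(node.name)
    return healthy, suspect

fleet = [
    NodeStatus("node-001", 55.0, 0.40, 0),
    NodeStatus("node-002", 86.5, 0.70, 0),   # running hot
    NodeStatus("node-003", 58.0, 0.30, 3),   # disk trouble
]
healthy, suspect = triage(fleet)
print("healthy:", healthy)
print("needs attention:", suspect)
```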
The cloud data center is a cluster different from any that we've seen before. The fact that this resource is available to many users at highly economical rates is part of the excitement of cloud computing.
We still haven't touched on one of the most important ways in which the cloud becomes elastic, however, and that's virtualization. The impact of virtualization has been so great that it's the subject of the next chapter. But we know that elasticity is built into the design of the cloud data center. It's elasticity that is responsible for the illusion that we are connected to a boundless resource, capable of processing the most demanding job we can conjure up. It's an outstanding feature of cloud computing.
Elasticity is responsible for much of the excitement surrounding cloud computing. If we can process anything that we want, then what do we want to process? Whether the cloud user is an individual or a business, a new opportunity is unfolding. A convergence of technologies, evolutionary in form, looks more and more revolutionary in scope. Critics are inclined to say that there's nothing in the cloud that they haven't seen before. Some critic, I suspect, said the same thing about the printing press.
Cloud computing takes a set of technologies that have already been proved elsewhere and leverages them to generate new economies of scale and new end-user services. Everyone is fascinated with the size of the cloud data centers (eight football fields across, one analyst said). But some observers are more intrigued by what the end user is going to do with this resource. Amid the data center's whirring fans and grinding disks, I keep hearing an echo of the PC Revolution. It may be just an illusion that we can do anything we want to "in the cloud," but it's an illusion that the cloud is likely to sustain far into the future.