Fast Forward
High-performance computing: It's finally moving out of science labs and into enterprise decision-making.
High-performance computing (HPC) developers run a race with no end. Each stage won, each accomplishment, leads to new possibilities — new notions about what problems are computable — as the road forward extends ever farther into the distance. That road has always led to the most complex, computationally intensive problems — studying weather systems, designing nuclear weapons, and the like — but with HPC advances and constant hardware evolution, HPC has become accessible for the more ordinary analytic challenges most of us face.
Then and Now
Most Intelligent Enterprise readers have been around computing long enough to have seen an orders-of-magnitude leap in the computing power available for their everyday work. I ran satellite simulations for NASA in the mid-1980s on a Control Data Cyber 205, a supercomputer with ultra-fast compute cycles (for the '80s) and an attached vector processor capable of carrying out 65,536 parallel instances of a calculation. I reckon that equivalent computational power became available in a PC about 15 years later. I no longer program terrestrial gravity and magnetic-field models, but I've continued to work with "big data" — for example, helping produce demographic analyses in 2000 that were around 250 times more detailed than those tackled a decade earlier. (Whether the more recent results are 250 times more valuable than the earlier results is a worthy question.) The problems expand to consume the available resources.
Large-scale problems typically still require expensive, large-scale systems, the innovation of recent years being that many of the fastest systems nowadays are built with commodity hardware and operating systems. High-performance machines with fast clock speeds and high-capacity, high-bandwidth storage are now cheap, but you still need a lot of them if you want to do heavy-duty analyses. What's new is that developers can use the same hardware building blocks for large-scale problems that the rest of us use to run office bloatware and play Grand Theft Auto III.
Another element that's new, at least to me, is the shift in terminology from supercomputing to high-performance computing (HPC). I interpret the shift as reflecting the realization that there's more to IT productivity than fast compute cycles. There's suitability for a particular job: the notion that high performance means doing the task at hand, no matter its size, really well. So HPC encompasses both traditional supercomputing and the adaptation of off-the-shelf hardware to solve particular problems, rather than the fitting of problems to the available hardware. Google's 100,000-node cluster is the apotheosis of this trend. That system targets a special high-throughput computing problem but wouldn't necessarily offer equivalent advantages for simulating protein folding or rendering richly textured animation frames.
Flying Pringles
The animation of the movie Shrek 2 was the subject of a presentation I attended at the July High-Performance Computing Users Conference sponsored by the Council on Competitiveness, the Defense Advanced Research Projects Agency (DARPA), and the Department of Energy. While the Defense and Energy Departments have long funded development of advanced systems, it's accepted wisdom that high-stakes commercial applications like computerized moviemaking and other variations of advanced visualization, including gaming, are the key to pushing HPC into new realms. (I suppose the ability to render natural-seeming hair movement is important if you need to divert audience attention from weak character development.) Nonetheless, the HPC application that stole the show was presented by an industrial engineer, Tom Lange of consumer-goods manufacturer Procter & Gamble. Lange spoke about Flying Pringles.
Pringles are, of course, a food product fabricated from potatoes, perhaps known more for their shape and packaging — Pringles cans were infamously used in early efforts to create cheap WiFi-signal-boosting waveguide antennas — than for flavor. Apparently, saddle-shaped Pringles tend to go airborne during the manufacturing process, and Lange and colleagues have brought HPC to bear on designing manufacturing processes that make sure the chips stay down. Lange also illustrated applying HPC to the design of lightweight-but-strong bottles. As an explanation pulled off the Web puts it, Procter & Gamble's aim is "to create and test prototypes of products and the machines that manufacture them in a virtual state — thereby eliminating the need for first-round physical prototypes. The supercomputer also allows P&G teams to study the effects of production-line changes without building the actual equipment. This eliminates costly and time-consuming trial-and-error testing that was once done using physical prototypes."
Detergent bottles and chips are low-cost items, but they're produced in very high volume. Shaving fractions of a cent in the production of each unit can lead to the same order of savings realized in traditional supercomputing tasks such as jet aircraft design, nuclear-weapons research, and the like.
These are classic applications of modeling, simulation, and optimization techniques, essentially of asking "what if" under various scenarios and studying the outcomes in order to identify the conditions that will lead to preferred results. Simulation is about finding data in models, in patterns; it's the inversion of data mining, of discerning patterns in data.
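To make the "what if" idea concrete, here's a minimal sketch in Python (a toy scenario sweep, not any company's actual model). The line-speed and defect-rate numbers are invented for illustration; the point is the shape of the exercise: simulate each scenario many times, then compare outcomes to find a preferred setting.

import random

def simulate_run(line_speed, seed):
    """Toy process model: faster lines make more units but spoil more of them."""
    rng = random.Random(seed)
    units = line_speed * 100
    defect_rate = 0.01 + 0.01 * line_speed ** 2 + rng.gauss(0, 0.01)
    defect_rate = min(max(defect_rate, 0.0), 1.0)
    return units * (1 - defect_rate)                    # good units out

def evaluate(line_speed, runs=1000):
    """Average good output over many simulated runs of one scenario."""
    return sum(simulate_run(line_speed, s) for s in range(runs)) / runs

if __name__ == "__main__":
    scenarios = [speed / 2 for speed in range(2, 21)]   # line speeds 1.0 to 10.0
    best = max(scenarios, key=evaluate)
    print(f"preferred line speed under the toy model: {best}")

Real models carry far more variables and actual physics, but the loop is the same: vary assumptions, simulate, and compare outcomes. Note, too, that each scenario's thousand runs are independent of one another, which points to the decomposition issue taken up below.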
HPC and DS
I've been trying to figure out whether HPC can play in mainstream decision support. I haven't encountered any vendors interested in rewriting their OLAP engines to support parallelized, multithreaded cube analysis — selecting appropriate software and reworking data structures and processes solves most OLAP-related performance issues I've seen — and I can't see much advantage in running Excel, the world's most popular decision-support software, on a 64-node cluster. The analyses that will gain most from HPC are those that can be decomposed into many simultaneously executable threads.
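Here's a rough sketch of that decomposition point, again in Python and with an invented analyze() stand-in rather than any vendor's engine: when each item (a product, a store, a customer segment) can be analyzed independently, the work farms out to however many cores or nodes are available.

from multiprocessing import Pool

def analyze(item_id):
    """Placeholder per-item analysis; imagine a forecast or a score."""
    total = 0.0
    for k in range(1, 10_000):
        total += (item_id % 97 + k) ** 0.5
    return item_id, total

if __name__ == "__main__":
    items = range(100_000)              # e.g., product-by-store combinations
    with Pool() as pool:                # one worker process per available core
        results = pool.map(analyze, items, chunksize=1_000)
    print(f"analyzed {len(results)} items")

Excel and conventional OLAP engines don't break their work apart this way, which is why a cluster buys them little; workloads like this one, made of many independent pieces, are where HPC pays off.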
I was in touch with analytics vendor SAS in connection with this column, specifically about its recently released High-Performance Forecasting product. SAS states that HPF is capable of producing millions of forecasts with reasonable turnaround (for example, for retail products by store or sales channel), but high-volume calculation makes up only a third of the story, and not the most interesting part. SAS software has few peers in handling a wide variety of analytic models that account for seasonal patterns and other complications, and it provides model identification and fitting tools that can automatically examine data sets to determine the model — the equations and their parameters — that most accurately describes each item being studied. (I used an early version of this facility on a project eight years ago.) The third germane HPF element is the SAS system's ability to further analyze results, for instance, by applying data mining and optimization algorithms to the mass of forecasts to detect and exploit clusters and other patterns.
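Without pretending this is how HPF itself is built, here's a stripped-down sketch of the model-selection pattern it automates: fit several candidate models to each series, score them on held-out data, and keep the winner. The candidate models here (mean, drift, seasonal naive) are deliberately crude placeholders.

def mean_forecast(history, horizon):
    """Flat forecast at the historical average."""
    level = sum(history) / len(history)
    return [level] * horizon

def drift_forecast(history, horizon):
    """Extend the average historical trend forward."""
    slope = (history[-1] - history[0]) / max(len(history) - 1, 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

def seasonal_naive_forecast(history, horizon, period=12):
    """Repeat the most recent full season."""
    return [history[-period + (h % period)] for h in range(horizon)]

CANDIDATES = {
    "mean": mean_forecast,
    "drift": drift_forecast,
    "seasonal_naive": seasonal_naive_forecast,
}

def select_model(series, horizon=6):
    """Hold out the last `horizon` points and pick the lowest-error model."""
    train, holdout = series[:-horizon], series[-horizon:]
    def error(name):
        preds = CANDIDATES[name](train, horizon)
        return sum((p - a) ** 2 for p, a in zip(preds, holdout))
    best = min(CANDIDATES, key=error)
    return best, CANDIDATES[best](series, horizon)

if __name__ == "__main__":
    demo = [100 + 10 * (t % 12) + t for t in range(48)]   # trend plus seasonality
    name, forecast = select_model(demo)
    print(name, [round(v, 1) for v in forecast])

Run that selection across millions of series in parallel, with far richer model families and real diagnostics, and you have the outline of high-performance forecasting.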
SAS's HPF product is a good example of what IBM calls "deep computing," or three-quarters of it anyway: sophisticated algorithms, high-performance software, and domain knowledge. IBM adds a fourth element, super-fast computing, which it delivers as well as any company. IBM targets the top slots of the Top500 list of the fastest supercomputers and produced 44.8 percent of the systems on the June 2004 list. But IBM also understands that deep-computing principles can be profitably applied to a much larger market than Top500 users, and it has established a new Center for Business Optimization to further its efforts.
The ability to handle large-volume analyses, automating away complexities rather than ignoring them, differentiates high-performance analytics from run-of-the-mill (although useful) techniques. And it illustrates the role of software and algorithmic innovation, coupled with a lot of fast compute cycles, in delivering true high-performance computing. At the July HPC Users Conference, DARPA Director Anthony J. Tether said that "speed is not what it's all about; it's really 'productivity'" — an observation nowhere more true than in the world of enterprise decision-making.
Seth Grimes heads Alta Plana Corp., a Washington, D.C.-based consultancy specializing in business analytics and demographic and economic statistics.
Resources
High-Performance Computing Users Conference
hpcusersconference.com/home.html
IBM Deep Computing Institute
www.research.ibm.com/dci/index.shtml
SAS High-Performance Forecasting
sas.com/technologies/analytics/forecasting/hpf/index.html
TOP500 Supercomputer Sites
top500.org