Scaling Supercomputers With Linux
A band of ex-Cray Research engineers aims to build a Linux architecture for scaling supercomputers made of low-cost nodes.
Up in the wilds of Wisconsin and Minnesota, where the nights grow cold in August, a band of ex-Cray Research engineers is cooking up what could be a hot answer to Linux's scalability shortcomings.
Cray's former chief architect for massively parallel processor systems, Steve Oberlin, is president and CEO of Unlimited Scale Inc. Working out of his home in Chippewa Falls, Wis., and employing fewer than a dozen former Cray colleagues in South St. Paul, Minn., Oberlin says his new company, formed last year, aims to build a Linux architecture for scaling supercomputers made of low-cost nodes.
It's an important challenge, as financial-services companies, life-sciences firms, and oil-exploration companies seek to build high-performance systems from low-cost components, often running the Linux operating system. "Clusters today represent another step along the price-performance curve," Oberlin says.
Building scalable systems from commodity hardware designed for desktop computers and small servers isn't easy, though. "Lashing together tens or hundreds of thousands of processors isn't as easy as it appears when you apply it to real-world problems," says Gary Smaby, a supercomputing analyst and a principal of Quatris Fund, an investor in Unlimited Scale. As the number of CPUs in a Beowulf-style cluster (a group of PCs linked via Ethernet) increases and memory is distributed instead of shared, the efficiency of each processor drops.
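A rough way to picture that efficiency drop is Amdahl's law, which caps speedup by the share of work that cannot be spread across processors. The sketch below is only an illustration: the 1 percent serial fraction is an assumed figure, not a measurement from Unlimited Scale or any real cluster, and the function names are made up for this example.

```python
def amdahl_speedup(n, serial_fraction):
    """Speedup on n processors under Amdahl's law."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def efficiency(n, serial_fraction):
    """Fraction of each processor's capacity doing useful work."""
    return amdahl_speedup(n, serial_fraction) / n


if __name__ == "__main__":
    # Assumed for illustration: 1% of the job cannot be parallelized.
    for n in (1, 16, 256, 4096):
        print(f"{n:>5} processors: efficiency {efficiency(n, 0.01):.3f}")
```

Under this simple model, per-processor efficiency falls steadily as the processor count rises, even before any network or memory-access costs are counted.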
Enter Oberlin. Unlimited's solution involves tailoring the Linux that runs on each node to that node's job, rather than treating all the nodes in a cluster as peers. The idea is to free one set of computers from getting bogged down in processing interrupt requests from peripherals, while letting a second set of machines run the full operating system, furnishing the cluster with networking, job scheduling, input/output, and other capabilities. Says Oberlin, "On application nodes, you want the operating system to get out of the way."
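One way to see why interruptions matter more as clusters grow is a back-of-the-envelope calculation. The sketch below rests on assumptions that are not from Oberlin or Unlimited Scale: the application runs in lockstep, so every node must finish a step before any can continue, and each node independently has a small chance of being stalled by operating-system housekeeping during that step. The function name and the 0.1 percent figure are hypothetical.

```python
def prob_step_delayed(nodes, p_interrupt):
    """Chance that at least one of `nodes` is stalled during a step.

    Assumes stalls are independent and each node is stalled with
    probability p_interrupt (an illustrative value, not measured).
    """
    return 1.0 - (1.0 - p_interrupt) ** nodes


if __name__ == "__main__":
    # Assumed for illustration: a 0.1% chance of a stall per node per step.
    for nodes in (1, 100, 1000, 10000):
        print(f"{nodes:>6} nodes: {prob_step_delayed(nodes, 0.001):.3f}")
```

With these assumed numbers, a stall that is rare on any single machine becomes close to certain somewhere in a large cluster on every step, which is the kind of overhead a stripped-down application node would be meant to avoid.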