Google Spills Megastore's Secrets

A recently published paper sheds light on an important but seldom discussed Google storage system.

Thomas Claburn, Editor at Large, Enterprise Mobility

February 8, 2011

Google's success owes a lot to its computing infrastructure. The company's accomplished engineers have developed and deployed innovations like MapReduce, a way to process large data sets; Bigtable, a distributed storage system; Sawzall, an interpreted programming language for analyzing large distributed data sets; the Google File System, a distributed file system; and Google Workqueue, a distributed query management system.

To this list, add Megastore, the storage system that supports Google App Engine, among other applications. Megastore has been used for several years at Google. It was discussed at the SIGMOD 2008 conference but information about the technology has only recently been published, in conjunction with last month's Conference on Innovative Data Systems Research (CIDR).

The paper detailing the technology, "Megastore: Providing Scalable, Highly Available Storage for Interactive Services," describes a storage system tailored to modern interactive online services.

"Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability," the paper states. "We provide fully serializable ACID semantics within fine-grained partitions of data."

Web applications today, the paper says, have to be highly scalable, have to compete for users through rapid development, have to be responsive in terms of latency, have to provide users with data consistently -- no spreadsheets vanishing into the cloud -- and have to be available at all times.

"These requirements are in conflict," the paper states. "Relational databases provide a rich set of features for easily building applications, but they are difficult to scale to hundreds of millions of users. NoSQL datastores such as Google's Bigtable, Apache Hadoop's HBase, or Facebook's Cassandra are highly scalable, but their limited API and loose consistency models complicate application development. Replicating data across distant data centers while providing low latency is challenging, as is guaranteeing a consistent view of replicated data, especially during faults."

Having dismissed traditional RDBMS (relational database management system) and open source databases like MySQL, the paper also knocks "expensive commercial database systems like Oracle [which] significantly increase the total cost of ownership in large deployments in the cloud."

Megastore is designed to replicate file write operations synchronously across a wide-area network with reasonable latency and support for graceful failover across data centers. It aims to strike a middle ground between the scalability of NoSQL databases and the convenience of a traditional RDBMS.
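One rough way to see how synchronous replication over a wide-area network can keep latency reasonable and still survive the loss of a data center: if a write commits as soon as a majority of replicas acknowledge it, the write waits only for the fastest majority, not for the slowest replica. The Python sketch below assumes that majority rule, which this article does not spell out. The function name and the round-trip times are hypothetical, chosen only for illustration.

```python
import math


def commit_latency_ms(replica_rtts_ms: list[float]) -> float:
    """Latency of a write that waits for a majority of replicas to acknowledge.

    Assumption (not from the article): a write commits once more than half
    of the replicas have acknowledged it. An unreachable replica is modeled
    with a round-trip time of math.inf.
    """
    if not replica_rtts_ms:
        raise ValueError("need at least one replica")
    quorum = len(replica_rtts_ms) // 2 + 1
    # The write completes when the quorum-th fastest acknowledgment arrives.
    return sorted(replica_rtts_ms)[quorum - 1]


# Made-up round-trip times to three data centers, in milliseconds.
healthy = commit_latency_ms([5, 40, 120])
# The nearest data center becomes unreachable.
degraded = commit_latency_ms([math.inf, 40, 120])
```

With these illustrative figures, losing the nearest data center raises write latency from 40 to 120 milliseconds, but the write still commits. Only when a majority of replicas is unreachable does the modeled latency become infinite.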

James Hamilton, a VP and distinguished engineer at Amazon.com, has noted the limited public information about Megastore in several personal blog posts over the years and expressed qualified admiration for the technology when Google's paper was published. "Supporting consistent read and full ACID update semantics is impressive although the limitation of not being able to update an entity group at more than a 'few per second' is limiting," he wrote.

The paper states that over 100 production applications use Megastore as their storage service and that most of Google's customers see availability of 99.999% or higher for these applications. The average read latency for these applications is in the tens of milliseconds range and average write latency ranges from 100 to 400 milliseconds, depending on data center distance and the size of the write operation.
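Some back-of-the-envelope arithmetic puts these figures in context. The Python sketch below converts an availability percentage into a yearly downtime budget. It also converts write latency into a ceiling on write rate, under the assumption, made here for illustration, that writes to a single entity group are applied strictly one at a time. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Yearly downtime allowed by a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def max_serialized_writes_per_second(write_latency_ms: float) -> float:
    """Upper bound on write rate if each write must finish before the next starts.

    Assumption (for illustration only): writes to one entity group are
    fully serialized, so throughput is the inverse of latency.
    """
    if write_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000 / write_latency_ms
```

By this arithmetic, 99.999% availability allows a little over five minutes of downtime a year. Write latencies of 100 to 400 milliseconds would cap a strictly serialized entity group at between 2.5 and 10 writes per second, which is in line with the "few per second" limit Hamilton describes.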

Google declined to comment, preferring to let the paper speak for itself.
