Hadoop and the Big-Data Revolution

There's a revolution underway in the use of big data, and Hadoop, the open-source distributed computing system, is at the center of it. Apache Hadoop success stories and accolades were shared today by the likes of Yahoo!, Facebook, eHarmony, IBM and JP Morgan Chase at Hadoop World in New York City. Here's a sampling of highlights...

Doug Henschen, Executive Editor, Enterprise Apps

October 2, 2009

There's a revolution underway in the use of big data, and Hadoop, the open-source distributed computing system, is at the center of it. Apache Hadoop is most often associated with MapReduce data processing, but it also includes a distributed file system and subprojects including the Hive data warehouse. All of the above were the subject of success stories, accolades and palpable excitement at today's Hadoop World in New York City. Executives from Yahoo!, Facebook, eHarmony, IBM and JP Morgan Chase were here offering insight into how Hadoop is changing expectations for analysis of big data.

Here are a few highlights from today's presentations on what these organizations are doing with Hadoop:

  • Yahoo!, by far the largest developer and contributor to Hadoop, uses it to analyze and improve content optimization, spam filtering, search indexing and ad optimization. Yahoo! has a 4,000-node cluster with 16 petabytes of disk space available for Hadoop analysis, and it has used this infrastructure to sort 1 petabyte of data in 16 hours (across 3,700 nodes) and 1 terabyte of data in 62 seconds (across 1,500 nodes).

  • Facebook is using Hadoop to help analyze the 4 terabytes of compressed new data added to the social networking site each day. Facebook's Hive-based data warehouse runs 7,500 jobs per day for a total of more than 80,000 compute hours. Reporting is a key task, with daily and weekly aggregations of impressions and click counts across the site. Results are reported and explored through MicroStrategy dashboards.

  • eHarmony, the online dating service, is using Hadoop processing and the Hive data warehouse to better understand and more accurately match people among its 20 million registered users.

  • IBM's Emerging Technologies unit has used Hadoop for an experimental mergers-and-acquisitions due-diligence engine. The project compared 1.4 million patent records against fourteen years' worth of Court of Appeals records to spot legal challenges on intellectual property ownership. IBM said the engine has performed in 5 minutes what would otherwise take teams of legal researchers a week to compile.

  • JP Morgan Chase presented here today describing proof-of-concept data warehousing projects that are pursuing "order of magnitude savings" using open-source Hadoop and commodity hardware rather than conventional relational databases and SMP hardware.

The Hadoop World event was presented by Cloudera, a software and professional services firm focused exclusively on Hadoop. The firm announced Cloudera Desktop, a new Web-based, user-friendlier (though still programmer-oriented) interface for Hadoop applications. The Desktop can be used with on-premise implementations of Hadoop or cloud-based instances hosted on Amazon EC2. Amazon executives were also on hand today to discuss use of Amazon Elastic MapReduce, which is a Web services-based implementation built on the Hadoop framework. Amazon announced a partnership whereby customers can specify Cloudera instances within Amazon Elastic MapReduce in order to secure that vendor's professional services and support.

Cloudera founder Christophe Bisciglia opened the day saying that Hadoop is fast becoming pervasive and an increasingly obvious choice not just for Web companies but for all types of companies with big-data challenges and opportunities. Judging by the enthusiasm and numbers of attendees here today (surpassing 500), the big-data revolution has swept out of Silicon Valley and is reaching mainstream corporate data centers.

About the Author

Doug Henschen

Executive Editor, Enterprise Apps

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of Transform Magazine, and Executive Editor at DM News. He has covered IT and data-driven marketing for more than 15 years.
