Greenplum Cloud Plan Promises New Era for Data Warehousing
Flexible private-cloud scalability meets massively parallel processing. Promised push-button provisioning awaits next release.
Analytic database provider Greenplum today announced an Enterprise Data Cloud (EDC) initiative that envisions rapid and flexible "self-service" provisioning of analytic data marts and data warehouses on private grids of commodity hardware. It's not quite virtualization; it's not quite cloud computing; and it's not quite ready. Still, the initiative synthesizes good ideas -- some real and some yet to be developed -- and it will no doubt advance competitive thinking and perhaps prevailing practices in data warehousing.
"The goal is really redefining the data warehouse and analytics market," says Ben Werther, Greenplum's director of product management. "For too long, the business has had to struggle to get access to the data and the resources it needs... This is about business analysts being able to create a warehouse from a pool of available resources with just a click of a button and have it available in minutes."
That pool of resources is on an EDC technology platform that encompasses three elements: self-service provisioning, what Greenplum calls "Elastic scale," and massively parallel processing (MPP). On the last point, Greenplum concedes it is hardly alone, with vendors from Teradata, IBM, Netezza, HP and (recently) Oracle to Vertica, Kognitio and 1010 Data exploiting MPP (and Microsoft set to join the list next year).
Greenplum's Elastic scale seems to borrow terminology from Amazon's Elastic Compute Cloud (EC2). But the flexible scaling and provisioning Greenplum envisions is provided by a private, on-premise grid of commodity hardware -- not by the kind of endlessly scalable, third-party resource most people associate with cloud computing. Thus, customers will have to buy, deploy and manage the hardware -- and keep spare capacity on hand -- in order to spin up new marts and warehouses as quickly as Greenplum promises.
Why not use a public cloud? "People do want the cloud benefits of elasticity and self service," Werther maintains. "But with the data volumes we're talking about and the kind of privacy concerns that are out there, today, people don't want to do it in a public cloud. They want these efficiencies in-house."
What about the cost (not to mention implementation time and management challenges) of adding the required capacity? "With the price of commodity hardware today, you can buy 1,000 cores of servers for less than $1 million," Werther points out. "The economics are such that any reasonable-size organization can make an investment in the dozens or hundreds of cores they need to do this kind of data warehousing."
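The arithmetic behind that claim is easy to check. Below is a minimal back-of-envelope sketch in Python, assuming only the round numbers from Werther's quote; the per-server core count is an illustrative figure for commodity servers of the day, not anything Greenplum has published.

```python
# Back-of-envelope check of Werther's commodity-hardware claim.
# Assumed figures: $1,000,000 buys roughly 1,000 cores (per the quote);
# the 8-cores-per-server number below is illustrative, not from Greenplum.

TOTAL_BUDGET_USD = 1_000_000
TOTAL_CORES = 1_000
CORES_PER_SERVER = 8  # illustrative two-socket, quad-core commodity box

cost_per_core = TOTAL_BUDGET_USD / TOTAL_CORES       # ~$1,000 per core
cost_per_server = cost_per_core * CORES_PER_SERVER   # ~$8,000 per server

def deployment_cost(cores_needed: int) -> float:
    """Rough hardware cost of a grid with the given number of cores."""
    return cores_needed * cost_per_core

if __name__ == "__main__":
    for cores in (48, 96, 256):  # "dozens or hundreds of cores"
        print(f"{cores:4d} cores -> ~${deployment_cost(cores):,.0f}")
```

At roughly $1,000 per core, the "dozens or hundreds of cores" Werther describes works out to a five- or low-six-figure hardware outlay, which is his point -- though that figure covers the servers, not the power, space and administration that come with them.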
Werther holds up the commodity hardware angle as a differentiator compared to Teradata and Netezza, which use proprietary hardware. That said, the on-premise route leaves an opening for Vertica, Kognitio, 1010 Data and other vendors (including Oracle) that offer pay-as-you-go options on third-party clouds -- some public, such as Amazon EC2, but others private, built for hosted provisioning. In this approach to cloud computing, customers don't have to buy, deploy and manage the hardware, and they typically have the flexibility to turn capacity on and off as needed. Despite the potential privacy concerns, Forrester analyst James Kobielus says there's a lot of interest in third-party clouds.
"If you need a 100-terabyte data mart for just six months, you might be very comfortable adopting a [public] cloud service, particularly if you're a Web 2.0-type of company that is itself running in the cloud," Kobielus says. "You can pay for it as an operating expenditure rather than as a capital expenditure… If you're moving processes toward cloud-based services such as Salesforce.com, you already have data running in a third-party's hosted service, so why not do data mining in a similar model?"
That leaves self-service provisioning -- the promise of launching a mart with the click of a button -- as the key differentiating aspect of today's Greenplum announcement. Indeed, Greenplum customer eBay already has a home-grown version of push-button data-mart deployment, which reportedly lets business analysts specify the data and capacity they need and then spin up new data marts and analytic "sandboxes" without waiting for IT to make it happen. It's a compelling vision, but Greenplum's Web-based, push-button interface for business analysts won't be ready until the Greenplum 3.4 release, toward the end of the year. What Greenplum offers for now is a more user-friendly environment for IT, according to Forrester's Kobielus.
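To make the "click of a button" idea concrete, here is a purely hypothetical sketch of that workflow. None of these names (MartRequest, ResourcePool, provision) come from Greenplum or eBay; they are invented to illustrate the self-service pattern the EDC initiative describes: an analyst states the data and capacity needed, and segments are carved out of a pre-deployed pool.

```python
# Purely illustrative sketch of self-service data-mart provisioning.
# Every name here is hypothetical -- this is not Greenplum's API, only a
# picture of the workflow: an analyst states what data and capacity they
# need, and a pool of pre-deployed commodity nodes is carved up to match.

from dataclasses import dataclass, field

@dataclass
class MartRequest:
    name: str
    source_tables: list[str]   # data the analyst wants loaded
    storage_tb: int            # requested capacity
    segments: int              # MPP segments (parallel slices) to allocate

@dataclass
class ResourcePool:
    free_segments: int         # unallocated slices on the private grid
    marts: dict = field(default_factory=dict)

    def provision(self, req: MartRequest) -> str:
        if req.segments > self.free_segments:
            raise RuntimeError("grid is out of capacity; buy more hardware")
        self.free_segments -= req.segments
        self.marts[req.name] = req
        # A real system would create the database here, distribute it
        # across the segments, and kick off ETL of source_tables.
        return f"mart '{req.name}' online with {req.segments} segments"

pool = ResourcePool(free_segments=64)
print(pool.provision(MartRequest("campaign_analysis",
                                 ["clickstream", "orders"],
                                 storage_tb=10, segments=16)))
```

The sketch also makes the on-premise trade-off visible: the request succeeds only if the private grid already has idle segments, which is exactly the capacity customers must buy and manage up front.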
"With the 3.3 release, Greenplum has made it easier to build data marts, handle the ETL and run in the grid," he explains. "You could hand the tools to application developers or DBAs who don't have a lot of background or understanding of massive parallelism. It buries the gory details behind a logical front-end that makes it easy [for developers] to handle the development, administration and optimization of data marts and data structures."
In short, with EDC, Greenplum is standing on the shoulders of others (like eBay) to peer over barriers that currently stand in the way of fast and flexible on-premise deployments. The far-reaching self-service promise of the announcement is just beyond the vendor's grasp. What remains to be seen is whether fleet-footed rivals will match or surpass what Greenplum is promising by the time it gets there.