Oracle Exadata: Still an Unknown Quantity
Oracle's answer for large-scale data warehousing looks promising. But as this in-depth review reveals, production deployments and clearer insight on cost and administrative overhead are still wanting.
Curt Monash
Introduced in September 2008, Oracle Exadata is Oracle's first major offering in fully parallel data warehousing. As such, it fixes the biggest gap in Oracle's data warehouse product line. Indeed, Exadata is something of an architectural leapfrog, establishing a "storage tier" separate from the "database tier" in a way no other commercial product previously has. Combined with the vast array of features already present in the Oracle database management system (DBMS) -- including strong capabilities to handle high-concurrency workloads -- Exadata has the potential to become the most capable data warehouse offering on the market. And of course it offers the ultimate in "Oracle compatibility," which alternative vendors generally lack.
PROS
Feature-rich
Oracle compatible
Architecture looks effective (in theory)
CONS
Architectural tradeoffs and inherent complexity
Lack of proven reference implementations in the field
High price
Data warehouse DBMS architecture is all about tradeoffs, however, and so Oracle still lacks features other vendors do offer. This deficiency is most obvious when compared to column-oriented DBMSs; if your needs are better met by columnar than row-based data warehousing, Oracle Exadata probably isn't for you. But other row-based vendors -- Teradata and upstarts alike -- also sport features that wouldn't make sense in the Exadata design.
What's more, any favorable view of Exadata assumes that it actually works more or less as advertised. That has not yet been established -- at least not publicly. Although beta customers have been named, Oracle hasn't provided any true production references. Oracle is even reluctant to run Exadata proofs of concept at customer sites, preferring instead to do tests at its own facilities. Oracle's demos and other released information seem sufficient to establish that Exadata can provide a significant speed-up over Exadata-less Oracle DBMS deployments. Less clear is how Oracle Exadata compares with non-Oracle alternatives.
The last unknown is price -- and, more generally, total cost of ownership (TCO). Oracle's all-in list pricing runs from $50,000 to $60,000 per terabyte using slower disks and from $110,000 to $130,000 per terabyte using faster ones, depending on system size. That is at the high end of the market. These ranges make sense, given Oracle's traditional software pricing posture and the fact that it isn't a leader on hardware/cost-saving metrics in areas such as raw analytic query performance or data compression. (For more detailed pricing information, visit "Oracle Database Machine and Exadata Pricing" and this list of posts on competitor pricing.) But realistically, these fees are all just first approximations, as each enterprise's negotiating situation is different.
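To make those per-terabyte figures concrete, here is a rough back-of-the-envelope sketch. It assumes list price scales linearly with user data, which is a simplification, since the quoted ranges vary with system size. The 10-terabyte warehouse and the 30 percent discount are hypothetical values chosen only for illustration.

```python
# Per-terabyte list-price ranges (US dollars) as quoted in this review.
PRICE_PER_TB = {
    "slower_disks": (50_000, 60_000),
    "faster_disks": (110_000, 130_000),
}

def list_price_range(terabytes, disk_type):
    """Return (low, high) list price, assuming price scales linearly with size."""
    low, high = PRICE_PER_TB[disk_type]
    return terabytes * low, terabytes * high

def discounted(price_range, discount_pct):
    """Apply a hypothetical negotiated discount, given as a whole percentage."""
    low, high = price_range
    return low * (100 - discount_pct) // 100, high * (100 - discount_pct) // 100

# Hypothetical 10 TB warehouse on each disk type.
slow = list_price_range(10, "slower_disks")
fast = list_price_range(10, "faster_disks")
slow_after_discount = discounted(slow, 30)
```

Even under these simplified assumptions, the choice of disk type roughly doubles the list price, which is why the negotiated discount matters so much to any real comparison.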
Exadata is designed for data warehouses in the multi-terabyte range (user data). The smallest configuration you can buy is rated at 6 terabytes with two storage cells (the minimum allowed). The two most standard configurations have 14 storage cells each and are rated at 42 or 92 terabytes, depending on the type of disk used. Note that all such figures are approximate, because the user data to disk ratio is heavily affected by the nature of the raw data.
The Problem Exadata Solves
Oracle is the unquestioned volume sales leader in DBMSs for enterprise-scale online transaction processing (OLTP). Viable competitors are usually perceived as offering something that, while not better than Oracle's offerings, is "good enough," while boasting advantages in price or other aspects of TCO.
The story is quite different in data warehousing. There, Teradata is the high-end vendor, measured by product perception and very large database (VLDB) market share alike. DB2 has enjoyed major technical advantages vs. Oracle, specifically in the area of parallel processing. And a number of startups -- Netezza, Vertica, Greenplum, Aster Data and others -- have scored impressive customer wins as well. Thus, even though Oracle is used in thousands of data marts and data warehouses overall, Exadata is still an effort to play catch-up to -- or perhaps leapfrog -- the high-end data warehouse competition.
Almost all of Oracle's relational data warehousing competitors share a major architectural feature Oracle has long lacked: fully parallel access to disk. (Sybase IQ is an exception, as is Microsoft SQL Server, though the latter's Project Madison is slated to change that in 2010.) In competing systems, different processors can simultaneously read from different disks, allowing large amounts of data to be scanned and queried in a short time. But the Oracle DBMS, based on design choices optimized for OLTP, views all disks as one. (This contrast is commonly described as "shared-everything" vs. "shared-nothing" architecture, although hard-working software marketers are increasingly succeeding in blurring the meaning of those phrases.) As a result, Oracle faces bottlenecks on I/O-intensive analytic queries, and raw performance and price/performance have suffered accordingly. Even when Oracle does perform acceptably, the necessary database design and administration workarounds can be labor-intensive, especially as database sizes grow beyond the 5- to 10-terabyte range.
The Exadata Solution
Oracle has not entirely switched from a shared-everything to a shared-nothing architecture. (By contrast, IBM long ago introduced a shared-nothing version of DB2, and Microsoft has recently embarked on a different approach in Project Madison.) Instead, Oracle introduced Exadata to try to have the best of both worlds. Exadata features two independent pools of CPUs, each in its own subsystem. The CPUs in the HP Oracle Exadata Database Machine run shared-everything Oracle in the usual way. The CPUs in the HP Oracle Exadata Storage Server talk to disks in a parallel, shared-nothing manner, then forward some of what they read over to the main Oracle database. In theory, the data reduction performed on the Exadata Storage Server is sufficient to overcome the traditional shared-everything bottleneck, and InfiniBand networking within the device averts the creation of new bottlenecks to take its place.
Thus, Oracle's DBMS has spread from one tier of the computing architecture (database) to two (database and storage). Both tiers run on prepackaged configurations of HP ProLiant servers with InfiniBand networking. The main customer hardware choice is between bigger/slower SATA drives and smaller/faster SAS drives. Notwithstanding the presence of "HP" in both product names, Oracle says that the storage tier is actually more locked in to HP hardware at this time than the database tier is. This approach makes sense, as the database tier basically runs software Oracle has offered all along, while what runs on the storage tier is largely new and was developed in cooperation with HP's engineers.
In theory, the distinction between database tier and storage tier lets Oracle combine the advantages of traditional "shared-nothing" and "shared-everything" architectures. Data comes off disk and is whittled down -- via SELECTs, PROJECTs and certain kinds of JOINs -- all in parallel, while it's still on the storage tier. The intermediate results are then sent to the database tier for further analytic processing.
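The division of labor described above can be illustrated with a toy model. This is only a sketch of the general idea, not Oracle's implementation: the table, the cell layout and every function name here are hypothetical, and Python threads stand in for independent storage servers. Each "cell" filters its own rows (SELECT) and keeps only the needed columns (PROJECT), and a separate step aggregates the reduced rows, as the database tier would.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of a sales table held by one storage cell.
CELLS = [
    [{"region": "east", "year": 2008, "amount": 120, "note": "a"},
     {"region": "west", "year": 2007, "amount": 75, "note": "b"}],
    [{"region": "east", "year": 2008, "amount": 30, "note": "c"},
     {"region": "west", "year": 2008, "amount": 200, "note": "d"}],
    [{"region": "east", "year": 2007, "amount": 90, "note": "e"}],
]

def scan_cell(rows, predicate, columns):
    """Storage tier: filter rows (SELECT) and keep only the requested columns (PROJECT)."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]

def run_query(cells, predicate, columns):
    """Scan every cell in parallel, then concatenate the reduced results in cell order."""
    with ThreadPoolExecutor(max_workers=len(cells)) as pool:
        partials = pool.map(lambda rows: scan_cell(rows, predicate, columns), cells)
        return [row for part in partials for row in part]

def total_by_region(rows):
    """Database tier: aggregate the rows that survived the storage-tier reduction."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals

reduced = run_query(CELLS, lambda r: r["year"] == 2008, ["region", "amount"])
totals = total_by_region(reduced)
```

In this toy run, only three of the five rows, and two of the four columns, are handed to the aggregation step. That reduction in forwarded data is the effect the architecture relies on to avoid the shared-everything bottleneck.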
Oracle has released test results and completed demos that seem to show fast performance in practice, especially when compared with non-Exadata Oracle alternatives. Less clear is how Exadata performs in production environments (or even in less-controlled testing). Also unclear is how much ongoing administration will be required. On the one hand, removing the disk bottleneck should reduce the need for complex partitioning policies, advanced indexes and other DBA time sinks. On the other hand, Exadata does include a core DBMS, an add-on parallelization system (Oracle Real Application Clusters) and a second parallelization system (Exadata Storage Server). So simplicity is not exactly Exadata's watchword.
Query Acceleration Features
Whatever you think of Oracle's fundamental architecture, there's no denying it offers an extensive list of data warehousing features. Although these resources aren't Exadata-specific, they need to be considered when judging the overall Oracle Exadata offering. Naturally, many of these features are focused on making analytic queries run fast. Two of the most important are:
Materialized views. Oracle can precalculate and maintain query results. The obvious benefit is that queries using those result sets can be greatly accelerated. The obvious drawbacks are CPU overhead, disk space and DBA burdens.
Specialized indexes. Much the same can be said about specialized analytic indexes, such as bitmaps or stars. Star indexes, in particular, are a lot like materialized views -- they consume resources upfront to do something a lot like query processing, so when actual queries come in, they can be executed faster.
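The tradeoff behind precomputed results can be shown with a small emulation. This sketch uses Python's built-in sqlite3 module, which has no materialized views, so a plain summary table stands in for one and is refreshed by hand; Oracle maintains its materialized views itself. The table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 120), ("east", 30), ("west", 200)])

def refresh_summary(conn):
    """Recompute and store the query result; this is the upfront CPU and disk cost."""
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute("CREATE TABLE sales_by_region AS "
                 "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

def read_summary(conn):
    """Answer the query from the stored result instead of rescanning the base table."""
    return dict(conn.execute("SELECT region, total FROM sales_by_region"))

refresh_summary(conn)
fresh = read_summary(conn)

# New base data arrives; the stored result is out of date until it is refreshed.
conn.execute("INSERT INTO sales VALUES ('west', 50)")
stale = read_summary(conn)
refresh_summary(conn)
refreshed = read_summary(conn)
```

Reads against the stored result are cheap, but the result goes stale whenever the base table changes. Keeping it current is the maintenance burden mentioned above, whether the database or the DBA does the work.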
Oracle offers a variety of static and dynamic SQL-tuning capabilities as well. Other vendors have similar capabilities, but they aren't as sophisticated. This difference is partly a matter of product maturity. But there's also a raging debate as to whether these features are useful in the first place, or whether simpler alternatives are superior. Rather than a collection of tools to do joins in clever ways, other systems take a more brute-force approach -- scan lots of data off of disk, process it in straightforward ways, and optimize that processing to the max. Customer experience suggests that both sides of the debate have strong points.
Analytic Processing Features
Oracle also offers a huge list of features that go beyond simple SQL, including full SQL 2003 analytics and a lot of data mining. For example, SPSS reports that Oracle's in-database data mining scoring offers significant performance advantages. The jury is still out, however, on whether Oracle's ability to also build data mining models is of value to many customers. Oracle's parallelization scheme for analytics is at once more flexible and more cumbersome than those sported by most new competitive vendors.
Security
This capability is a constant concern for database owners. In the case of data warehousing, security often revolves around privacy -- especially when what's being analyzed in the warehouse is customer-specific data. Oracle has invested for some time in features such as fine-grained access control and transparent encryption. For those companies with privacy and security concerns, these options are Oracle and Oracle Exadata strengths.
The Bottom Line
Oracle has vast database expertise. Oracle Exadata has a sensible architecture, extending a product that's used in a huge number and variety of data warehouses today. There's little doubt that Exadata will eventually meet a large range of users' data warehousing needs. Less clear, however, is whether Exadata's license price, complexity and legacy architectural choices will keep it from having an attractive total cost of ownership. And the dearth of customer proof points, along with Oracle's troubling resistance to performing on-site proof-of-concept testing, raises doubts as to whether Exadata is ready to meet any enterprise's needs today.
If you're an Oracle user building data warehouses in the 5-plus terabyte range, Exadata should absolutely be on your technology radar. But unless you have a relationship with the company that includes generous price discounts or are a committed Oracle loyalist, other vendors' less costly and more proven alternatives may prove superior for your needs.
Curt Monash runs Monash Research, which provides strategic, analysis-based advice to users and vendors of advanced information technology. He also writes the blogs DBMS2, Text Technologies, and Strategic Messaging. Write him at [email protected]