Fables And Facts 2Fables And Facts 2

Do you know the difference between dimensional modeling truth and fiction?

information Staff, Contributor

October 7, 2004

3 Min Read
information logo in a gray background | information

Integration is the Goal, Not Normalization

Some people believe normalization solves the data integration challenge. Normalizing data contributes nothing to integration, except forcing data analysts to confront the inconsistencies across data sources.

Data integration is a process apart from any specific modeling approach. It requires identifying incompatible labels and measures used by the organization, then reaching consensus to establish and administer common labels and measures enterprise-wide. In dimensional modeling, these labels and measures reside in conformed dimensions and conformed facts, respectively. As represented in the bus architecture, conformed dimensions are the integration "glue" across measurement business processes. Conformed dimensions are typically built and maintained as centralized persistent master data during ETL, then reused across dimensional models to enable data integration and ensure semantic consistency.

FABLE: Dimensional modeling concepts like conformed dimensions put an undue burden on the ETL effort.

FACT: Data integration depends on standardized labels, values, and definitions. It's hard work to reach organizational consensus and implement the corresponding ETL system rules, but you can't dodge the effort, regardless of whether you're dealing with a normalized or dimensional model.

FABLE: Dimensional modeling isn't appropriate when there are more than two unique source systems due to the complexities of integrating data from multiple sources.

FACT: The challenges of data integration have nothing to do with the modeling approach. Paradoxically, dimensional modeling and the bus architecture reveal the labels and measures of a business so clearly that an organization has no choice but address the integration problems directly.

FABLE: Changes to dimension attributes are only an issue for dimensional models.

FACT: Every data warehouse must deal with time variance. When the characteristic of an entity like customer or product changes, we need a systematic approach for recording the change. Dimensional modeling uses a standard technique known as slowly changing dimensions (SCDs). When normalized models step up to the issue of time variance, they typically add timestamps to the entities. These timestamps serve to capture every entity change (just like a type 2 SCD does), but without using a surrogate key for each new row, the query interface must issue a double-barreled join that constrains both the natural key and timestamp between every pair of joined tables, putting an unnecessary, unfriendly burden on every reporting application or query.

FABLE: Multiple data marts can't be integrated. They're built bottoms up, catering to the needs of an individual, not the needs of an enterprise. Data mart chaos is the inevitable outcome.

FACT: It's definitely a struggle to integrate data marts that have been built as departmental, standalone solutions that haven't been architected with conformed dimensions. That's precisely why we advise against this approach! Chaos won't result if you use the bus architecture for the enterprise framework of conformed dimensions, then tackle incremental development based on business measurement processes. Organizational and cultural obstacles are inevitable as consistent definitions, business rules, and practices are established across the enterprise. The technology is the easy part.

Learn more

This column clarifies the realities surrounding common dimensional modeling myths. As some of you know, the Kimball Design Tips newsletter discusses a new fable each month. If you'd like to receive a monthly tip and technique from the Kimball Group, please visit www.kimballgroup.com to register. The newsletter's sole purpose is to deliver practical insights and news about Kimball University classes to DW/BI professionals. The names and email addresses on the list will not be shared with anyone.

Ralph Kimball founder of the Kimball Group, teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. He has four best-selling data warehousing books in print, including the newly released The Data Warehouse ETL Toolkit (Wiley, 2004).

Margy Ross is president of the Kimball Group and instructor with Kimball University. She cowrote The Data Warehouse Lifecycle Toolkit (Wiley, 1998) and The Data Warehouse Toolkit, 2nd Edition.

Read more about:

20042004
Never Miss a Beat: Get a snapshot of the issues affecting the IT industry straight to your inbox.

You May Also Like


More Insights