How To Make BI Less of a Gamble
To avoid expensive BI debacles, use rules-based audits and proofs of concept to understand potential project pitfalls
Successful business intelligence and data warehouse projects share at least one common characteristic: explicit consideration of risk. And nothing — not a detailed project plan, expensive technology or high-priced BI talent — addresses risks as well as a rules-based audit (RBA) or proof of concept (POC) test.
PAIN POINTS

• Data quality unknowns make BI projects risky. Dashboards, reports and analytic applications are only as good as the data they contain. Many projects, begun with high hopes and executive backing, founder in a muck of dirty data and go well over budget to cleanse and correct it. While data quality, profiling and other tools may be essential as the project gets under way, BI project managers would be wise to assess early the risk that data quality poses. This will give everyone a realistic picture before budgets are settled and the ROI clock starts ticking.

• Top-drawer BI teams cost a bundle. Do you really want to point such specialists at discovering that you have bad data and figuring out how to fix it? Consider assessing risk with less expensive, more easily available means. Save the big bucks for the proven plan.

• Assessing risk to BI projects is often a one-off operation. Determining the quality of source data should become a routine part of BI project management. Business rules can help sort out the factors that need to be assessed; automating rules will save steps and money. Developing rules will also clarify business expectations about data and what the organization is willing to spend to add missing data and correct inconsistencies.
BI projects are peppered with risks, from data quality problems to lower-than-expected analytic value. These dangers often bring entire projects to a halt, leaving planners scrambling for cover, sponsors looking for remedies and budgets decimated. A project I worked on a few years ago convinced me of the value of risk mitigation. The company had 20 disparate sales applications around the world, leaving executives unable to report current sales without estimating. The goal was to accurately report current sales as well as the chronological history of changes to sales order line detail.
The company hired one of the big-six consulting firms to create a single sales data mart on a Windows platform. After spending nearly $1 million and failing to achieve its goal, the company scuttled the project. The problem wasn't data volume or technology; it was data quality. As it turned out, a few of the sales applications restated history whenever a change was made. Consequently, accurately reporting all reversing entries and changes to every sales order line was impossible simply because the applications didn't maintain that information. But the company didn't need to spend a million dollars to make this discovery. Consider the two approaches:
Approach One: Spend $1 million to bring in a high-priced BI team, conduct planning sessions to create and agree to an elaborate project plan, conduct business requirements gathering sessions, document all requirements in professional binders, build a fantastic entity-relationship model, gather and map source data to that model, purchase and install your platform and start writing transformation scripts. Go to all this trouble and expense before you discover that the source data can't be transformed into the required target table.
Approach Two: Take a laptop computer with sample source data, apply your business rules and, for less than $50,000, see if it's possible to create the target data table. Take this risk mitigation step before you commit to the full-scale project.
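To make Approach Two concrete, here is a minimal sketch of what that laptop-scale check might look like, assuming the sample extracts arrive as CSV files and that a throwaway Python/pandas script is acceptable tooling. The file names, column names and the two rules are hypothetical stand-ins for whatever your own business rules require.

```python
import pandas as pd

# Hypothetical sample extracts from two of the source sales applications.
emea = pd.read_csv("sample_sales_emea.csv")
apac = pd.read_csv("sample_sales_apac.csv")

# Rule 1: every source must carry the attributes the target table needs,
# including the change-tracking fields that sank the project described above.
required = {"order_id", "line_no", "amount", "change_type", "changed_at"}
for name, frame in [("EMEA", emea), ("APAC", apac)]:
    missing = required - set(frame.columns)
    if missing:
        print(f"{name}: target cannot be built, source lacks {sorted(missing)}")

# Rule 2: a chronological history of order-line changes implies that a changed
# line appears more than once; a source that restates history in place will not.
combined = pd.concat([emea, apac], ignore_index=True)
if not (required - set(combined.columns)):
    versions = combined.groupby(["order_id", "line_no"]).size()
    print("Order lines with more than one retained version:", int((versions > 1).sum()))
```

A check like the second one is exactly where the restated-history problem in the example above would have surfaced, long before any platform was purchased.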
Risk mitigation is all about saving money, time and grief. You be the judge: Spend big bucks to find out you have a problem, or spend a modest sum to test your approach?
RISK MITIGATION TECHNIQUES COMPARED

| | Rules-based Audit | Proof of Concept |
|---|---|---|
| Source Data | Sample data only | Sample of complete data set |
| Platform | Conducted on an independent, isolated platform, such as a laptop computer | Conducted on an isolated platform or the platform of choice for testing batch cycle times, network connections, CPU performance, elapsed-time performance and other variables |
| Testing Goal | Applying explicit business rules to sample source data in order to build target tables | Scaling the results of a rules-based audit to assess production-level data volumes, processing time constraints, platform stress and other issues |
Take the Spiral Approach
To address BI project risks, I recommend using the spiral approach familiar to many software developers. Developed long ago by Barry Boehm, this approach — a key part of today's "agile" methodologies — organizes software development steps into a spiral that's divided into four quadrants:
Quadrant One: Determine objectives and constraints.
Quadrant Two: Analyze risks and evaluate alternatives, often through prototypes.
Quadrant Three: Oversee development.
Quadrant Four: Plan the next phase.
The spiral approach — especially Quadrant Two — is helpful because it explicitly addresses risk, whereas most traditional process models and software development methods are document-driven. What's the difference? Document-driven approaches assume you can get complete, formal documentation. But to obtain clear, concise documentation, the solution must be clearly understood and defined before development begins, and anyone with experience knows this is seldom the case.
In our $1 million example, the project was based on a document-driven approach. The company developed very detailed, professional documents beforehand, yet it encountered data quality problems in development. Had it taken a risk-driven approach, data integration problems would have been identified in advance, and it could have considered alternative solutions.
BI projects invariably involve questions, issues and unknowns that must be addressed before you can be confident of success. Rules-based audit and proof of concept are risk-analysis techniques conducted in Quadrant Two of the spiral approach that provide answers, resolve problems and help you understand the scale and scope of the project at hand.
The RBA is designed to answer a single, fundamental question: Can you take known data sources, add explicit business rules and create the target data necessary for subsequent analysis? If you can't answer this question with confidence, then you have no business risking company resources on the project. The POC takes the results of the RBA and scales the testing to prove viability against production issues such as actual data volumes, processing time constraints and platform stress. The table above, "Risk Mitigation Techniques Compared," contrasts the two techniques.
Spot the Risks Before You Invest
You can implement just one of these analyses, but you should use both to better understand and mitigate project risks. The RBA proves that you have the business rules and data quality required to create the target table, while the POC ensures that you can scale to the necessary data volumes on target production platforms.
Conducting an RBA/POC is a straightforward process, similar to conducting a data quality audit (see "The Data Quality Audit," July 10, 2004, at IntelligentEnterprise.com). The five-step process is:
1. Identify potential risks and select a risk analysis technique, such as RBA or POC, to determine the scale and scope of potential problems and shed light on possible alternatives.
2. Select your RBA/POC tools. The tools must be capable of applying a wide variety of explicit business rules and still be simple to install, modify and execute. Look for three characteristics:
The tools must be robust enough to apply a variety of explicit business rules to source data in order to create target tables. Rules include complex joins, sorting and filtering.
The tools must not interfere with the RBA/POC's objectives, so avoid complicated technologies that will require specialized skills or training.
The tools must install completely on a single laptop and scale to the production levels required. An RBA is always conducted on a laptop or stand-alone workstation using sample data, but a POC must be able to test full data volumes and target production platforms.
3. Gather source data definitions. This step determines the scope and attributes of the tables. For example, if you determine that you have 20 source tables of sales data with 10 attributes each, you'll have a clearer understanding of scope and insight into what referential data might be necessary for validation and information enhancement.
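To illustrate step 3, the short profiling pass below walks a directory of hypothetical source extracts and reports row counts, attribute counts and null rates, which is usually enough to scope the source side of an RBA. The directory layout, CSV format and pandas usage are assumptions for the sketch, not requirements of the method.

```python
import glob

import pandas as pd

# Profile every hypothetical source extract found in a working directory.
summary = []
for path in sorted(glob.glob("source_extracts/*.csv")):
    frame = pd.read_csv(path)
    summary.append({
        "table": path,
        "rows": len(frame),
        "attributes": frame.shape[1],
        # The worst null rate across attributes flags tables that will need
        # referential or enrichment data before any rules are written.
        "max_null_pct": round(float(frame.isna().mean().max()) * 100, 1),
    })

print(pd.DataFrame(summary).to_string(index=False))
```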
4. Run an initial RBA. This step relies on sample data to determine if you can build the target tables given known business rules and existing source data. Carried out on a laptop or stand-alone PC, this effort is completely isolated from technology issues and assessment. The RBA involves three substeps, illustrated in the sketch that follows them:
Apply known business rules. These explicit business rules must be applied to transform the source data into whatever target is necessary for subsequent analysis.
Create sample target. Once all rules are defined, you can attempt to build the target tables. The technology (platform, software and so on) used for these structures isn't a concern, only whether or not they can be built. Technology issues are addressed in the POC.
Test results. Assuming the target tables can be built, the final step is to test the results, meaning the data itself. Can you aggregate sales grouped by sales orders, products and reversing entries over the last quarter and get an accurate result?
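A minimal sketch of step 4's three substeps follows, again assuming pandas, CSV sample extracts and hypothetical column names. The date range, the reversal rule and the control total are placeholders for whatever explicit rules and independently verified figures your own business supplies.

```python
import pandas as pd

# Hypothetical sample extract and reference data gathered for the RBA.
orders = pd.read_csv("sample_order_lines.csv", parse_dates=["order_date"])
products = pd.read_csv("sample_products.csv")

# Apply known business rules: join to reference data, restrict to last
# quarter and flag reversing entries instead of silently dropping them.
joined = orders.merge(products, on="product_id", how="left")
in_quarter = (joined["order_date"] >= "2004-04-01") & (joined["order_date"] < "2004-07-01")
target = joined.loc[in_quarter].assign(is_reversal=lambda d: d["amount"] < 0)

# Create the sample target table; the POC later reruns the same rules at volume.
target.to_csv("target_sales_fact_sample.csv", index=False)

# Test the results: aggregate by order, product and reversal flag, then compare
# the quarter total against a figure the business has independently verified.
by_line = target.groupby(["order_id", "product_id", "is_reversal"])["amount"].sum()
control_total = 1_250_000.00  # hypothetical control figure supplied by finance
print("Quarter total matches control:", abs(by_line.sum() - control_total) < 0.01)
```

If the control check fails, or the required columns never materialize, that finding is the whole point of the RBA: you have learned it on a laptop instead of in production.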
5. Run a full-scale POC. Once you've proven that you can create target tables based on the RBA, scale up to identify production risks. Scaling up involves four substeps (a brief measurement sketch follows the list):
Choose production data sets. Testing current and anticipated future data volumes requires production-scale data, not just the samples used in the RBA.
Establish a testing environment. If you can't use the actual production environment for your POC, you must emulate it as accurately as possible. If production uses x amount of disk space and is assigned y processors and z memory, for example, then run your POC under similar conditions.
Create verifiable, repeatable metrics. You must be able to measure elapsed time performance and platform resources consumed, including CPU, memory, disk space and so on. This step is critical as justification ("proof") for management.
Synchronize POC target results with RBA results. Even though you're running a scaled-up test, the target data results must match your RBA results.
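The measurement side of step 5 can be as simple as the sketch below, which times a full-volume run of the same audited rules, samples process memory with psutil and reconciles the scaled-up target against the RBA sample. The run_rba_rules function, the module it lives in and the file names are assumptions; substitute whatever packaging your own RBA used.

```python
import time

import pandas as pd
import psutil  # assumed available; used only for resource sampling

from rba_rules import run_rba_rules  # hypothetical module wrapping the audited rules

process = psutil.Process()
start = time.perf_counter()

# Run the identical business rules against full production-volume extracts.
full_target = run_rba_rules("production_extracts/")

elapsed = time.perf_counter() - start
resident_mb = process.memory_info().rss / 1_048_576

# Verifiable, repeatable metrics to present as the "proof" for management.
print(f"Elapsed: {elapsed:,.1f} s   resident memory: {resident_mb:,.0f} MB")

# Synchronize POC results with the RBA: for the order lines covered by the
# sample, the full-volume target should reproduce the same aggregates.
sample = pd.read_csv("target_sales_fact_sample.csv")
keys = ["order_id", "product_id"]
poc_slice = full_target.merge(sample[keys].drop_duplicates(), on=keys)
matches = abs(poc_slice["amount"].sum() - sample["amount"].sum()) < 0.01
print("POC reproduces RBA sample aggregates:", matches)
```

Capture the same numbers on every rerun so the metrics stay repeatable; if the reconciliation fails, the rules, not the platform, are the first place to look.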
Putting It Together
Ignoring BI project risks is reckless and naive. You can't avoid risks entirely, but you can integrate risk-analysis techniques, such as RBAs and POCs, into the project itself. Together, these techniques give you a clear understanding of problems that lie ahead so you can consider alternatives or revise your strategy to ensure success. RBAs and POCs are not only effective, they're cheap when compared to full-scale production implementations.
Reducing risk does not have to be expensive. My company, HandsOn-BI (www.handson-bi.com), has developed its own Business Rules Engine (BRE) tool using Visual Basic and Excel. This homegrown tool is designed to run on a laptop, and it lets you conduct effective rules-based audits on sample data without worrying about technology compatibility.

For larger, more complex audits and proofs of concept, we recommend DMExpress from Syncsort (www.syncsort.com). This software scales from a simple laptop to a 64-way Superdome and provides a full range of ETL functionality without a significant learning curve for business analysts.

Readers of Intelligent Enterprise can download workstation trial copies of BRE and DMExpress for a limited time at www.handson-bi.com/html/bre4iedemo.php. BRE is then free to those who register at HandsOn-BI. The full workstation version of DMExpress costs about $2,000. A detailed white paper, "Conducting a Rules-Based Audit and Proof of Concept," expands on the topics discussed in this article and is available for download at www.syncsort.com/25mgdx.

— Michael L. Gonzales
Michael L. Gonzales is president of The Focus Group Ltd. His books include IBM Data Warehousing (Wiley, 2003). Write him at [email protected].