When Worlds Collide

To meet compliance requirements, gain a sharper view of customers and achieve process excellence, look to new integration technology that can merge separate data resources.

information Staff, Contributor

December 23, 2004

Fear not, fellow earthlings. Although the worlds of structured data and unstructured content are indeed on a collision course, they're drawn together by the forces of good: the urgent information demands of business executives, collaborative partners, customers, regulators and knowledge workers. Energized by open Web protocols and networked computer systems far more powerful and ubiquitous than the "differential analyzer" featured in the 1951 science fiction movie classic, When Worlds Collide, the meeting of structured and unstructured information promises to explode into a new phantasm of universal connectivity. The result: better customer knowledge and interaction, smarter business processes, more comprehensive fraud detection and compliance with the Sarbanes-Oxley Act and other regulations.

With XML gaining acceptance as the set of rules for universal data description, the role of information management must change. No longer will it be acceptable to tend to data or content stores and allow access only through special applications. Managers must judge their success by how well information flows through the organization and contributes to achieving its objectives.

Executive Summary

Regulatory compliance and the competitive promise of fuller insight into customer relationships and critical processes are raising demand for a convergence of structured and unstructured ("content") information management. Many enterprises are stymied by silos of information that can only be accessed, analyzed and shared through proprietary code and data transformations. However, with Web services and the growing acceptance of eXtensible Markup Language (XML) as the universal data description framework, new choices are at hand for gaining higher return on investment from all information resources. Search engines and text mining can lower the cost of finding value in content, which will create opportunities for new kinds of strategic business applications. Integration middleware is essential to the convergence of disparate data resources. Organizations should pay close attention to the emergence of software for enterprise information integration, which attempts to give users a single view of all relevant data. EII could prove important as service-oriented architecture and business process management implementations reach for efficiency at a reasonable cost.

As universal data description breaks down barriers, suddenly every piece of information holds potential, whether it's transaction data, e-mail, call records, text documents, images, spreadsheets or video and audio files. Alas, as news of this exciting truth spreads, so too do challenges facing IT professionals. Will relational databases succeed in becoming universal information management engines? What happens to content management? Can enterprise information integration (EII) be the middleware that connects disparate data sources together? And what role will text mining, search and other technologies for accessing and analyzing unstructured information play in enhancing business intelligence (BI)?

This article and its companion piece ("Convergence Up Close") examine the business and technology trends guiding the convergence of structured and unstructured information management.

Faster, Better, Cheaper

Today, faster means driving out latency; businesses want to gain competitive advantage by, for example, reducing the time that inventory sits in one place or the time between a customer's order and its fulfillment. Other organizations, especially those employing call centers, want to decrease the amount of time a sales representative has to work with a potential customer to close a sale.

IT must respond by similarly removing latency from the information flow, no matter what type of data or content, or where it's stored. IT must help organizations use information to synchronize the activities of different internal and partner constituencies. Opening access and ensuring high availability to more data is also crucial for emerging event-driven architectures, which are intended to automatically alert decision makers about material events occurring in business processes as soon as they happen.

As organizations approach the goal of real-time information, marketing, customer service, corporate compliance, product development and other key operational functions must become better at decision-making. For about two decades, the engine of business decision-making has been structured, historical data held in relational databases, spreadsheets and their precursors. However, with more and more vital steps in customer, partner and executive decision-making happening collaboratively, the reach of BI and supporting data warehouses isn't enough.

For example, for a financial institution to gain a single view of customers to determine cross-sell and up-sell potential, or to give customers personalized access to all their accounts, IT must bring together all relevant information for each business function. Along with a consolidated view of information, decision makers need more than just standard BI query and reporting to access and analyze the data. They need search engines, classification and text mining technologies that have grown up in the unstructured realm.

Finally, there's cheaper. At office supply and services provider Corporate Express, the savings gained by bringing the business online, with the aid of an EMC Documentum content management system, were multiplied by an information flow from the resulting customer portal to downstream BI and other applications. "The portal takes the job of updating the site out of the hands of developers and puts it in the hands of business users," says Wayne Aiello, vice president of eBusiness Services for Corporate Express. "Business users can do updates in real time, using any format they feel will be useful to customers."

Hoping to drive down IT costs and improve flexibility, many organizations have declared service-oriented architecture (SOA) as the direction for all new development. Closed information silos present obstacles to SOA — obstacles that cost money because developers have to resort to older methods of creating custom code and data transformation routines that aren't reusable. To achieve the business goals of SOA — better total cost of ownership, increased agility and improved focus on what delivers competitive edge — universal information integration is a critical IT objective.

Portal Power and Limitations

A portal, serving as the Web-based nexus for content delivery and collaboration, is often the first place where structured and unstructured worlds converge. Rather than focus on managing information for its own sake, portals are about using information to improve the business by providing a richer employee, partner or customer experience. The ultimate vision of the portal goes beyond the simple door to a virtual content warehouse to include workflow, guided analytics and other features to enhance the human role in business processes.

However, what a portal can't do is hide poor information integration. While some companies have tried using the portal to integrate data and applications on screen, the effort can end up as a costly, one-off, nonextensible system.

At the other end of the spectrum is simply enabling portal access to Web services and hosted applications and databases, where the more complicated integration happens behind the scenes. Sabre Airline Solutions, for example, employs Sun Microsystems portal and server technology to act as an application service provider to more than 200 airlines. Along with current decision-support facilities, "Melding content management in our eMergo portal is where we're going because airlines have a lot of documentation they need to control and access," says David Endicott, VP of Sabre Airline Solutions product development.

The portal is also where most BI vendors display their convergence with the world of content. Business Objects, for example, resells Autonomy's technology as part of its InfoView portal solution. To support Web services providers, BI vendors such as MicroStrategy have built XML into their latest releases. However, partnerships with middleware providers are still the primary way XML enters the BI realm.

Universal Database Ambitions

Before XML was even a gleam in the eye of the World Wide Web Consortium (W3C), relational database vendors were pursuing the dream of the "universal" database. In the late 1990s, major vendors adopted parts of the SQL99 standard and built extenders into their engines, which enabled the products to manage and access unstructured data.

Recently, IBM, Microsoft and Oracle have been closing the gaps between their extended relational engines and content and collaboration management systems. Last month, Oracle introduced Oracle Files 10g, which puts content management on top of Oracle's database and application server systems. Microsoft has indicated that SQL Server will serve as the engine driving its content management, text search and XML data management as the company's Yukon release unfolds.

IBM was the first of the three major vendors to offer a relational embrace of content management, putting Content Manager in the DB2 Universal Database portfolio nearly two years ago. In 2004, the company further embraced data and content convergence by acquiring Venetica, which offered VeniceBridge, an SOA approach to accessing and integrating content and workflow processes.

Clearly, the relational database vendors see huge opportunities as the need to access, integrate and manage various forms of content becomes more acute. "We have a client building a store of content and structured data in which the content is five times larger than the Library of Congress," says Richard Winter, president of Winter Corp. and a specialist in very large databases. "Over the next few years, we're going to see a substantial number of these mixed stores with hundreds of terabytes — if not petabytes — of data and content."

The Content Catapult

EMC's acquisition of Documentum began what has been a rapid pruning of the enterprise content management market. The Documentum move, along with EMC's acquisitions of Legato, VMware and AskOnce, a unit of Xerox, moved EMC to the forefront of content management and gave substance to the vendor's quest to offer an information life cycle management (ILM) strategy. The AskOnce technology is the basis for EMC's "virtual repository solution," which proposes a unified management hub for content, business process management and integration.

Corporate Express, a Documentum customer since 1996, sees value in ILM. "We probably have 30 terabytes of storage right now, and not just with EMC products," the company's Aiello says. "We need to see how we can best manage content not only on production systems but off them onto cheaper storage."

Neither tight IT budgets nor hegemonic database vendors can blot out innovative new companies in the content management and analysis area, however. Two to watch are Attensity and Mark Logic. Attensity's line of products can read, "relationalize" and integrate structured and unstructured data. Whirlpool uses its Relational Extraction Server to draw information from, and develop insight about, warranty claims, customer feedback and service records — a process that previously took so long that the company couldn't diagnose problems until months after shipping thousands of products.
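To make the idea of "relationalizing" text concrete, here is a minimal Python sketch. It is not Attensity's method, which the article does not describe; it simply assumes a few made-up service notes and a hand-written pattern (the model names, issue keywords and `claims` table are all hypothetical) and shows free text being turned into rows that ordinary SQL can count.

```python
import re
import sqlite3

# Hypothetical free-text service notes; real input would be far messier.
notes = [
    "Model WX100: customer reports leak after 3 months",
    "Model WX100: leak near door seal, unit 5 months old",
    "Model DR200: noise during spin, 2 months",
]

# Assumed pattern: pull a model, a known issue keyword and an age in months.
pattern = re.compile(
    r"Model (?P<model>\w+): .*?(?P<issue>leak|noise).*?(?P<months>\d+) months"
)

# Load the extracted fields into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (model TEXT, issue TEXT, months INTEGER)")
for note in notes:
    match = pattern.search(note)
    if match:  # notes that do not fit the pattern are skipped
        conn.execute(
            "INSERT INTO claims VALUES (?, ?, ?)",
            (match["model"], match["issue"], int(match["months"])),
        )

# Once the text is in rows, standard SQL can summarize it.
summary = conn.execute(
    "SELECT model, issue, COUNT(*) FROM claims "
    "GROUP BY model, issue ORDER BY model"
).fetchall()
print(summary)
```

A commercial extraction engine would rely on linguistic analysis rather than a fixed pattern, but the payoff is the same: once notes become rows, they can be joined with transaction data and fed to existing BI tools.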

Mark Logic's Content Interaction Server treats documents as databases. XML is central to the product, as is the emerging W3C XQuery standard, which Mark Logic employs in its processing engine. Essentially, the company is trying to repeat the history of relational databases and SQL, but with content and XML. Rather than "federating" the content access problem through interaction with only metadata, Mark Logic centralizes the storage and indexing of content.
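The notion of treating a document as a database can be illustrated with a small Python sketch. It uses the limited XPath support in Python's standard `xml.etree.ElementTree` module rather than XQuery, and it says nothing about how Mark Logic's server works internally; the `claims` document and its attributes are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML document standing in for a content store.
doc = ET.fromstring("""
<claims>
  <claim id="c1" product="dryer"><text>Drum stops mid-cycle</text></claim>
  <claim id="c2" product="washer"><text>Door seal leaks</text></claim>
  <claim id="c3" product="dryer"><text>Heating element fails</text></claim>
</claims>
""")

# Query the document by structure, much as SQL filters rows by column value.
dryer_claims = [
    (claim.get("id"), claim.findtext("text"))
    for claim in doc.findall("./claim[@product='dryer']")
]
print(dryer_claims)
```

The path expression plays the role of a WHERE clause: the markup itself supplies the schema, so no separate table definition is needed before the content can be queried.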

Integration Rules

Even as relational database vendors developed universal extensions, interest in integration middleware alternatives grew. Now, with the Web and XML going mainstream, information integration is, well, universally accepted. IBM's DB2 II, for example, acts as a front end over all kinds of data resources. While IBM's strategy includes a variety of alternatives, DB2 II is famous for taking a federated approach, in which you don't try to replicate data to a central store but instead send queries out to the sources to gain answers.
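The federated idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a depiction of DB2 II: two in-memory SQLite databases stand in for independent sources, and the table names, sample rows and the `customer_view` function are hypothetical. Nothing is copied to a central store; each request sends one query to each source and merges the answers.

```python
import sqlite3

def make_source(schema, insert_sql, rows):
    """Create an independent in-memory database to stand in for one source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert_sql, rows)
    return conn

# Two separate, hypothetical sources that share only a customer identifier.
orders_db = make_source(
    "CREATE TABLE orders (customer_id INTEGER, total REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 45.5)],
)
tickets_db = make_source(
    "CREATE TABLE tickets (customer_id INTEGER, subject TEXT)",
    "INSERT INTO tickets VALUES (?, ?)",
    [(1, "Late delivery"), (2, "Invoice question"), (2, "Password reset")],
)

def customer_view(customer_id):
    """Query each source at request time and merge the results into one view."""
    (order_total,) = orders_db.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    subjects = [
        row[0]
        for row in tickets_db.execute(
            "SELECT subject FROM tickets WHERE customer_id = ? ORDER BY subject",
            (customer_id,),
        )
    ]
    return {
        "customer_id": customer_id,
        "order_total": order_total,
        "tickets": subjects,
    }

view = customer_view(1)
print(view)
```

The trade-off is visible even at this scale: the view is always current because it reads the sources directly, but every request pays the cost of querying each source, which is one reason federated and centralized approaches tend to coexist.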

The federated approach is essential to EII, the technology that is right now roiling the data warehousing industry. EII approaches from Composite Software, Avaki and other vendors optimize access to heterogeneous data sources and deliver a single, comprehensive view of a customer, for example. Fiery debates aside, EII will never replace centralized data warehousing; both approaches have merits and will have to coexist. EII could be how content becomes part of the information infrastructure, especially given the size of such resources.

Are we witnessing the dawn of a new world, where information flows aren't determined by data type, application silo or the schema of the squeakiest wheel? Where universal connectivity and information integration technologies respect all data resources, including legacy sources, for what they are? And where service call centers know us better than we know ourselves?

Those of us who've seen enough silver bullets fly by know that perfection is never achieved. EII and other new technologies have many performance, reliability and security hurdles to overcome, especially when dealing with multiple data types. Still, times are changing; SOA, open XML standards and the business drivers that demand their success will not be denied. Traditional boundaries must be scrutinized. Formerly separate worlds are becoming one.

DOSSIER

Convergence of Structured and Unstructured Information

The Brief
Organizations are under pressure to comply with regulations and better leverage all data resources. New integration, search and text mining technologies are creating unprecedented opportunities. It's time to reconsider outdated information flows set by rigid data type definitions.

Options
» Employ XML and other Web standards. Acceptance of XML can clarify how to develop metadata and other facilities that enable applications to work with heterogeneous data resources.
» Treat content as part of your "extended" information management strategy. Relational database providers want to manage your content; new technologies take a fresh approach; and existing content management vendors have expanded their vision.
» Test EII middleware. Enterprise information integration (EII) reduces data movement and can employ metadata to establish single views of heterogeneous data.
» Expand BI to include unstructured analysis and reporting. Search engines and text mining widen the range of options for gaining actionable insight.

Influencers
» Existing systems and skill sets hamstring IT innovation. To break with the past, business executives must champion cross-functional development.
» Convergent technologies are immature. EII and other middleware integration alternatives have yet to prove their performance, reliability and availability mettle.
» Service-oriented architecture (SOA) alters software buying plans. Emerging Web services continue to change the software landscape for integration projects.

Action Items
» Gain business support for integrating structured data and content. Compliance may demand it, but so do key business objectives.
» Identify pilot projects for evaluating new technology. Applications that demand broader customer insight could be ripe for innovation.
