Content: The Other Half of the Integration Problem

Counting file systems, e-mail servers and disparate repositories, unstructured information is all over the place. Content integration consolidates search, access and management control, but which approach is best for your enterprise?

Bruce Silver, Contributor

September 21, 2005

All business integration initiatives, whether involving applications, processes or information, are driven by the same idea: It's too difficult to replace legacy systems with a single, enterprisewide standard, even if those systems have become information silos that impede efficiency, agility and regulatory compliance. So why not leave them in place and use middleware to make them appear as a seamless whole?

Content integration is just one manifestation of this idea. Content management (CM) is increasingly viewed as a cross-functional, enterprise-scale challenge. Regulatory compliance, business processes, customer service and real-time information delivery all demand unified access to and management of content across the entire organization, including everything from Office documents and e-mail to scanned documents, Web content, structured reports and graphics.

But a multitude of incompatible CM islands are scattered throughout the enterprise. The typical enterprise has at least three content repositories, and 40% have six or more, according to Forrester Research. Even customers of a single enterprise content management (ECM) vendor may have separate repositories for each content type — for example, document images, revisable documents and Web content. Mergers and acquisitions only compound the problem. Meanwhile, most enterprise content remains entirely unmanaged: Stored in network file systems and on Web sites, it's a compliance risk as well as buried corporate knowledge.

Content can be integrated with middleware that provides a common interface to query, version or index content in any repository from client applications. This article examines the business needs driving content integration and the trade-offs of different approaches. Even if you decide you must consolidate on an enterprisewide platform, integration may offer an affordable quick fix for your current content access and management problems.

What Drives Content Integration?

ECM vendors would love to sell you a gigantic new enterprise repository and migrate all your legacy content to it, but they know that's unlikely. Migration costs can overwhelm the potential savings on software licenses, maintenance, training and administration. And besides, you may have business applications and workflows that depend on specific functions of legacy CM systems missing in the new enterprise repository.

Enterprise information portals have been called a form of content integration because they aggregate information on the user's screen, but true integration involves managing content stored in multiple, diverse repositories as if it were stored in a single repository: not just searching and retrieving content, but editing, approving and securing it.

Middleware-based integration projects are being driven by a variety of business needs:

The content silo problem. CM historically has been deployed departmentally, so you could easily have FileNet in one department, EMC Documentum in another and IBM, Open Text or other vendor or home-grown systems in other corners of the enterprise. Mergers and acquisitions only multiply the problem. To maintain customer service, regulatory compliance and employee productivity, various enterprise applications, business processes and records management solutions must integrate with an ever-changing list of content repositories. The conventional approach — point-to-point integration via custom code — is expensive and time-consuming. By providing a consistent middleware layer, content integration helps simplify dynamic environments.

The intrasuite integration problem. Even if your company has standardized on one ECM vendor, don't assume the entire suite behaves as a single repository. Many ECM suites have separate repositories for each content type, and legacy versions of the suite may be isolated from newer ones. Integration helps here, too. For instance, FileNet's P8 Content Federation Services integration middleware bridges the separate FileNet repositories for production imaging and revisable content. It also connects P8 with older FileNet Panagon systems, as well as third-party repositories. Thus, legacy content need not be physically migrated to leverage a new workflow or compliance solution.

The unmanaged content problem. In most organizations, content is simply stored in network file systems, which makes it a compliance and discovery risk. Integration lets you convert file properties to metadata and apply library services — check-in/check-out, versioning and access control — without moving content to an ECM repository. Once integrated, it can be included in the same enterprisewide queries, compliance policies and workflows that encompass managed content.

The managed migration problem. Migration from legacy repositories makes sense as a long-term strategy. By consolidating content into a single repository (or at least fewer repositories), you reduce license and maintenance costs, improve security and scalability, and cut down on the number of separate vendor negotiations. But migrations often must be managed gradually, since certain departments may depend on customized features of the legacy platform unavailable in the new one. Content integration supports a managed migration strategy. For example, McDonald's cut costs by migrating mountains of real estate documents from FileNet to Day Software's repository, but the legal department needed to stay on FileNet. With Day's built-in content integration software, compliance processes were able to access both repositories as if they were one.
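For the unmanaged content case described above, converting file properties to metadata can be sketched in a few lines. This is only an illustration using the Python standard library: the function name `file_metadata` and the metadata field names (`title`, `format`, `size_bytes`, `modified`, `location`) are assumptions made for this example, not the schema of any product mentioned in this article.

```python
import os
from datetime import datetime, timezone


def file_metadata(path):
    """Derive a metadata record from file system properties.

    The content stays where it is; only a descriptive record is produced.
    Field names are hypothetical, chosen for this sketch.
    """
    info = os.stat(path)
    name = os.path.basename(path)
    stem, extension = os.path.splitext(name)
    return {
        "title": stem,
        "format": extension.lstrip(".").lower(),
        "size_bytes": info.st_size,
        # Last-modified time as an ISO 8601 string in UTC.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "location": os.path.abspath(path),
    }
```

A record like this could then be registered with the integration layer so the file shows up in the same queries and policies as managed content, while the file itself never moves.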

To solve all these problems, applications don't communicate directly with the repositories but access them indirectly via integration middleware. The diagram on page 34 shows the functional layers of content-integration technology.

Connect, Map the Metadata

Content integration provides a content bus, a single API that shields developers from the connectivity details of various repositories. Like other forms of business integration middleware, content integration relies on connectors: software modules that translate between the common programming interface of the integration hub and the specific APIs of each repository. Connectors are typically available for leading ECM servers as well as databases and file systems. Vendors also provide tools so developers can build their own connectors when necessary. In addition, content integration usually provides single sign-on and session management to multiple repositories, referred to as federated access, so accessing multiple content stores appears to the user as connecting to a single repository.
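The connector idea can be illustrated with a minimal sketch. Everything here is hypothetical: the class names (`Connector`, `FileSystemConnector`, `LegacyEcmConnector`, `ContentBus`), the native field `DOC_NAME` and the in-memory dictionaries standing in for real repositories are assumptions for illustration, not any vendor's API.

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """Common interface the integration hub expects from every repository."""

    @abstractmethod
    def fetch(self, doc_id):
        """Return one document as a dict with 'id', 'title' and 'source'."""


class FileSystemConnector(Connector):
    """Hypothetical connector for content addressed by file path."""

    def __init__(self, files):
        self._files = files  # path -> file name, standing in for a real file system

    def fetch(self, doc_id):
        return {"id": doc_id, "title": self._files[doc_id], "source": "filesystem"}


class LegacyEcmConnector(Connector):
    """Hypothetical connector that translates a vendor-specific record."""

    def __init__(self, records):
        self._records = records  # object id -> native record

    def fetch(self, doc_id):
        native = self._records[doc_id]
        # Translate the native field name into the hub's common shape.
        return {"id": doc_id, "title": native["DOC_NAME"], "source": "legacy-ecm"}


class ContentBus:
    """Single entry point: callers name a repository, never its native API."""

    def __init__(self):
        self._connectors = {}

    def register(self, name, connector):
        self._connectors[name] = connector

    def fetch(self, repository, doc_id):
        return self._connectors[repository].fetch(doc_id)
```

The design point is that client code calls only `ContentBus.fetch`; supporting another repository means writing one more connector, not touching every application.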

Within the Java world, a new standard promises to simplify ECM connectors and perhaps eliminate them in some situations. JSR 170 specifies a standard Java API to access content repositories independent of vendor implementation. Level 1 defines read-only functions, including search, retrieval and export to XML, supporting presentation templates and portal applications. Level 2 adds writing to the repository and import from XML, as well as optional features such as versioning, check-in/check-out, access control and event notification. The final release of JSR 170 was published in June, and so far Day Software, which led development of the spec, is the first to offer a commercial implementation. As ECM vendors make their repositories JSR 170-compliant, applications will be able to access all basic functions through a single API, without connectors.

While JSR 170 may standardize the API for indexing and search, it doesn't standardize metadata. The crux of the content integration problem is that corresponding metadata elements in each repository have different field names and formats. AccountID in one repository might correspond to CustNum in another.

At a minimum, you must be able to translate between them, to query for content and aggregate search results. Content integration provides tools for metadata mapping and, in some cases, its own metadata dictionary and schema. To incorporate content in file systems and Web sites, some products offer rules-based metadata extraction (auto-tagging) from the content itself.
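As a rough sketch of metadata mapping, the AccountID and CustNum example can be expressed as a pair of translation tables. The repository names (`repo_a`, `repo_b`), the common field name `account_id` and the extra `created` mappings are assumptions invented for this illustration.

```python
# Hypothetical field maps: common field name -> native field name per repository.
FIELD_MAPS = {
    "repo_a": {"account_id": "AccountID", "created": "CreateDate"},
    "repo_b": {"account_id": "CustNum", "created": "DateFiled"},
}


def to_native(repo, query):
    """Translate a query expressed in common field names into native ones."""
    mapping = FIELD_MAPS[repo]
    return {mapping[field]: value for field, value in query.items()}


def to_common(repo, record):
    """Translate a native record back into common field names.

    Native fields with no mapping are dropped in this sketch.
    """
    reverse = {native: common for common, native in FIELD_MAPS[repo].items()}
    return {reverse[key]: value for key, value in record.items() if key in reverse}
```

For example, `to_native("repo_b", {"account_id": "1001"})` yields a query on `CustNum`, while `to_common` normalizes whatever comes back so results from both repositories can share one result list.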

With metadata mappings defined, content integration lets you search across repositories in a single query. Usually this is accomplished by query brokering, which translates the query from a "universal" language into each repository's search parameters and interrogates each source in parallel. This is trickier than it sounds: Not all repositories may support combined text-and-metadata queries, fuzzy searching and other middleware query language capabilities. EMC Documentum ECI Services, for example, provides mapping and filtering rules that translate queries into "nearest equivalents" supported by each specific repository. Some search engines offer a similar form of federated search, but content integration lets you manage the content, not just find it.

Note that metadata mapping and filtering must be provided not just for the query, but also for the results so they can be aggregated in tables or simply listed in a common format. Query brokering leverages the indexes of the repositories themselves, which are always up to date with the latest content.
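Query brokering as described above can be sketched as three steps: translate the query, interrogate each source in parallel, and map the results back to a common format. The in-memory `REPOSITORIES` table, its field names and the exact-match search are all assumptions for illustration; a real broker would call each repository's own search API and handle capabilities a source lacks.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical repositories: each has its own field names and its own documents.
REPOSITORIES = {
    "repo_a": {
        "field_map": {"account_id": "AccountID"},
        "documents": [
            {"AccountID": "1001", "Title": "Contract"},
            {"AccountID": "2002", "Title": "Invoice"},
        ],
    },
    "repo_b": {
        "field_map": {"account_id": "CustNum"},
        "documents": [{"CustNum": "1001", "Title": "Claim form"}],
    },
}


def search_one(name, common_query):
    """Translate the query, search one repository, normalize its hits."""
    repo = REPOSITORIES[name]
    field_map = repo["field_map"]
    native_query = {field_map[f]: v for f, v in common_query.items()}
    hits = [
        doc for doc in repo["documents"]
        if all(doc.get(k) == v for k, v in native_query.items())
    ]
    reverse = {native: common for common, native in field_map.items()}
    # Unmapped native fields are simply lowercased in this sketch.
    return [
        {"repository": name, **{reverse.get(k, k.lower()): v for k, v in doc.items()}}
        for doc in hits
    ]


def brokered_search(common_query):
    """Fan the query out to every repository in parallel and merge the results."""
    names = sorted(REPOSITORIES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_repo = pool.map(lambda n: search_one(n, common_query), names)
        return [hit for hits in per_repo for hit in hits]
```

Calling `brokered_search({"account_id": "1001"})` returns one hit from each repository, both carrying the common `account_id` field even though the sources store it under different names.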

Another way to perform federated search is to maintain a universal index within the content-integration middleware. This index is periodically updated by crawling the indexes (or raw content) of each repository. Google users are familiar with this approach. Because the processing is done in advance, queries and result list aggregation are faster, but the index may not include the newest content.
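The trade-off between speed and freshness is easy to see in a toy version of a crawled index. This is a minimal sketch, assuming repositories are plain dictionaries of document text and that matching is by whole lowercase word; the `UniversalIndex` class is invented for this example.

```python
from collections import defaultdict


class UniversalIndex:
    """Toy central index rebuilt by crawling every repository."""

    def __init__(self, repositories):
        self._repositories = repositories  # repository name -> {doc_id: text}
        self._postings = defaultdict(set)  # term -> {(repository, doc_id)}

    def crawl(self):
        """Rebuild the index from the current repository contents."""
        postings = defaultdict(set)
        for name, docs in self._repositories.items():
            for doc_id, text in docs.items():
                for term in text.lower().split():
                    postings[term].add((name, doc_id))
        self._postings = postings

    def search(self, term):
        """Answer from the index alone; no repository is contacted."""
        return sorted(self._postings.get(term.lower(), set()))
```

A document added after the last `crawl()` is invisible to `search()` until the next crawl runs, which is exactly the staleness described above.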

A few ECM vendors support both methods. IBM's WebSphere Information Integrator Content Edition, for example, uses query brokering, while WebSphere Information Integrator OmniFind Edition provides its own universal index.

Aggregate Views and Capture Events

Many content-integration offerings provide a virtual repository: a tree of virtual folders in which content items are aggregated from various repositories. The virtual repository doesn't replicate content and metadata; it merely provides an aggregated view. Virtual folders can be defined to support specific business processes, projects or activities. The folders can represent content queries or workflow inboxes. Moreover, they are dynamic, automatically updated when the content in the underlying repositories changes.
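One way to picture a virtual folder is as a stored query that is evaluated each time the folder is opened. The sketch below assumes repositories are in-memory lists of item dictionaries; the `VirtualFolder` class and the `project` field are hypothetical.

```python
class VirtualFolder:
    """A named, saved query over several repositories; it holds no content."""

    def __init__(self, name, repositories, predicate):
        self.name = name
        self._repositories = repositories  # repository name -> list of item dicts
        self._predicate = predicate        # function deciding folder membership

    def items(self):
        """Evaluate the query now, so the view reflects current contents."""
        return [
            (repo_name, item)
            for repo_name, items in self._repositories.items()
            for item in items
            if self._predicate(item)
        ]
```

Because nothing is copied, an item added to any underlying repository appears in the folder on the next call to `items()`, with no synchronization step.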

Because access to the virtual repository is usually over the Web, view services become a valuable content-integration feature. These include Web viewers for common content types, such as Microsoft Office, and on-the-fly converters for document image formats such as TIFF and MO:DCA files.

When content is added or updated or a deadline is reached, leading ECM repositories generate events that can be used to trigger actions based on business rules. For example, a mortgage loan application approval can trigger the printing and mailing of letters of congratulations as well as separate cross-selling processes for homeowner insurance.

Content integration extends this idea to the virtual repository. You can define rules that determine whether an event has occurred in any connected repository and then specify how these events will be handled. Event-triggered actions can be invoked on any content repository or workflow system connected via content-integration middleware. In some offerings, such as IBM WebSphere Information Integrator Content Edition, content events can even tie directly into the enterprise business-integration infrastructure, which can invoke Web services, J2EE components or JCA connectors to external business systems.
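Event handling across repositories boils down to rules that pair a condition with an action. The sketch below is illustrative only: the `EventRouter` class and the event fields (`type`, `doc_class`, `doc_id`) are assumptions, and real products define their own event models.

```python
class EventRouter:
    """Applies business rules to events from any connected repository."""

    def __init__(self):
        self._rules = []  # list of (condition, action) pairs

    def add_rule(self, condition, action):
        """Register an action to run whenever condition(event) is true."""
        self._rules.append((condition, action))

    def publish(self, event):
        """Run every matching action and return how many rules fired."""
        fired = 0
        for condition, action in self._rules:
            if condition(event):
                action(event)
                fired += 1
        return fired
```

In the mortgage example, one rule could queue the congratulations letter and a second could start the homeowner insurance cross-selling process; both would fire from the same approval event, regardless of which repository raised it.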

Reality Check

While content integration is powerful technology, it's still in the early adopter stage. Initial deployments of IBM WebSphere Information Integrator Content Edition have been led by federal government intelligence applications, but the vendor sees CRM as the next wave. Content integration can aggregate contracts and other documents with transaction data in a single SQL query, providing a "360-degree view" of the customer. Another promising avenue for content integration is compliance and records retention. A new IBM offering called Federated Records Management uses content integration to link multiple content stores with DB2 Records Manager under a classification and retention policy.

FileNet implementations have emphasized the imaging/workflow side. Customers use FileNet to extend the life of legacy image repositories by integrating them with newer business process management and compliance platforms.

Mobius Management Systems provides content integration in its ViewDirect Total Content Integration and ViewDirect Records Management offerings. Like FileNet, Mobius emphasizes fixed content such as host reports and document images, but the company offers connectors for Microsoft SharePoint, relational databases and other ECM repositories. FileNet and Mobius also emphasize the need to extend the records management, retention and compliance controls to all organizational content.

Content integration is EMC Documentum's fastest-growing product this year. Interest is strongest on the knowledge management and discovery side, with the common problem being that customers simply have too many places to search for what they need.

Day Software places content integration in the context of enabling managed migration from proprietary ECM to lower-cost, standards-based repositories. With the first JSR 170-compliant repository and ECI services built into the platform, Day wants to be the beneficiary of that migration.

Integration vs. Migration

Day's strategy reveals the changing economics of CM technology. Legacy content repositories, especially those designed for high-performance imaging and workflow, cost more to license and maintain than newer, generic content repositories. Integration expenditures — including middleware, connectors and related services — average $300,000, according to Forrester Research. Migration costs include the analysis, mapping and movement of content (and related services), plus new licenses less the difference in the maintenance costs between old and new repositories. As York International discovered, moving even a small amount of content can lead to unanticipated security and access problems (see the "Field Report"). If maintenance costs for the legacy repository are large enough, migration can make economic sense, but integration can solve the big problems quickly and let you migrate over time.
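The migration cost description above can be written down as simple arithmetic. This is a sketch under stated assumptions: the function and parameter names are invented, and the `years` horizon is an added assumption, since the article does not say over what period the maintenance difference should be counted.

```python
def migration_net_cost(analysis, mapping, movement, services, new_licenses,
                       old_maintenance_per_year, new_maintenance_per_year,
                       years=1):
    """Net migration cost: one-time costs less maintenance savings.

    'years' is an assumed planning horizon, not something the article specifies.
    """
    one_time = analysis + mapping + movement + services + new_licenses
    maintenance_savings = (old_maintenance_per_year - new_maintenance_per_year) * years
    return one_time - maintenance_savings
```

Comparing the result with the cost of an integration project shows why high legacy maintenance fees tilt the decision toward migration: the larger the yearly difference, and the longer the horizon, the smaller the net migration cost becomes.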

Wachovia provides a good example of how content integration lets enterprises manage the cost and challenges of repository reorganization. In early 2003, the banking and financial services company needed to either migrate or integrate diverse content repositories in several lines of business, due in part to mergers and acquisitions. Each business had funded its own IT initiatives, but rather than replicate point-to-point integration projects, Wachovia used integration middleware. Central IT agreed that if Commercial Loans funded development of the integration infrastructure, it would pay for its operation, confident that other lines of business would chip in their own funding as their own repositories needed to be integrated. Retail Brokerage soon joined the project, and other units now plan to follow. With the first integrations completed before the end of the year, Wachovia's individual lines of business were able to integrate and migrate incrementally, affordably and consistently.

Draw Interest

Content integration is now dominated by the big ECM vendors, largely based on recent acquisitions. In the past year, IBM acquired Venetica, a supplier of middleware to several ECM vendors, and turned that software into the WebSphere Information Integrator Content Edition. EMC bought askOnce from Xerox, turning it into Documentum ECI Services. Oracle, which in August announced a major upgrade of its Collaboration Suite with ECM-oriented content, record and workflow services, acquired (also in August) ContextMedia, one of the few remaining independent content-integration vendors.

As we've seen before with electronic records management and team collaboration, when ECM giants snap up boutique technology startups, market awareness and demand for the new technology spike. While you may not have heard much about content integration yet, chances are you'll hear a lot more in the coming year.

Bruce Silver is president of Bruce Silver Associates (www.brsilver.com). Write to him at [email protected].
