Search for Meaning

Beyond searching and hyperlinks, Web collaborations get to the heart of the matter: meaning.

Seth Grimes, Contributor

December 14, 2005


I've seen suggestions that Google Analytics will kill established business intelligence (BI) vendors, and that Google Base content hosting will accelerate a trend Craigslist started and snuff newspaper classified advertising for good. But these and other new services are not that threatening (or promising, depending on your point of view). Google's hope is chiefly to sell more online ads. To do that, it has created services that expand beyond indexing and search into managing and adding value to content.

Where is added value most needed? It's too hard for information consumers to get the answers they want. Search is still dumb, and despite reams of research, dreams of a sophisticated Semantic Web — with a common syntax that will enable software-agent bots to communicate, book flights and otherwise do your bidding without human assistance — remain unrealized. Google, del.icio.us, Flickr and other firms are responding to opportunities created by shortcomings of the first-generation Web. Google Analytics aims to complement and extend its money-making AdWords and AdSense, which match ads to searches and content. Google Base is less an innovative means of publishing and more a way to ensure personally published content will be found by searchers. They're part of a belated effort by a host of software and service providers to enhance usability and underpin a still-chaotic labyrinth with machine-processable meaning.

Individuals are willing participants in this effort. I'm fascinated by collaboratively authored content: by mash-ups that display geolocated user data on maps, by tagging that attaches keywords to everything from blogs to photos to user pages on social-networking sites, by Wikis and, especially, Wikipedias that collect knowledge by consensus rather than by fiat. These collaborations create interconnectedness that goes beyond what's possible with hyperlinks and relevance greater than you'll find in algorithmically ranked lists of search results. They present a sense of the Web as a whole greater than the sum of a few million servers.
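At bottom, tagging builds a simple shared data structure: an inverted index from user-chosen keywords to content. As a minimal sketch (the item and tag names are invented for illustration; real services like del.icio.us layered much more on top), it might look like this:

```python
from collections import defaultdict

def build_tag_index(taggings):
    """Map each tag to the set of items users attached it to.

    `taggings` is an iterable of (item, tag) pairs contributed by many
    users -- the raw material of a folksonomy.
    """
    index = defaultdict(set)
    for item, tag in taggings:
        index[tag].add(item)
    return index

def items_tagged(index, *tags):
    """Items carrying every one of the given tags (set intersection)."""
    sets = [index.get(t, set()) for t in tags]
    return set.intersection(*sets) if sets else set()
```

Intersecting tag sets is what lets tagged content be found along paths that hyperlinks and ranked result lists alone don't provide.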

These collaborations fill the gaps left by conventional authoring tools and search. Text mining is supposed to bridge that content-meaning gap, and the articles I've written on the topic prove I'm a big fan. Your choice of search engine will help you find those articles, but only if you search on "text mining" and my name, pick through the hits returned and give the promising-looking articles a quick read. Sorry, software that can grok value-laden concepts such as being "a big fan" — software that identifies and extracts and weighs opinions and offers up highlights, TiVo style — isn't ready for prime time.
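To see why, consider what a shallow opinion extractor looks like. The sketch below is a deliberately naive, hypothetical keyword-lexicon approach (the word lists are invented for illustration, not any vendor's method); its crudeness is exactly the point — matching cue words is easy, while grokking a phrase like "a big fan" in context is not:

```python
import re

# Hypothetical seed lexicons; real text-mining systems combine linguistic
# and statistical analysis rather than relying on fixed word lists.
POSITIVE = {"fan", "love", "great", "excellent"}
NEGATIVE = {"dumb", "hard", "poor", "fail"}

def score_sentence(sentence):
    """+1 per positive cue word, -1 per negative cue word."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def extract_opinions(text):
    """Split text into sentences and keep those with a nonzero score."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(s, sc) for s in sentences if (sc := score_sentence(s)) != 0]
```

A lexicon like this flags "Search is still dumb" as negative but has no way to weigh sarcasm, negation or context — the gap between keyword matching and the TiVo-style highlight reel described above.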

Forty years after Joseph Weizenbaum demonstrated natural-language conversation with the ELIZA computer program, and half a century since Alan Turing posed his famous test of artificial intelligence, figures I've seen suggest that a well-tuned text-mining system will give you 85- to 90-percent accuracy — B+ marks, and that at high cost. The theory is that a combination of linguistic and statistical analysis and machine learning will go where no machine has gone before. Yet Turing's statement, "We can only see a short distance ahead, but we can see plenty there that needs to be done," remains true.

Collaboratively authored, networked and manually tagged content is a user-driven response to search shortcomings, and it conveniently provides enterprises grist for the information mill. "Total information awareness" was a Defense Department dream that's now an enterprise imperative. Enterprises most need and can best afford part-way-there solutions like monitoring news and user-generated content and then using text mining to extract sentiment. It behooves organizations to pursue these solutions because network effects mean that news and opinions travel farther and faster than ever before (following Metcalfe's Law that the value of a network increases as the square of the number of connected nodes). Quick response in the name of reputation management is mandatory.
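The Metcalfe's Law arithmetic behind that urgency is simple: counting potential pairwise connections, a network of n nodes has n(n-1)/2 of them, which grows as n squared. A one-line sketch:

```python
def metcalfe_value(nodes):
    """Potential pairwise connections among n nodes: n*(n-1)/2, i.e. O(n^2)."""
    return nodes * (nodes - 1) // 2

# Doubling the audience from 10 to 20 nodes takes the count of possible
# paths an opinion can travel from 45 to 190 -- roughly quadrupling it.
```

That quadratic growth in paths is why a single complaint can outrun any slow-moving response.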

Web creator Tim Berners-Lee saw that the second-generation Web would be bound by semantic interoperability. Poor usability and findability have fed the demand for machine-exploitable meaning. That meaning is being created from the bottom up, by text mining and content hosting and by end-user collaborations such as mash-ups and Wikis, tagging and linking: by analytics and by intention.

Seth Grimes is a principal of Alta Plana Corp., a Washington, D.C.-based consultancy specializing in large-scale analytic computing systems. Write to him at [email protected].


About the Author

Seth Grimes

Contributor

Seth Grimes is an analytics strategy consultant with Alta Plana and organizes the Sentiment Analysis Symposium. Follow him on Twitter at @sethgrimes
