Why Enterprise Search Sucks
Ron Miller of EContent wrote a very good article, "AIIM Study Finds Enterprise Search Still Lacking," about an upcoming AIIM report on Findability and disappointed expectations for enterprise search. Ron's title is more polite than some of the words I've heard (and used) to characterize enterprise search. Bluntly: if we all agree that enterprise search sucks, what is to be done?
Ron quotes Dan Keldsen, director of market intelligence at AIIM:
"It’s not that people don’t have search or other tools and techniques to find information. They have too many tools. They have search in their email client, search on the web, the sales force automation software has its own search [and so forth]." The trouble is most organizations don’t have tools to search across everything, he explains. In spite of the fact that federated search has been around for some time, he says, most organizations don’t have it because it’s tricky and expensive to implement.
Therefore it’s not surprising that 82 percent of those surveyed by AIIM agreed or strongly agreed that their experience with the consumer web has "created increased demand for enterprise findability." Whether that’s realistic or not, matters little, says Keldsen, because we have to face the fact that these users are frustrated for whatever reason. "Should we be frustrated that this is what people think and feel, or face it because it’s reality?" he asks.
In a highly unscientific poll of about forty people attending the Network Application Consortium's Fall 2007 Conference on Collaboration Technologies, I asked:
"How many of you think enterprise search sucks?"
and was not surprised to see about forty hands raised. I also believe that the expectations people form by easily finding what they want on the public Web set a high bar for what they expect at work (see Why Can't A Business Work More Like the Web?).
Ron's story goes on to quote Carl Frappaolo, VP at AIIM: "I don’t think the technology is failing us, I think it’s the way we are using the technologies," but he adds, "If I can’t find my content, it doesn’t exist."
I have a slightly different take. If relevant content isn't indexed, it can't be found. But as you add more content stores to the index, the signal-to-noise ratio can get worse even as coverage increases.
The technology of enterprise search is robust and capable of astonishingly deep analysis of great piles of content in almost any format.
But the relevance of search results often gets worse as a larger number of stovepiped and minimally cross-linked content stores are indexed. Email stores are often the worst offenders, yet they contain much of the most valuable working communication.
On the public Web, PageRank and similar algorithms cleverly leverage human intelligence: the links people have made in the past show what they found relevant and "link worthy". Web page content can also provide valuable and indexable context for other files and pages connected by links.
In the enterprise, there are very few links to use for relevance ranking, and tons of duplicate files (or minor variations of the same file) attached to email that's blasted throughout the company and scattered.
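The contrast between the link-rich Web and the link-poor enterprise can be made concrete with a minimal sketch of a PageRank-style calculation. This is a textbook simplification, not what any particular search engine actually runs; the function name `pagerank`, the damping value, and the toy link graphs are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical Web-like graph: two pages link to "c", so "c" ranks highest.
web = {"a": ["c"], "b": ["c"], "c": ["a"]}
print(pagerank(web))

# Hypothetical enterprise file share: three copies of a file, no links at all.
# Every copy ends up with the same rank, so links tell us nothing.
copies = {"deck-v1.ppt": [], "deck-v2.ppt": [], "deck-final.ppt": []}
print(pagerank(copies))
```

With no links between the copies, a link-based score cannot separate one version from another, which is the enterprise problem in miniature.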
Think of poor Dagwood Bumstead working hard to win the Acme Products account. He drafts a PowerPoint and circulates it for review. Because it's an important account, many people are cc'd. They each squirrel away a copy, make proposed changes and send those modified copies around.
The poor enterprise search engine may have hundreds or thousands of copies of duplicate or near-duplicate PowerPoint files that talk about the Acme Products proposal, but very little context to determine which version is most relevant, or the context in which it was created. The signal-to-noise ratio of broadly cc'd email discussions with rats' nests of quoted content is even worse.
I believe that blog, wiki, RSS feeds and tagging metadata (collectively "E2.
For example, the relevance rank of blog posts or wiki pages talking about the Acme Account can contribute to the relevance of any directly or indirectly referenced PowerPoint describing Dagwood Bumstead's plan. The PowerPoint could be stored within an E2.
To me, the most important point is that the E2.
- General business context: Inferred by correlating content analysis, use of "sales" related content tags, and other contextual clues
- Specific business context: The Acme proposal, with resources collected, used, discussed, or referenced to create that proposal.
- Timeline: Items referenced or discussed while developing and discussing the Acme proposal.
- People involved: Who worked on the Acme proposal? What did they talk about and tag?
- Space: In what public, private, personal or by-invitation collaboration space was the content recorded or referenced?
When Mr. Dithers shouts: "Bumstead! Where are we on the Acme Account?", the most timely, frequently discussed and contextually relevant version of Dagwood's slide set could pop closer to the top of the result list, along with the cloud of tags and people who have touched or talked about that result.
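One way to picture how timeliness, discussion, and tags could lift the right version is the sketch below. It is purely illustrative: the `Item` fields, the scoring formula, the 30-day half-life, and the sample data are all my assumptions, not how any product computes relevance.

```python
import math
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Item:
    name: str
    last_discussed: date   # most recent blog/wiki mention (hypothetical field)
    references: int        # number of blog posts or wiki pages that cite the item
    tags: set = field(default_factory=set)
    people: set = field(default_factory=set)  # carried along for display only


def context_score(item, query_tags, today, half_life_days=30.0):
    """Combine recency, amount of discussion, and tag overlap into one score."""
    age_days = max((today - item.last_discussed).days, 0)
    recency = 0.5 ** (age_days / half_life_days)   # halves every half_life_days
    discussion = math.log1p(item.references)       # diminishing returns
    overlap = len(item.tags & query_tags) / len(query_tags) if query_tags else 0.0
    return (1.0 + discussion) * (0.5 + recency) * (0.5 + overlap)


def rank_items(items, query_tags, today):
    """Return items sorted from most to least contextually relevant."""
    return sorted(items, key=lambda i: context_score(i, query_tags, today),
                  reverse=True)


# Hypothetical near-duplicate copies of Dagwood's slide set.
items = [
    Item("acme-proposal-v1.ppt", date(2007, 9, 1), 1, {"acme"}, {"dagwood"}),
    Item("acme-proposal-v3.ppt", date(2007, 11, 12), 5,
         {"acme", "sales"}, {"dagwood", "dithers"}),
    Item("FW-acme-copy.ppt", date(2007, 10, 1), 0),
]
ranked = rank_items(items, {"acme", "sales"}, today=date(2007, 11, 15))
for item in ranked:
    print(item.name, sorted(item.tags), sorted(item.people))
```

In this toy data the recently discussed, well-tagged v3 sorts first and the untagged email copy sorts last. The multiplicative form is just one design choice; a weighted sum would work as well for the purpose of the illustration.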
For more thoughts on how the content of E2.
See also Information Foraging at FASTForward '07
Authority versus Page Rank
A first-order approximation of what I'm talking about:
TeamPage | Attivio Search Module
but the concept using E2.