There is nothing really exciting in the enterprise search space when it comes to open source. Lucene/Solr is the dominant player, but it is merely adequate for web search and cannot compete with the big proprietary platforms such as Vivisimo, FAST, Exalead or Autonomy.
Leading Open Source Enterprise Search solutions
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface.
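To give an idea of what the HTTP API looks like in practice, here is a minimal sketch that builds a Solr query URL asking for a JSON response with faceting and hit highlighting turned on. The host, the core name `docs` and the field names are assumptions made up for the illustration, and the exact path layout depends on your Solr version and setup. Nothing is sent over the network; the code only assembles the URL.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text, facet_field, highlight_field):
    """Return a Solr select URL requesting JSON, one facet field and highlighting."""
    params = [
        ("q", text),                  # main query string
        ("wt", "json"),               # response writer: JSON instead of XML
        ("facet", "true"),            # enable faceted search
        ("facet.field", facet_field), # field to compute facet counts on
        ("hl", "true"),               # enable hit highlighting
        ("hl.fl", highlight_field),   # field to highlight matches in
    ]
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field names, for illustration only.
url = build_solr_query_url(
    "http://localhost:8983", "docs", "open source", "category", "body"
)
print(url)
```

Fetching that URL from a running Solr instance would return the matching documents together with facet counts and highlighted snippets, assuming the fields exist in the schema.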
Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize (cluster) search results into thematic categories.
Carrot2 provides an architecture for acquiring search results from various sources (YahooAPI, GoogleAPI, MSN Search API, OpenSearch, Lucene index), clustering the results and visualising the clusters. Currently, 5 clustering algorithms are available that are suitable for different kinds of document clustering tasks.
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
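For readers new to full-text search, the core idea behind a library like Lucene is the inverted index: a map from each term to the documents that contain it. The sketch below is a toy version in Python, not Lucene's API and nowhere near its feature set; the function names and sample documents are invented for the illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term of the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Invented sample documents.
docs = {
    1: "Lucene is a search library",
    2: "Solr is a search server based on Lucene",
    3: "Xapian is a toolkit",
}
index = build_index(docs)
print(search(index, "lucene search"))
```

A real engine adds relevance scoring, analyzers, compressed on-disk structures and much more, which is exactly why you use a library instead of writing this yourself.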
ElasticSearch is a distributed, RESTful, free/open source search server based on Apache Lucene. It is developed by Shay Banon and is released under the terms of the Apache License.
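"RESTful" here means that you search by sending JSON over HTTP. As a rough sketch, the snippet below prepares a search request with a simple `match` query. The host, the index name `articles` and the field `title` are assumptions for the example, and the request is only constructed, never sent.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, field, text):
    """Prepare (but do not send) a POST to the index's _search endpoint."""
    body = {"query": {"match": {field: text}}}  # full-text match on one field
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, index and field names, for illustration only.
req = build_search_request(
    "http://localhost:9200", "articles", "title", "enterprise search"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` against a running cluster would execute the search; the query syntax accepted may vary between ElasticSearch versions.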
Other Open Source Enterprise Search solutions
Nutch is open source web-search software. It builds on Lucene, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.
Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
At the heart of Autonomy’s infrastructure software lies the Intelligent Data Operating Layer (IDOL) Server. The IDOL Server collects indexed data from connectors and stores it in its proprietary structure, optimized for fast processing and retrieval of data. As the information processing layer, IDOL forms a conceptual and contextual understanding of all content in an enterprise, automatically analyzing any piece of information from over 1,000 different content formats and even people’s interests. Over 500 operations can be performed on digital content by IDOL, including hyperlinking, agents, summarization, taxonomy generation, clustering, eduction, profiling, alerting and retrieval.
FAST ESP is the enterprise search platform product from FAST, now a subsidiary of Microsoft. Thanks to the company's experience and track record, FAST ESP can tackle demanding search and information access challenges.
Endeca Information Access Platform is one of the leading enterprise search solutions. With a strong bias toward data analytics, it can be seen as a new way to address business intelligence.
Dieselpoint Search™ is search and navigation software for enterprise data including document collections, databases, and XML.
SearchBlox is content search software designed for vertical search, intranets, websites, portals and custom applications. It is based on Lucene and Carrot2.
Sphinx is a full-text search engine, distributed under GPL version 2. A commercial license is also available for embedded use.
OpenPipeline is new open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It includes a job scheduler and a full UI with a point-and-click interface.
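The general shape of such a document pipeline can be sketched in a few lines: each stage takes a document, enriches it and hands it to the next one. This is only an illustration of the idea in Python, not OpenPipeline's actual API; the stage names and the routing rule are invented.

```python
def parse(doc):
    """Parsing stage: turn the raw payload into clean text."""
    return {**doc, "text": doc["raw"].strip()}


def analyze(doc):
    """Analysis stage: split the text into lowercase tokens."""
    return {**doc, "tokens": doc["text"].lower().split()}


def route(doc):
    """Routing stage: pick a destination (hypothetical rule based on length)."""
    target = "long" if len(doc["tokens"]) > 5 else "short"
    return {**doc, "route": target}


def run_pipeline(doc, stages):
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline({"raw": "  Open Source Search  "}, [parse, analyze, route])
print(result["tokens"], result["route"])
```

Keeping every stage behind the same small interface is what lets connectors, filters and analyzers be mixed and reordered without touching each other.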
A new kid on the block when it comes to open source enterprise search. The company must still provide some evidence of a real open source strategy: frequent releases of open source code and a strong community would be appreciated.
A comprehensive, tested version of Apache Solr based on the most recent stable Solr version with additional utilities, bug fixes and performance enhancements.
A comprehensive, tested version of Apache Lucene based on the most recent stable Lucene version with additional bug fixes, productivity and index tools.
Flax is based on the Xapian search engine library, Python scripting language and CherryPy web application framework.
GATE is made up of three elements:
– An architecture describing how language processing systems are made up of components.
– A framework (or class library, or SDK), written in Java and tested on Linux, Windows and Solaris.
– A graphical development environment built on the framework.
Indri is Lemur’s latest search engine, and it is also available on its own when all you need is a search engine. Indri has an index capable of handling very large collections and a structured query language that supports fields and passages.
There is nothing really exciting in the enterprise search space when it comes to open source. Although some recent press releases, like the one from Dieselpoint, have created some excitement, no open source solution can today compete with the Vivisimo, FAST, Exalead or Autonomy platforms.