Open Source Enterprise Search Software

Nothing really exciting is happening in the Enterprise Search space when it comes to Open Source. Although some recent press releases, such as the one from Dieselpoint, have generated some excitement, no open source solution can currently compete with the Vivisimo, FAST, Exalead or Autonomy platforms.

Leading Open Source Enterprise Search solutions

Other Open Source Enterprise Search solutions

Nutch is open source web-search software. It builds on Lucene, adding web-specific features such as a crawler, a link-graph database, and parsers for HTML and other document formats.
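To make the crawler and link-graph idea concrete, here is a minimal sketch in plain Python. It is not Nutch code: the in-memory `PAGES` dictionary stands in for fetching and parsing real pages, and `crawl` and `inlinks` are hypothetical helpers. The crawl is breadth-first, records each page's outlinks, and the inversion step produces the "who links to me" view that link analysis relies on.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> outgoing links.
# A real crawler fetches pages over HTTP and parses the links out of the HTML.
PAGES = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": ["http://a.example/"],
}

def crawl(seeds, max_pages=100):
    """Breadth-first crawl that records a link graph (source -> targets)."""
    frontier = deque(seeds)
    seen = set(seeds)
    link_graph = {}
    while frontier and len(link_graph) < max_pages:
        url = frontier.popleft()
        outlinks = PAGES.get(url, [])  # stand-in for fetch + parse
        link_graph[url] = outlinks
        for target in outlinks:
            if target not in seen:  # never enqueue a URL twice
                seen.add(target)
                frontier.append(target)
    return link_graph

def inlinks(link_graph):
    """Invert the graph (target -> sources), the view used for link analysis."""
    inverted = {}
    for source, targets in link_graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted
```

Calling `crawl(["http://a.example/"])` visits all three pages, and `inlinks` on the result shows that `http://c.example/` is linked from both other pages.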

Xapian is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the probabilistic information retrieval model as well as a rich set of boolean query operators.
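The combination of probabilistic ranking and boolean operators can be illustrated with a toy inverted index. This is a sketch in plain Python and does not use Xapian's API. It assumes BM25, a classic formula from the probabilistic model, with the common defaults `k1=1.2` and `b=0.75` and a smoothed IDF variant. Boolean AND/OR decides which documents match, and BM25 decides their order.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class TinyIndex:
    """Toy inverted index: boolean AND/OR matching plus BM25 ranking."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = [Counter(tokenize(d)) for d in docs]  # term frequencies
        self.lengths = [sum(tf.values()) for tf in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        self.k1, self.b = k1, b
        self.postings = {}  # term -> set of document ids
        for doc_id, tf in enumerate(self.docs):
            for term in tf:
                self.postings.setdefault(term, set()).add(doc_id)

    def idf(self, term):
        n = len(self.docs)
        df = len(self.postings.get(term, ()))
        # The +1 inside the log keeps the IDF non-negative for very common terms.
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def bm25(self, doc_id, terms):
        tf = self.docs[doc_id]
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.lengths[doc_id] / self.avgdl)
        return sum(
            self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in terms if t in tf
        )

    def search(self, terms, op="AND"):
        sets = [self.postings.get(t, set()) for t in terms]
        if not sets:
            return []
        matches = set.intersection(*sets) if op == "AND" else set.union(*sets)
        return sorted(matches, key=lambda d: self.bm25(d, terms), reverse=True)
```

For example, with the documents `"open source search engine"`, `"enterprise search platform"` and `"open source crawler"`, an AND query for `open source` matches documents 0 and 2. Document 2 ranks first because it is shorter.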

Sphinx is a full-text search engine distributed under GPL version 2. A commercial license is also available for embedded use.

OpenPipeline is new open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It includes a job scheduler and a full UI with a point-and-click interface.
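The connector, filter, analyzer and distribution architecture described above can be sketched as a chain of generator stages. This is not OpenPipeline's API. Every name here (`dict_connector`, `markup_filter`, `lowercase_analyzer`, `route`, `run_pipeline`) is hypothetical, and routing by a CRC32 hash of the document id is only one assumed way to spread documents across nodes.

```python
import re
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

@dataclass
class Document:
    doc_id: str
    raw: str
    text: str = ""
    tokens: List[str] = field(default_factory=list)

# A stage takes a stream of documents and yields a transformed stream.
Stage = Callable[[Iterator[Document]], Iterator[Document]]

def dict_connector(source: Dict[str, str]) -> Iterator[Document]:
    """Hypothetical connector: reads documents from an in-memory data source."""
    for doc_id, raw in source.items():
        yield Document(doc_id, raw)

def markup_filter(docs: Iterator[Document]) -> Iterator[Document]:
    """File-filter stage: crude tag stripping to extract plain text."""
    for doc in docs:
        doc.text = " ".join(re.sub(r"<[^>]+>", " ", doc.raw).split())
        yield doc

def lowercase_analyzer(docs: Iterator[Document]) -> Iterator[Document]:
    """Text-analyzer stage: split the text into lowercase word tokens."""
    for doc in docs:
        doc.tokens = re.findall(r"\w+", doc.text.lower())
        yield doc

def route(docs: Iterator[Document], num_nodes: int) -> List[List[Document]]:
    """Distribution stage: assign each document to a node by a stable hash of its id."""
    nodes: List[List[Document]] = [[] for _ in range(num_nodes)]
    for doc in docs:
        nodes[zlib.crc32(doc.doc_id.encode("utf-8")) % num_nodes].append(doc)
    return nodes

def run_pipeline(source: Dict[str, str], stages: List[Stage],
                 num_nodes: int) -> List[List[Document]]:
    docs: Iterator[Document] = dict_connector(source)
    for stage in stages:
        docs = stage(docs)  # stages are lazy generators chained together
    return route(docs, num_nodes)
```

Because every stage shares the same stream-in, stream-out signature, adding or swapping a stage only means changing the list passed to `run_pipeline`. That is the kind of common architecture the paragraph describes.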

A new kid on the block when it comes to open source enterprise search. The company has yet to show evidence of a real open source strategy: frequent releases of open source code and a strong community would be welcome.

A comprehensive, tested version of Apache Solr based on the most recent stable Solr version with additional utilities, bug fixes and performance enhancements.

A comprehensive, tested version of Apache Lucene based on the most recent stable Lucene version with additional bug fixes, productivity and index tools.

Flax is based on the Xapian search engine library, Python scripting language and CherryPy web application framework.

GATE is made up of three elements:

- An architecture describing how language processing systems are made up of components.
- A framework (or class library, or SDK), written in Java and tested on Linux, Windows and Solaris.
- A graphical development environment built on the framework.
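The component architecture in GATE's first point can be sketched as a pipeline of components that add stand-off annotations (type plus character offsets) to a shared document. This is a simplified Python illustration, not GATE's Java API. The classes `Tokeniser`, `Gazetteer` and `Pipeline` are hypothetical stand-ins, named loosely after common language-processing components.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Protocol

@dataclass
class Annotation:
    kind: str   # annotation type, e.g. "Token" or "Lookup"
    start: int  # character offset where the span begins
    end: int    # character offset just past the span

@dataclass
class Doc:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

class Component(Protocol):
    def process(self, doc: Doc) -> None: ...

class Tokeniser:
    """Adds a Token annotation for every run of word characters."""
    def process(self, doc: Doc) -> None:
        for m in re.finditer(r"\w+", doc.text):
            doc.annotations.append(Annotation("Token", m.start(), m.end()))

class Gazetteer:
    """Adds a Lookup annotation for tokens found in a fixed word list."""
    def __init__(self, entries: Iterable[str]) -> None:
        self.entries = set(entries)

    def process(self, doc: Doc) -> None:
        tokens = [a for a in doc.annotations if a.kind == "Token"]
        for tok in tokens:
            if doc.text[tok.start:tok.end] in self.entries:
                doc.annotations.append(Annotation("Lookup", tok.start, tok.end))

class Pipeline:
    """Runs components in order; later components can read earlier annotations."""
    def __init__(self, components: Iterable[Component]) -> None:
        self.components = list(components)

    def run(self, doc: Doc) -> Doc:
        for component in self.components:
            component.process(doc)
        return doc
```

Running `Pipeline([Tokeniser(), Gazetteer(["Linux", "Solaris"])])` over `"Tested on Linux and Solaris"` yields five Token annotations and two Lookup annotations. The gazetteer depends only on the annotations the tokeniser left behind, not on the tokeniser itself, and this decoupling is what lets components be recombined.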