Search Engine
- Open source search engines have long been a poor relation within the open source community. Over the last year, however, thanks largely to the dynamism of the Apache Lucene project, many industrial-strength solutions have emerged. In addition to the traditional multi-format indexers, there are now solutions for advanced features such as clustering.
Nutch
Nutch is open source web-search software. It builds on Lucene, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.
ht://dig
The ht://Dig system is a complete open source search engine for a domain or intranet. This system is not meant to replace powerful internet-wide search systems like Lycos, Infoseek, Google and AltaVista. Instead it is meant to cover the search needs of a single company, campus, or even a particular subsection of a web site.
As opposed to some WAIS-based or web-server based search engines, ht://Dig can easily span several web servers. The type of these different web servers doesn't matter as long as they understand common protocols like HTTP.
Egothor
Egothor is an Open Source, high-performance, full-featured text search engine written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications. It can be configured as a standalone engine, a metasearcher, or a peer-to-peer hub, and it can also be used as a library by an application that needs full-text search.
Apache Solr
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface.
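To make the HTTP interface concrete, here is a minimal sketch of how a client might assemble a Solr query URL that asks for JSON output, faceting and hit highlighting. The host, port, core name ("docs") and field names ("category", "content") are assumptions chosen for illustration, and the helper build_solr_query_url is hypothetical. The URL layout also varies between Solr versions and deployments, so check it against your own installation.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text,
                         facet_field=None, highlight_field=None, rows=10):
    """Build a select URL for a Solr core (hypothetical helper).

    base_url and core are deployment-specific assumptions; the query
    parameters (q, wt, rows, facet, facet.field, hl, hl.fl) are common
    Solr request parameters.
    """
    params = [("q", text), ("wt", "json"), ("rows", str(rows))]
    if facet_field:
        # Ask Solr to count matching documents per value of this field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    if highlight_field:
        # Ask Solr to return highlighted snippets from this field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local installation and core name, for illustration only.
    print(build_solr_query_url("http://localhost:8983/solr", "docs",
                               "open source",
                               facet_field="category",
                               highlight_field="content"))
```

The sketch only builds the URL and does not send a request, so it can be run without a Solr server.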
Carrot2
Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize (cluster) search results into thematic categories.
Carrot2 provides an architecture for acquiring search results from various sources (YahooAPI, GoogleAPI, MSN Search API, OpenSearch, Lucene index), clustering the results and visualising the clusters. Currently, 5 clustering algorithms are available that are suitable for different kinds of document clustering tasks.
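The idea of organizing search results into thematic groups can be illustrated with a deliberately naive sketch. It is not one of Carrot2's algorithms and does not use the Carrot2 API: it simply groups result snippets that share a term and uses that term as the cluster label. The function names, the stopword list and the sample snippets are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "is", "on", "with"}


def tokenize(text):
    """Return the set of lowercase terms in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def cluster_by_shared_term(snippets, min_size=2):
    """Group snippet indices under every term shared by >= min_size snippets.

    Clusters may overlap: a snippet appears under each qualifying term
    it contains.
    """
    term_to_docs = defaultdict(set)
    for i, snippet in enumerate(snippets):
        for term in tokenize(snippet):
            term_to_docs[term].add(i)
    return {term: sorted(docs)
            for term, docs in term_to_docs.items()
            if len(docs) >= min_size}


if __name__ == "__main__":
    results = [
        "Jaguar car dealership",
        "Jaguar habitat in the rainforest",
        "Used car prices",
        "Rainforest conservation",
    ]
    for label, members in sorted(cluster_by_shared_term(results).items()):
        print(label, members)
```

Real clustering engines go well beyond this, for example by choosing readable multi-word labels and merging near-duplicate groups.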
Compass
Compass is a first class open source Java Search Engine Framework that brings the power of Search Engine semantics to your application stack declaratively. Built on top of the Lucene Search Engine, Compass integrates seamlessly with popular development frameworks like Hibernate and Spring. It provides search capability to your application data model and synchronizes changes with the datasource. With Compass: write less code, find data quicker.
Xapian
Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
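As a rough illustration of what boolean query operators do, the following sketch evaluates AND, OR and AND NOT over a small in-memory inverted index. It does not use Xapian or its bindings and does not implement probabilistic ranking. All function names and the sample documents are assumptions made for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def query_and(index, a, b):
    """Documents containing both terms."""
    return index.get(a, set()) & index.get(b, set())


def query_or(index, a, b):
    """Documents containing at least one of the terms."""
    return index.get(a, set()) | index.get(b, set())


def query_and_not(index, a, b):
    """Documents containing a but not b."""
    return index.get(a, set()) - index.get(b, set())


if __name__ == "__main__":
    docs = {
        1: "open source search",
        2: "open web crawler",
        3: "search engine library",
    }
    index = build_index(docs)
    print(sorted(query_and(index, "open", "search")))
    print(sorted(query_or(index, "open", "search")))
    print(sorted(query_and_not(index, "search", "open")))
```

A toolkit that combines boolean and probabilistic retrieval would typically use the boolean part to decide which documents match and a weighting scheme to decide their order.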
OpenPipeline
OpenPipeline is new open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing.