
www.collabor.com


Bringing Reusability to Enterprise Search

Using Solr to build a reusable enterprise search engine.

A Collabor Labs Technology Paper, May 2011

This whitepaper discusses the high-level technical aspects of using Solr to bring reusability to enterprise search implementations.

Brahmaji Pusuluri

Sr. Software Engineer


Research by: Collabor Labs, May 2011. All trademarks belong to their respective owners.

Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search.

Solr is the popular, blazing-fast, open source enterprise search platform from the Apache Lucene project. It handles indexing and querying through HTTP requests, so an application running anywhere can index and query documents over the Internet by sending XML over HTTP to the URL of your Solr server. Solr is also a highly optimized search server, with caching and with replication to other Solr servers, and it can index rich-text documents such as Word and PDF files.

Once Solr is installed, we need to adapt the following configuration files to the project's requirements.

solrconfig.xml: This file contains most of the parameters for configuring Solr itself, such as request handlers and caching.

schema.xml: This file defines which fields your documents can contain and how each field is handled when documents are added to the index and when those fields are queried.
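To make the role of schema.xml concrete, the sketch below parses a small, hypothetical schema.xml excerpt and checks a document against it before indexing. It is an illustration only: Solr itself performs this validation when documents are added, and the sketch ignores dynamic fields, copy fields and field types. The field names are borrowed from the data-config.xml example later in this paper; the field types and the helper names declared_fields and check_document are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema.xml excerpt: the solr_* names mirror the data-config.xml
# example, but the field types and attributes are assumptions.
SCHEMA_XML = """
<schema name="example" version="1.2">
  <fields>
    <field name="solr_id" type="string" indexed="true" stored="true" required="true"/>
    <field name="solr_name" type="text" indexed="true" stored="true"/>
    <field name="solr_desc" type="text" indexed="true" stored="true"/>
    <field name="solr_details" type="text" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <uniqueKey>solr_id</uniqueKey>
</schema>
"""


def declared_fields(schema_xml):
    """Map each <field> name to its attributes (<dynamicField> is not matched)."""
    root = ET.fromstring(schema_xml.strip())
    return {f.get("name"): dict(f.attrib) for f in root.iter("field")}


def check_document(doc, schema_xml):
    """List unknown fields and missing required fields for one document."""
    fields = declared_fields(schema_xml)
    problems = [f"unknown field: {name}" for name in doc if name not in fields]
    for name, attrs in fields.items():
        if attrs.get("required") == "true" and name not in doc:
            problems.append(f"missing required field: {name}")
    return problems


print(check_document({"solr_id": "1", "solr_name": "Widget"}, SCHEMA_XML))  # []
print(check_document({"solr_name": "Widget", "color": "red"}, SCHEMA_XML))
# ['unknown field: color', 'missing required field: solr_id']
```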

Once the settings are in place, you can index data by sending an XML file to Solr with curl:

curl http://localhost:8983/solr/update -F stream.file=/tmp/example.xml

The example.xml file must contain documents whose field names match the fields defined in schema.xml.
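The same update can be made from application code. The sketch below, a minimal illustration rather than a client library, builds a body in Solr's <add><doc><field name="..."> update format and prepares a POST to the /update handler. It posts the XML directly instead of using stream.file, which, as we understand it, makes Solr read the file from the server's own filesystem and requires remote streaming to be enabled in solrconfig.xml. The helper names are hypothetical, and commit=true is appended on the assumption that the documents should become searchable immediately.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize dicts into Solr's <add><doc><field name="..."> update format.

    A list value becomes repeated <field> elements, as a multiValued field expects.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)  # ElementTree escapes &, < and >
    return ET.tostring(add, encoding="unicode")


def make_update_request(solr_url, xml_body, commit=True):
    """Prepare, but do not send, a POST of xml_body to the /update handler."""
    url = solr_url.rstrip("/") + "/update"
    if commit:
        url += "?commit=true"  # make the new documents visible to searches
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


body = build_add_xml([{"solr_id": "1", "solr_name": "Widget", "solr_details": ["blue", "steel"]}])
request = make_update_request("http://localhost:8983/solr", body)
print(body)
print(request.full_url)  # http://localhost:8983/solr/update?commit=true
# urllib.request.urlopen(request) would send it to a running Solr server.
```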



Most applications store data in relational databases or XML files, and searching over such data is a common use case. The DataImportHandler (DIH) is a Solr contrib module that provides a configuration-driven way to import this data into Solr, either as a full build or as incremental delta imports.

Edit solrconfig.xml to register the DataImportHandler request handler:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

The data-config.xml file defines the data source and maps table columns to Solr fields, for example:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/dbname" user="user-name" password="password"/>
  <document>
    <!-- desc is a reserved word in MySQL, so it is quoted with backticks -->
    <entity name="outer" query="select id, name, `desc` from mytable">
      <field column="id" name="solr_id"/>
      <field column="name" name="solr_name"/>
      <field column="desc" name="solr_desc"/>
      <!-- runs once per outer row; ${outer.id} is the id of the current outer row -->
      <entity name="inner" query="select details from another_table where id = '${outer.id}'">
        <field column="details" name="solr_details"/>
      </entity>
    </entity>
  </document>
</dataConfig>
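The nested entity is the least obvious part of this configuration: the inner query is executed once for every row of the outer query, and its columns are added to the same Solr document. The sketch below mimics that mapping with an in-memory SQLite database so it can run without MySQL or Solr. It is a conceptual model, not how DIH is implemented, and the table contents and the simulate_nested_import name are made up. Because one outer row can match several inner rows, solr_details would presumably need to be declared multiValued in schema.xml.

```python
import sqlite3


def simulate_nested_import(conn):
    """Conceptual sketch of how the nested entities turn rows into documents.

    Not DIH's implementation: it only mirrors the mapping in data-config.xml.
    """
    docs = []
    # Outer entity: one Solr document per row; "desc" is quoted because it is
    # a keyword in SQLite as well as in MySQL.
    outer = conn.execute('SELECT id, name, "desc" FROM mytable ORDER BY id')
    for row_id, name, desc in outer.fetchall():
        doc = {"solr_id": row_id, "solr_name": name, "solr_desc": desc}
        # Inner entity: runs once per outer row; the bound parameter plays the
        # role of ${outer.id}. ORDER BY rowid only makes the output deterministic.
        inner = conn.execute(
            "SELECT details FROM another_table WHERE id = ? ORDER BY rowid", (row_id,)
        )
        doc["solr_details"] = [details for (details,) in inner.fetchall()]
        docs.append(doc)
    return docs


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, "desc" TEXT);
CREATE TABLE another_table (id INTEGER, details TEXT);
INSERT INTO mytable VALUES (1, 'Widget', 'A small widget');
INSERT INTO mytable VALUES (2, 'Gadget', 'A handy gadget');
INSERT INTO another_table VALUES (1, 'blue');
INSERT INTO another_table VALUES (1, 'steel');
""")
for doc in simulate_nested_import(conn):
    print(doc)
```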

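Run the full-import command to index the entire database:

http://localhost:8983/solr/dataimport?command=full-import

Run the delta-import command to index only the data that has changed since the last import:

http://localhost:8983/solr/dataimport?command=delta-import

Note that a delta import only finds changed rows if the entity defines a deltaQuery (typically together with a deltaImportQuery); the configuration above does not include one.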
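An import runs in the background, so a script that triggers one usually has to wait for it to finish before, for example, pointing users at the refreshed index. The sketch below builds DataImportHandler command URLs and polls command=status until the handler reports idle. It assumes the status response carries a top-level <str name="status"> element with values such as idle or busy, which matches the responses we have seen but may vary between Solr versions. clean is a standard DIH parameter; the helper names and the injected fetch callable (used so the sketch runs offline) are our own.

```python
import time
import urllib.parse
import xml.etree.ElementTree as ET


def dih_url(solr_url, command, **params):
    """Build e.g. .../dataimport?command=full-import (extra params appended)."""
    query = urllib.parse.urlencode({"command": command, **params})
    return f"{solr_url.rstrip('/')}/dataimport?{query}"


def dih_status(response_xml):
    """Return the top-level <str name="status"> value, e.g. 'idle' or 'busy'."""
    root = ET.fromstring(response_xml)
    for el in root.findall("str"):  # direct children only, not responseHeader
        if el.get("name") == "status":
            return el.text
    return None


def wait_until_idle(fetch, solr_url, poll_seconds=2.0, max_polls=30):
    """Poll command=status until the handler reports 'idle'.

    `fetch` is any callable taking a URL and returning the body as text,
    for example lambda u: urllib.request.urlopen(u).read().decode("utf-8").
    """
    for _ in range(max_polls):
        if dih_status(fetch(dih_url(solr_url, "status"))) == "idle":
            return True
        time.sleep(poll_seconds)
    return False


# Offline usage with canned responses (shape assumed, see above).
BUSY = '<response><str name="command">status</str><str name="status">busy</str></response>'
IDLE = '<response><str name="command">status</str><str name="status">idle</str></response>'
responses = iter([BUSY, IDLE])
print(dih_url("http://localhost:8983/solr", "full-import", clean="false"))
print(wait_until_idle(lambda url: next(responses), "http://localhost:8983/solr", poll_seconds=0))
```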

Multiple cores let a single Solr instance host separate indexes, each with its own configuration and schema, for very different applications, while keeping the convenience of unified administration. The individual indexes remain fairly isolated, but you can manage them as a single application, create new indexes on the fly by spinning up new SolrCores, and even replace one SolrCore with another without restarting your servlet container. To define the cores, edit solr.xml as in the example below.

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="application1" instanceDir="app1">
      <property name="dataDir" value="/app1/data" />
      <property name="configName" value="/app1/config.xml" />
      <property name="schemaName" value="/app1/schema.xml" />
    </core>
    <core name="application2" instanceDir="app2" />
  </cores>
</solr>
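The adminPath attribute exposes the CoreAdmin handler, which is what allows cores to be created, reloaded or swapped at runtime. The sketch below reads the core list from the solr.xml example and builds CoreAdmin request URLs. STATUS, CREATE, RELOAD and SWAP (with the core and other parameters) are CoreAdmin actions, but the helper names are hypothetical and nothing is sent to a server.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# The solr.xml example above, repeated so this sketch is self-contained.
SOLR_XML = """<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="application1" instanceDir="app1">
      <property name="dataDir" value="/app1/data" />
      <property name="configName" value="/app1/config.xml" />
      <property name="schemaName" value="/app1/schema.xml" />
    </core>
    <core name="application2" instanceDir="app2" />
  </cores>
</solr>"""


def list_cores(solr_xml):
    """Return (adminPath, {core name: instanceDir}) from a legacy solr.xml."""
    cores = ET.fromstring(solr_xml).find("cores")
    return cores.get("adminPath"), {
        core.get("name"): core.get("instanceDir") for core in cores.findall("core")
    }


def core_admin_url(solr_url, admin_path, action, **params):
    """Build a CoreAdmin request URL such as .../admin/cores?action=STATUS."""
    query = urllib.parse.urlencode({"action": action, **params})
    return f"{solr_url.rstrip('/')}{admin_path}?{query}"


admin_path, cores = list_cores(SOLR_XML)
print(cores)  # {'application1': 'app1', 'application2': 'app2'}
# SWAP exchanges the names of two running cores, so one replaces the other
# without restarting the servlet container.
print(core_admin_url("http://localhost:8983/solr", admin_path, "SWAP",
                     core="application1", other="application2"))
```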

Run the full-import command to index the entire database in application1:
http://localhost:8983/solr/application1/dataimport?command=full-import

Run the delta-import command to run incremental imports in application1:
http://localhost:8983/solr/application1/dataimport?command=delta-import

Run the full-import command to index the entire database in application2:
http://localhost:8983/solr/application2/dataimport?command=full-import

Run the delta-import command to run incremental imports in application2:
http://localhost:8983/solr/application2/dataimport?command=delta-import
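Since only the core name in the path changes from one application to the next, the per-core commands can be generated rather than maintained by hand. A minimal sketch, assuming the two core names from solr.xml and the default base URL used in this paper; import_urls is a hypothetical helper.

```python
# Core names come from the solr.xml example; the base URL assumes the default
# port used throughout this paper.
SOLR_URL = "http://localhost:8983/solr"
CORES = ["application1", "application2"]


def import_urls(solr_url, cores, commands=("full-import", "delta-import")):
    """Return {(core, command): url}; only the core segment differs per application."""
    base = solr_url.rstrip("/")
    return {
        (core, command): f"{base}/{core}/dataimport?command={command}"
        for core in cores
        for command in commands
    }


for (core, command), url in import_urls(SOLR_URL, CORES).items():
    print(f"{core:14} {command:13} {url}")
```

Adding a third application then means adding one core name to the list rather than copying another block of URLs.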

Searching the indexes

A query such as the following returns the matching documents from application1 as an XML response:

http://localhost:8983/solr/application1/select/?q=searchterm
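Client code usually needs the documents out of that XML response. The sketch below builds a /select URL and parses a response into (numFound, documents). The sample response is hypothetical but follows the layout of Solr's XML response format (a result element named response holding doc elements with typed children such as str and arr); rows and start are standard paging parameters, while the helper names are our own.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Hypothetical response, shaped like Solr's default XML response format.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="solr_id">1</str>
      <str name="solr_name">Widget</str>
      <arr name="solr_details"><str>blue</str><str>steel</str></arr>
    </doc>
  </result>
</response>"""


def select_url(solr_url, core, q, rows=10, start=0):
    """Build a /select URL for one core; q is URL-encoded."""
    params = urllib.parse.urlencode({"q": q, "rows": rows, "start": start})
    return f"{solr_url.rstrip('/')}/{core}/select/?{params}"


def parse_select_response(response_xml):
    """Return (numFound, docs); multi-valued <arr> fields become lists of strings."""
    result = ET.fromstring(response_xml).find("result[@name='response']")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for el in doc:
            if el.tag == "arr":
                fields[el.get("name")] = [child.text for child in el]
            else:
                # Values stay strings; converting <int>, <date>, ... is left out.
                fields[el.get("name")] = el.text
        docs.append(fields)
    return int(result.get("numFound")), docs


print(select_url("http://localhost:8983/solr", "application1", "solr search"))
print(parse_select_response(SAMPLE_RESPONSE))
```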

In this way, a single Solr installation can be reused across multiple enterprise search implementations.

For more information, contact: [email protected]
