Configure DataImportHandler in SolrCloud with ZooKeeper

I have SolrCloud configured as described in SolrCloud research, except that I'm using Solr 4.0.0 Beta. In short, the configuration is:

  • ZooKeeper on the default port 2181
  • 3 Solr instances running on different ports

This is just for testing purposes. The desired configuration consists of 3 ZooKeeper instances (one for each Solr instance). I am able to index some XML files using the curl command.

Questions:

  • How do I set up DIH for a collection? I modified solrconfig.xml (the DataImportHandler configuration) and added the correct JDBC driver to lib, but in the Solr admin UI I get "sorry, no dataimport-handler!". The changes are visible in ZooKeeper (I can see data_config.xml), and in the Solr admin panel I can see the updated version of solrconfig.xml.

  • Is there a good tutorial for deploying SolrCloud in production (with the configuration described above) on one or more machines running Ubuntu 12.04 LTS?

Any advice would be appreciated! Thanks in advance!

1 answer

Typically, a DIH configuration does not depend on whether you run a single Solr instance or several instances in a SolrCloud setup. DIH sends the imported documents to the Solr node it runs on, and SolrCloud then distributes them to the appropriate shards and replicas. ZooKeeper only stores the shared configuration files and the cluster state, not the index data itself.
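Once DIH is configured, an import is started by sending a command to the handler on any one node. The following is a minimal Python sketch, assuming Solr is reachable at http://localhost:8983/solr, the collection is called collection1 and the handler is registered at /dataimport (the URL and collection name are hypothetical; the helper names are illustrative). The command values full-import and status and the clean and commit parameters are standard DIH request parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def dih_url(base_url: str, collection: str, command: str, **params: str) -> str:
    """Build a /dataimport request URL for one collection (hypothetical helper)."""
    query = {"command": command, "wt": "json", **params}
    return f"{base_url.rstrip('/')}/{collection}/dataimport?{urlencode(query)}"


def run_full_import(base_url: str, collection: str) -> dict:
    """Start a full-import (clear the index first, commit at the end)."""
    url = dih_url(base_url, collection, "full-import", clean="true", commit="true")
    with urlopen(url) as resp:
        return json.load(resp)


def import_status(base_url: str, collection: str) -> str:
    """Return the handler's 'status' field, e.g. 'busy' while importing or 'idle'."""
    with urlopen(dih_url(base_url, collection, "status")) as resp:
        return json.load(resp)["status"]
```

For example, run_full_import("http://localhost:8983/solr", "collection1") starts the import, and polling import_status until it returns "idle" tells you when it has finished.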

Make sure your DIH is configured correctly:

In solrconfig.xml, make sure all the necessary libraries are loaded. That means the two DIH jars:

 <lib dir="../../../dist/" regex="solr-dataimporthandler-4.3.0.jar" />
 <lib dir="../../../dist/" regex="solr-dataimporthandler-extras-4.3.0.jar" />

as well as any other jars you may need (for example, your JDBC driver). Adjust the version in the jar names to match your Solr release.

In the solrconfig.xml file, verify that the DIH handler is declared, for example:

 <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
   <lst name="defaults">
     <str name="config">data-config.xml</str>
   </lst>
 </requestHandler>
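If the admin UI reports "sorry, no dataimport-handler!", it is worth confirming that the solrconfig.xml actually in use declares the handler. Below is a small Python sketch, assuming you have a local copy of that file (for example, downloaded from ZooKeeper); it lists every requestHandler whose class is the full DataImportHandler class name, together with the config file it points to. The function name is hypothetical, and the check does not recognise short class aliases.

```python
import xml.etree.ElementTree as ET

DIH_CLASS = "org.apache.solr.handler.dataimport.DataImportHandler"


def find_dih_handlers(solrconfig_xml: str) -> dict:
    """Map each DIH requestHandler name to its 'config' default (or None)."""
    root = ET.fromstring(solrconfig_xml)
    handlers = {}
    for rh in root.iter("requestHandler"):
        if rh.get("class") != DIH_CLASS:
            continue  # not a DataImportHandler
        # Look for <lst name="defaults"><str name="config">...</str></lst>
        config = rh.find("./lst[@name='defaults']/str[@name='config']")
        handlers[rh.get("name")] = config.text if config is not None else None
    return handlers
```

An empty result means the file being inspected has no DIH handler declared.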

Finally, the configuration file referenced by the DIH handler (data-config.xml) must be in the same conf directory as solrconfig.xml (in SolrCloud, the config set stored in ZooKeeper) and must have the appropriate content, for example:

 <dataConfig>
   <dataSource type="JdbcDataSource" name="myDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@someHost:1521:someDb" user="someUser" password="somePassword" batchSize="5000"/>
   <document name="myDoc">
     <entity name="myDoc" dataSource="myDataSource" transformer="my.custom.Transformer" query="select col1, col2, col3 from table1 where whatever" />
   </document>
 </dataConfig>
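An entity refers to its data source by name, so a dataSource value that does not exactly match a declared name (including a difference in letter case) is an easy mistake to make. Here is a quick Python sketch of a sanity check, assuming you have the data-config.xml contents as a string; the function name is hypothetical, and it only checks explicit dataSource attributes on entities.

```python
import xml.etree.ElementTree as ET


def undefined_datasources(data_config_xml: str) -> list:
    """Return (entity name, dataSource ref) pairs that match no declared name.

    Names are compared exactly, so 'myDatasource' != 'myDataSource'.
    """
    root = ET.fromstring(data_config_xml)
    declared = {ds.get("name") for ds in root.iter("dataSource")}
    missing = []
    for entity in root.iter("entity"):
        ref = entity.get("dataSource")
        if ref is not None and ref not in declared:
            missing.append((entity.get("name"), ref))
    return missing
```

An empty list means every explicit reference resolves to a declared dataSource.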