Indexing PDF documents in Solr from a C# client

Question

I am trying to index Word or PDF documents in Solr and found the ExtractingRequestHandler, but I can't figure out how to write C# code that performs the HTTP POST request shown in the Solr wiki: http://wiki.apache.org/solr/ExtractingRequestHandler.

I installed Solr 3.4 on Tomcat 7 (7.0.22) using the files from the example/solr directory in the Solr zip, and I did not change anything. The ExtractingRequestHandler should be configured out of the box in solrconfig.xml and ready to use, right?

Can someone give a C# (HttpWebRequest) example of how to make an HTTP POST request that uploads a PDF file, the way it is done with curl in the Solr wiki?

I have searched this entire site and many others for an example or tutorial on how to do this, but have not found anything.

EDIT:

I finally managed to get it working with SolrNet!

For it to work, you need to copy the following from the Solr zip file into the lib folder of the Solr installation directory:

The apache-solr-cell-3.4.0.jar file from the dist folder
The contents of the contrib\extract\lib directory

With SolrNet 0.4.0 beta 2, this code does the job:

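    using System.IO;
    using Microsoft.Practices.ServiceLocation;
    using SolrNet;

    // Register the document type against the Solr URL, then resolve the client.
    Startup.Init<IndexDocument>("YOUR-SOLR-SERVICE-PATH");
    var solr = ServiceLocator.Current.GetInstance<ISolrOperations<IndexDocument>>();

    // Stream the file to the ExtractingRequestHandler; "doc1" is the document id.
    using (FileStream fileStream = File.OpenRead("FILE-PATH-FOR-THE-FILE-TO-BE-INDEXED"))
    {
        var response = solr.Extract(
            new ExtractParameters(fileStream, "doc1")
            {
                ExtractFormat = ExtractFormat.Text,
                ExtractOnly = false
            });
    }

    // Commit so the newly indexed document becomes searchable.
    solr.Commit();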

Sorry for the trouble. I hope others find this helpful, though.

c# tomcat pdf solr solrnet

jonasm Jan 19 '12 at 23:47

1 answer

Paige Cook · Accepted Answer · 2012-01-20T00:19:50+0000

I would recommend using the SolrNet client. It supports ExtractingRequestHandler.
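
If you would rather not depend on SolrNet, the curl call from the wiki can probably be reproduced with a plain HttpWebRequest by sending the file as the raw request body. The following is only a minimal sketch under some assumptions: the handler is mapped at /update/extract as in the example solrconfig.xml, the literal.id and commit parameters are used as in the wiki's curl examples, and the class and method names (SolrExtractPost, BuildExtractUrl, PostPdf) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Net;

static class SolrExtractPost
{
    // Builds the extract URL. baseUrl is the Solr root (assumed, for example
    // http://localhost:8080/solr on a default Tomcat); docId becomes literal.id.
    public static string BuildExtractUrl(string baseUrl, string docId)
    {
        return baseUrl.TrimEnd('/') + "/update/extract?literal.id="
            + Uri.EscapeDataString(docId) + "&commit=true";
    }

    // Posts the PDF as the raw request body and returns Solr's response text.
    public static string PostPdf(string baseUrl, string docId, string pdfPath)
    {
        var request = (HttpWebRequest)WebRequest.Create(BuildExtractUrl(baseUrl, docId));
        request.Method = "POST";
        request.ContentType = "application/pdf";

        using (var file = File.OpenRead(pdfPath))
        {
            request.ContentLength = file.Length;
            using (var body = request.GetRequestStream())
            {
                file.CopyTo(body); // Stream.CopyTo requires .NET 4 or later
            }
        }

        // GetResponse throws a WebException on non-2xx status codes.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call would then look like SolrExtractPost.PostPdf("http://localhost:8080/solr", "doc1", @"C:\docs\sample.pdf"), where both the URL and the file path are placeholders. The Solr Cell jars still have to be in Solr's lib folder, as described in the question's edit, for the handler to load.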
