Is there a best-practice schema.xml for Solr when importing rich documents?

Question

I am working with Solr on a project where we import a large number of rich documents (about 40,000 items), mainly MS Word, PowerPoint, Excel, and PDF files.

Are there any best-practice schema.xml and/or solrconfig.xml files for use with Solr's ExtractingRequestHandler?

I am tweaking the default schema to try to get faceting working on the last-modified date, but even without that, I suppose there may well be a good example of how these files should look when the default output from Tika is sufficient.

If there is no such thing as a best-practice schema.xml and/or solrconfig.xml, I am also interested in good examples, preferably from existing open source projects, or even good blog posts.

Any pointers are welcome!

+5

full-text-search lucene solr apache-tika solr-cell

Pål Brattberg Dec 05 '11 at 23:31

1 answer

josegil · Answer 1 · 2011-12-09T14:04:25+0000

The book Taming Text (http://www.manning.com/ingersoll/) covers ExtractingRequestHandler. It is about text processing with open source tools such as Solr, Tika, and Lucene.

Chapter 5 includes Solr examples with different schema.xml files.
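
For orientation, here is a rough sketch of the kind of setup the question is asking about. It is modeled on the example configuration that ships with Solr 3.x and is not taken from the book. The field names `text` and `last_modified`, the `ignored_` prefix, and the `text_general` and `date` field types are assumptions that have to match whatever your own schema.xml defines. Tika's metadata names also vary by file type and version, so check what your documents actually produce before relying on `Last-Modified`.

```xml
<!-- solrconfig.xml: register Solr Cell (ExtractingRequestHandler) -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <!-- Put the extracted body text into the (assumed) "text" field -->
    <str name="fmap.content">text</str>
    <!-- Lowercase metadata names and replace non-alphanumerics with "_",
         so a Tika "Last-Modified" value lands in "last_modified" -->
    <str name="lowernames">true</str>
    <!-- Prefix any metadata field the schema does not know about -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<!-- schema.xml: fields the handler above expects (names are assumptions) -->
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>

<field name="text" type="text_general"
       indexed="true" stored="false" multiValued="true"/>
<field name="last_modified" type="date" indexed="true" stored="true"/>

<!-- Silently drop every unmapped Tika metadata field -->
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
```

The `uprefix` plus `ignored_*` combination is what lets the default Tika output be indexed without listing every possible metadata field in the schema. With `last_modified` indexed as a date field, it becomes available for date faceting.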
