Is there a best-practice schema.xml for Solr when importing rich documents?

I work with Solr on a project where we import a large number of rich documents (~40 thousand items), mainly MS Word, PowerPoint, Excel and PDF files.

Are there any best-practice schema.xml and/or solrconfig.xml files for Solr when using ExtractingRequestHandler?

I am tweaking the default schema to try to get faceting working on the last-modified date, but even setting that aside, I assume there is a good example of what these files should look like when the default output from Tika is sufficient.

If there is no such thing as a best-practice schema.xml and/or solrconfig.xml, I am also interested in good examples, preferably from existing open source projects, or even good blog posts.

Any pointers are welcome!

1 answer

The book Taming Text (http://www.manning.com/ingersoll/) covers ExtractingRequestHandler. It is about text processing with open source tools such as Solr, Tika and Lucene.

5, , solr, schema.xml differents .
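
For what it's worth, here is a minimal sketch of how a request to ExtractingRequestHandler is usually put together, in case it helps when deciding which fields the schema needs. It only builds the URL; it does not send anything. The parameters `literal.*`, `uprefix`, `fmap.content` and `commit` are standard ExtractingRequestHandler parameters, but the core URL, the `attr_` prefix, the `text` target field and the helper name `build_extract_url` are assumptions of mine, so your schema.xml would need a matching `text` field and an `attr_*` dynamic field (or whatever names you choose).

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, extra_fields=None, commit=False):
    """Build a URL for Solr's /update/extract handler (hypothetical helper).

    base_url     -- core URL, e.g. http://localhost:8983/solr/docs (assumed)
    doc_id       -- value for the unique key, passed as literal.id
    extra_fields -- optional dict of extra literal.<field> values
    commit       -- append commit=true when set
    """
    params = [
        ("literal.id", doc_id),
        # Assumption: Tika metadata not defined in the schema is mapped to
        # attr_<name>, so the schema needs an attr_* dynamic field.
        ("uprefix", "attr_"),
        # Assumption: the extracted body goes to a schema field named "text".
        ("fmap.content", "text"),
    ]
    for name, value in (extra_fields or {}).items():
        params.append((f"literal.{name}", value))
    if commit:
        params.append(("commit", "true"))
    return f"{base_url.rstrip('/')}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    print(build_extract_url("http://localhost:8983/solr/docs", "doc-1", commit=True))
```

You would then POST the Word/PDF file to that URL with curl or any HTTP client. Keeping unknown metadata behind a prefix like this is one way to see what Tika actually returns before committing to explicit fields in schema.xml.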
