I work with Solr in a project where we import a large number of items (about 40,000) from rich documents, mainly MS Word, PowerPoint, Excel and PDF.
Are there any best-practice schema.xml and/or solrconfig.xml setups for Solr when using the ExtractingRequestHandler?
I am currently tweaking the default schema to try to get facets working on the modification date, but even apart from that, I assume there must be a good example of what these files should look like when the default output from Tika is sufficient.
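For context, the kind of setup I mean looks roughly like the sketch below. It is modelled on the handler definition in the stock Solr example configuration, not something I would claim is best practice. The target field names (`text`, `links`, `ignored_`, `last_modified`) and the `date` field type are assumptions that depend on what the schema actually defines.

```xml
<!-- solrconfig.xml fragment: a sketch based on the Solr example config -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- lowercase Tika metadata names, e.g. Last-Modified becomes last_modified -->
    <str name="lowernames">true</str>
    <!-- send the extracted body text to the "text" field (assumed field name) -->
    <str name="fmap.content">text</str>
    <!-- prefix fields unknown to the schema so they can match an ignored_* dynamic field -->
    <str name="uprefix">ignored_</str>
    <!-- capture XHTML attributes and map anchors into a "links" field (assumed field name) -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

<!-- schema.xml fragment: assumed field for faceting on the modification date;
     the type name "date" must match a date field type defined in the schema -->
<field name="last_modified" type="date" indexed="true" stored="true"/>
```

With something like this in place, faceting on `last_modified` would only work if Tika actually emits that metadata for the given document type, which is part of what I am unsure about.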
If there is no such thing as a best-practice schema.xml and/or solrconfig.xml, I am also interested in good examples, preferably from existing open source projects, or even good blog posts.
Any pointers are welcome!