I apologize in advance if this question has already been answered - I could not find it.
I am relatively new to Solr and followed the tutorial instructions to use the standard SimplePostTool to index my data from the command line. I am currently using Solr 4.0 in my testing.
First, I delete everything in my index with a delete-by-query. Then I point SimplePostTool at several directories and index tens of thousands of files. In my case, each XML file is a separate document. Some documents may share the same unique key. In case it matters, the XML document sizes range from 460KB.
When SimplePostTool finishes, it reports that 26,541 files were indexed. Then I look at the collection1 page in the Admin UI and see Num Docs = 20,985 and Max Doc = 22,921.
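To make the two gaps concrete, here is the arithmetic on those three counts as a small Python sketch. The variable names are my own labels, not Solr fields or API names.

```python
# Counts copied from the SimplePostTool output and the Admin UI.
# Variable names are illustrative labels only, not Solr identifiers.
files_posted = 26541  # files SimplePostTool reported as indexed
num_docs = 20985      # "Num Docs" on the collection1 admin page
max_doc = 22921       # "Max Doc" on the collection1 admin page

# Gap between Max Doc and Num Docs (the mismatch other posts discuss).
max_minus_num = max_doc - num_docs

# Gap between files posted and Max Doc (the mismatch I am asking about).
posted_minus_max = files_posted - max_doc

print(f"Max Doc - Num Docs      = {max_minus_num}")
print(f"Files posted - Max Doc  = {posted_minus_max}")
print(f"Files posted - Num Docs = {files_posted - num_docs}")
```

The first difference is 1,936 and the second is 3,620; it is the second one that I cannot explain.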
I have seen other posts discussing the mismatch between Num Docs and Max Doc (I feel I understand the overwrite behavior well enough). My question is: why does the number of files SimplePostTool reports as indexed not match the Max Doc shown on the Solr administration page?