If it were me, and there were no regulatory reason why you couldn't, I would put them all in one index. That's just my "don't optimize what you don't need to" instinct talking.
The first concern is simply legal: whether you MAY join and mix the data together, even if it is kept separate by logical means. That is a question for your lawyers, clients, and service agreements. It may well not be a problem.
Assuming that you can, the next question is what impact users will have on each other. If user A is using the system while user B is in the middle of importing his 100K documents, will that affect user A? And if it does, is that because of how Lucene works, or simply because of the overall system load that importing and indexing documents creates?
Test it and see.
The main thing is to make sure that your client systems do not access Lucene directly, but go through some kind of facade. This facade is the ideal place to enforce customer segregation, and also a good place to redirect traffic if at some point you decide that you need to split out your indexes.
You might need to pull one heavy user out into their own index. Or you might sell a higher tier of response time to someone who is guaranteed more resources in their SLA, and so on.
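To make the facade idea concrete, here is a minimal sketch. Everything in it is hypothetical: the names SearchBackend, InMemoryBackend, SearchFacade and moveToDedicated are made up for illustration, and the in-memory backend is only a stand-in for whatever actually wraps Lucene. In a real backend, the per-customer restriction would presumably become a mandatory filter on a customer-id field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction; a real implementation would wrap a Lucene index.
interface SearchBackend {
    void index(String customerId, String docId, String text);
    List<String> search(String customerId, String term);
}

// Illustration-only stand-in: keeps documents in memory, keyed by customer.
class InMemoryBackend implements SearchBackend {
    private final Map<String, Map<String, String>> docsByCustomer = new HashMap<>();

    @Override
    public synchronized void index(String customerId, String docId, String text) {
        docsByCustomer.computeIfAbsent(customerId, k -> new TreeMap<>()).put(docId, text);
    }

    @Override
    public synchronized List<String> search(String customerId, String term) {
        // Only this customer's documents are ever considered.
        Map<String, String> docs =
                docsByCustomer.getOrDefault(customerId, Collections.emptyMap());
        String needle = term.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            if (doc.getValue().toLowerCase().contains(needle)) {
                hits.add(doc.getKey());
            }
        }
        return hits; // doc ids, in sorted order because of the TreeMap
    }
}

// The facade: the only thing client systems talk to.
class SearchFacade {
    private final SearchBackend shared;
    // Customers that have been pulled out of the shared index.
    private final Map<String, SearchBackend> dedicated = new ConcurrentHashMap<>();

    public SearchFacade(SearchBackend shared) {
        this.shared = shared;
    }

    // Redirects future traffic for one customer. Note: this sketch does
    // NOT migrate documents already sitting in the shared backend.
    public void moveToDedicated(String customerId, SearchBackend backend) {
        dedicated.put(customerId, backend);
    }

    private SearchBackend backendFor(String customerId) {
        return dedicated.getOrDefault(customerId, shared);
    }

    public void index(String customerId, String docId, String text) {
        backendFor(customerId).index(customerId, docId, text);
    }

    public List<String> search(String customerId, String term) {
        // The customer id always comes from the facade, never from the query.
        return backendFor(customerId).search(customerId, term);
    }
}
```

Callers only ever use index and search on the facade, so whether a given customer lives in the shared index or in one of their own is a routing detail they never see. That is the whole point: the decision can be changed later in one place.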
But deciding what the best way is right now? Eh, it seems early.
500K documents is not much data for Lucene. Just make sure your implementation gives you the ability to add that separation later, should you find that putting everything in one instance is not viable. And by "the ability to add" I mean exactly that: the ability. Do NOT actually IMPLEMENT, say, client-based shards. Rather, make sure there is a good spot where they MAY be implemented without refitting a bunch of plumbing later.