I use Lucene.net to index page content, documents, and similar items on websites. The index is very simple, and each entry has these fields:
LuceneId - unique id for Lucene (TypeId + ItemId)
TypeId - the type of text (e.g. page content, product, public document, etc.)
ItemId - the web page id, document id, etc.
Text - the text indexed
Title - web page title, document name, etc., to display with the search results
I have these options to adapt it to serve multilingual content:
- Create a separate index for each language, for instance Lucene-enGB, Lucene-frFR, etc.
- Keep a single index and add an additional "language" field to it to filter the results.
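To make the two options concrete, here is a toy sketch in plain Python. It is not Lucene.net code and uses no Lucene API; the "Language" field, the sample documents, and all function names are hypothetical and only illustrate how a search would be routed in each case.

```python
from collections import defaultdict

# Sample entries with the fields listed above, plus a hypothetical "Language" field.
DOCS = [
    {"LuceneId": "1-10", "TypeId": 1, "ItemId": 10, "Language": "en-GB",
     "Title": "Welcome", "Text": "welcome to our shop"},
    {"LuceneId": "1-11", "TypeId": 1, "ItemId": 11, "Language": "fr-FR",
     "Title": "Bienvenue", "Text": "bienvenue dans notre boutique"},
    {"LuceneId": "2-7", "TypeId": 2, "ItemId": 7, "Language": "en-GB",
     "Title": "Opening hours", "Text": "the shop opens at nine"},
]


def matches(doc, term):
    """Naive whole-word match on the Text field (stand-in for a real query)."""
    return term.lower() in doc["Text"].lower().split()


# Option 1: a separate index per language; the language picks the index.
def build_per_language_indexes(docs):
    indexes = defaultdict(list)
    for doc in docs:
        indexes[doc["Language"]].append(doc)
    return indexes


def search_per_language(indexes, language, term):
    return [d["LuceneId"] for d in indexes.get(language, []) if matches(d, term)]


# Option 2: one index; the language is an extra filter on every query.
def search_single_index(docs, language, term):
    return [d["LuceneId"] for d in docs
            if d["Language"] == language and matches(d, term)]


if __name__ == "__main__":
    per_language = build_per_language_indexes(DOCS)
    print(search_per_language(per_language, "en-GB", "shop"))
    print(search_single_index(DOCS, "en-GB", "shop"))
```

In this toy model both options return the same hits for a given language; the difference is only where the language restriction is applied. One thing I noticed while writing it: with a single index, LuceneId (TypeId + ItemId) would presumably also need to include the language if the same item exists in several languages.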
Which option is better - or is there another? I have not used multiple indexes before, so I'm leaning towards the second.