ETL: extract, transform, load. In other words, extract the data from your existing databases, transform it (which is more than just denormalizing it), and load it into Solr. The Solr index will be much smaller than the source databases because there is no relational overhead, and serving searches from Solr takes that load off your existing database servers.
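As a rough illustration, a minimal ETL pass with pysolr might look like the sketch below. The table names, field names, and the "catalog" core are placeholders standing in for your own schema, not anything prescribed by Solr:

```python
# Minimal ETL sketch, assuming a SQLite source and a Solr core named "catalog".
# Table and field names here are hypothetical placeholders.
import sqlite3
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/catalog", timeout=10)

conn = sqlite3.connect("source.db")
conn.row_factory = sqlite3.Row

# Extract: join the normalized tables so each Solr document is self-contained.
rows = conn.execute(
    """
    SELECT p.id, p.title, p.description, c.name AS category
    FROM products p
    JOIN categories c ON c.id = p.category_id
    """
)

# Transform: denormalize into flat documents, keeping the source primary key
# so a search hit can point straight back to the database row.
docs = [
    {
        "id": f"product-{row['id']}",   # Solr uniqueKey
        "db_id": row["id"],             # reference back to the source table
        "title": row["title"],
        "description": row["description"],
        "category": row["category"],
    }
    for row in rows
]

# Load: send the documents to Solr and commit so they become searchable.
solr.add(docs)
solr.commit()
```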
Take a look at how to configure and use Solr, and read up on Solr cores. You may want to put some languages in separate cores, because that way you can make better use of Solr's language-specific stemming and analysis. But even with multilingual data you can still use bigrams (as is commonly done for Chinese analysis, for example).
Having multiple cores does make searching a bit harder, because you have to decide whether to query a single language's index or all of them. But it is much more efficient to group the data by language and apply language-specific stopwords, protected words, stemming, and other language-analysis tools.
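A sketch of what querying per-language cores might look like, assuming cores named docs_en, docs_fr, and docs_zh (the names and the merge-by-score strategy are my assumptions, not anything mandated by Solr):

```python
# Query one language core, or all cores, with pysolr. Core names are hypothetical.
import pysolr

CORES = {
    "en": pysolr.Solr("http://localhost:8983/solr/docs_en"),
    "fr": pysolr.Solr("http://localhost:8983/solr/docs_fr"),
    "zh": pysolr.Solr("http://localhost:8983/solr/docs_zh"),
}

def search(query, lang=None, rows=10):
    """Search a single language core, or every core when no language is given."""
    if lang is not None:
        return list(CORES[lang].search(query, rows=rows))
    # Cross-language search: query every core and merge the results.
    # NOTE: scores from different cores are not strictly comparable
    # (each core has its own term statistics), which is part of why
    # multi-core searching is harder.
    merged = []
    for solr in CORES.values():
        merged.extend(solr.search(query, rows=rows, fl="*,score"))
    merged.sort(key=lambda doc: doc.get("score", 0.0), reverse=True)
    return merged[:rows]

# Example: search("renewable energy", lang="en")
```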
Usually you include some key data in the index so that when you find a record through a Solr search, you can refer directly back to the source database row. You can also keep normalized and denormalized data side by side: for example, an enumerated value could be stored both as a normalized field in English and as a denormalized field in the same language as the free text. A field can also be copied so that two different analysis and filtering chains are applied to the same value.
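For instance, a single indexed document might carry the source key, the normalized enum value, and the free-text equivalent at the same time. The field names below (db_id, status, status_text) are hypothetical and would have to be defined in your Solr schema; duplicating one field into two differently analyzed fields is normally done there with a copyField directive:

```python
# Sketch: one document with a back-reference to the database, a normalized
# English enum field, and a denormalized free-text field in the document's
# own language. Field and core names are placeholders.
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/docs_fr")

solr.add([
    {
        "id": "ticket-4711",
        "db_id": 4711,                           # points back to the source row
        "status": "OPEN",                        # normalized English enum value
        "status_text": "en attente de réponse",  # free text, same language as body
        "body": "Le client demande un remboursement ...",
    }
])
solr.commit()
```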
It would be worth prototyping this with a subset of your data to get a feel for how Solr behaves and how best to configure it.