How to store normalized models when searching with ElasticSearch?

When combining MySQL and ElasticSearch, which approach is better:

  • Fully synchronize all model information with ES (even data that is never searched), so that when a result is found, I already have all of its information?

  • Only synchronize the searchable fields, and then, when returning the results, use the id field to look up the actual data in the MySQL database?


Elasticsearch is a search engine, and I would advise you not to use it as a database. I suggest you index only the searchable data plus a unique identifier from your database, so that you can retrieve the full results from MySQL using the unique key returned by Elasticsearch. That way, each application is used for what it does best. Elasticsearch is not well suited to relational queries, and you would have to write a lot more code to work with related data than if you simply used MySQL for it.

In addition, you do not want to couple your persistence layer to your search layer. They should be as independent as possible, so that a change in one affects the other as little as possible; otherwise, you will have to update both systems whenever either one needs to change. A MySQL query on a handful of ids is very fast, so you can use MySQL for that and leave the slow part (the full-text query) to Elasticsearch.
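A minimal sketch of the two-step lookup described above, assuming the search layer returns only ids in relevance order. To keep it self-contained and runnable, sqlite3 stands in for MySQL, and `search_ids` is a hypothetical stub in place of a real Elasticsearch query; the `products` table and its columns are also made up for illustration.

```python
import sqlite3


def search_ids(query: str) -> list[int]:
    """Hypothetical stand-in for an Elasticsearch query that returns
    only document ids, already sorted by relevance."""
    fake_hits = {"red shirt": [3, 1]}
    return fake_hits.get(query, [])


def make_demo_db() -> sqlite3.Connection:
    """In-memory database standing in for MySQL (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, stock INTEGER)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [(1, "Red shirt", 9.99, 4), (2, "Blue jeans", 29.0, 0), (3, "Red shirt XL", 11.5, 2)],
    )
    return conn


def hydrate(conn: sqlite3.Connection, ids: list[int]) -> list[dict]:
    """Fetch the full rows for the given ids with one primary-key query."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name, price, stock FROM products WHERE id IN ({placeholders})",
        ids,
    ).fetchall()
    by_id = {r[0]: {"id": r[0], "name": r[1], "price": r[2], "stock": r[3]} for r in rows}
    # IN (...) does not preserve order, so restore the search engine's ranking.
    # Ids missing from the database (stale index entries) are skipped.
    return [by_id[i] for i in ids if i in by_id]
```

For example, `hydrate(make_demo_db(), search_ids("red shirt"))` returns the rows for ids 3 and 1, in that order, with price and stock coming from the database rather than from the index.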


The Elasticsearch data model usually favors denormalized data. Depending on the use case (large data volumes, low-powered machines, too few nodes, etc.), storing relationships in ES (parent-child) to emulate joins and similar constructs from the relational-database world can be expensive.

Your question is very open-ended, and the answer depends on the use case. Generally speaking:

  • avoid mirroring your exact database tables and their relationships as ES indices
  • the advantage of storing everything in ES is that you do not need to use both systems at the same time
  • if your searchable data is very small compared to the total amount of data, I don't see why you couldn't synchronize only the searchable data with ES
  • try to flatten the data in ES, and resist any urge to use parent/child objects just because that is how it is done in MySQL
  • I am not saying that you cannot use parent/child. You can, but test it before committing to that approach and make sure you are OK with the response times. This is valid advice for whichever approach you choose.
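Flattening usually means doing the join once in application code, at indexing time, rather than asking Elasticsearch to resolve relationships at query time. The sketch below assumes three hypothetical normalized tables (products, categories, product tags) and produces one self-contained document per product; the field names are illustrative only.

```python
from collections import defaultdict

# Hypothetical normalized rows, as they might be read from three MySQL tables.
products = [
    {"id": 1, "name": "Red shirt", "category_id": 10},
    {"id": 2, "name": "Blue jeans", "category_id": 20},
]
categories = {10: "Shirts", 20: "Trousers"}
product_tags = [(1, "cotton"), (1, "summer"), (2, "denim")]


def denormalize(products, categories, product_tags):
    """Join related rows in application code and emit one flat document per product."""
    tags_by_product = defaultdict(list)
    for product_id, tag in product_tags:
        tags_by_product[product_id].append(tag)

    docs = []
    for p in products:
        docs.append({
            "id": p["id"],
            "name": p["name"],
            # The category name is copied into the document instead of referenced by id.
            "category": categories.get(p["category_id"]),
            # Tags become an array field rather than a child table.
            "tags": tags_by_product.get(p["id"], []),
        })
    return docs
```

The trade-off of this design is that the copied values must be kept in sync: if a category is renamed, every product document that embeds it has to be reindexed.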

Although it depends on the situation, I suggest you go with option 2:

  • Faster indexing: you extract only the searchable data from the database and index it in ES, instead of fetching everything and indexing everything.
  • Smaller storage: since the indexed data is smaller than with option 1, it is easier to back up, restore, and update your ES cluster in production. It also keeps storage requirements down as the data grows, so you can afford SSDs to improve performance at a lower cost.
  • In general, a search application searches on a few fields but shows the user all of the available data. For example, you search for products, but the results page displays price and stock information that exists only in the database. So there is naturally a second step that requests the additional information from the database and combines it with the search results for display.
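On the indexing side, option 2 might look like the following minimal sketch: each database row is projected down to its searchable fields plus its id, then serialized in the newline-delimited JSON shape that Elasticsearch's `_bulk` endpoint accepts (an action line followed by a source line, with a trailing newline). The choice of `SEARCHABLE_FIELDS` and the `products` index name are assumptions for illustration, and actually sending the body to a cluster is left out.

```python
import json

# Hypothetical choice of searchable columns; price and stock stay only in MySQL.
SEARCHABLE_FIELDS = ("name", "description")


def to_bulk_body(rows, index="products"):
    """Build a newline-delimited JSON body for the _bulk API:
    one "index" action line plus one source line per row."""
    lines = []
    for row in rows:
        # The database id doubles as the Elasticsearch _id, so hits map back to rows.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        # Only the searchable projection of the row goes into the index.
        lines.append(json.dumps({field: row[field] for field in SEARCHABLE_FIELDS}))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Because the price and stock columns never reach the index, the results page still needs the database lookup by id described in the last point above.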

Hope this helps.

