Elasticsearch: More Indices vs. More Types

We use Elasticsearch for the following use case. Elasticsearch version: 5.1.1
Note: we use the AWS managed Elasticsearch service.

We have a multi-tenant system in which each tenant stores several kinds of data, and the number of tenants grows every day.

For example, each tenant has the following data:

1] tickets 2] sw_inventory 3] hw_inventory 

Our current indexing strategy is as follows:

index name:
tenant_id (GUID), e.g. tenant_xx1234xx-5b6x-4982-889a-667a758499c8

types:

 1] tickets 2] sw_inventory 3] hw_inventory 

The problems we are facing:

1] Mapping conflicts for common fields, e.g. (id, name, userId), across the types (tickets, sw_inventory, hw_inventory)
2] As the number of tenants grows, the number of indices can reach 1000 or 2000.

Would it be a good idea to reverse the indexing strategy?

e.g. index names:

 1] tickets 2] sw_inventory 3] hw_inventory 

types:

 tenant_tenant_id1 tenant_tenant_id2 tenant_tenant_id3 tenant_tenant_id4 

Thus, there will be only 3 huge indexes, each with N types, one per tenant.

So the question in this case is which solution is better?

1] Many small indexes and 3 types
OR
2] 3 huge indexes and many types


+8
elasticsearch
4 answers

Neither approach will work. As others have noted, both approaches cost performance and would prevent you from upgrading.

Consider having one index and one type per data set, e.g. sw_inventory, with a field in the mapping that identifies the tenant. You can then use document-level security in a security plugin such as X-Pack or Search Guard so that one tenant cannot see another tenant's records (if required).
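As a minimal sketch of the tenant-field idea, the Python below builds a search body that wraps any query in a bool filter on the tenant. The field name tenant_id, the sample name query, and the helper tenant_search_body are assumptions for illustration; the field is assumed to be mapped as keyword. This only restricts results at query time and is not a substitute for document-level security.

```python
import json


def tenant_search_body(tenant_id, user_query):
    """Wrap an arbitrary query so that it only matches one tenant's documents."""
    return {
        "query": {
            "bool": {
                "must": user_query,
                # Hypothetical field name; assumed to be mapped as keyword.
                "filter": {"term": {"tenant_id": tenant_id}},
            }
        }
    }


# Example: search one tenant's sw_inventory documents by name.
body = tenant_search_body(
    "xx1234xx-5b6x-4982-889a-667a758499c8",
    {"match": {"name": "laptop"}},
)
print(json.dumps(body, indent=2))
```

The resulting body would be sent to the _search endpoint of the data-set index, for example sw_inventory.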

+4

I suggest a different approach: https://www.elastic.co/guide/en/elasticsearch/guide/master/faking-it.html

Use a custom routing value: each document carries a tenant_id or similar field (something unique to each tenant), and that value is used both for routing and to define an alias for each tenant. Then, when querying documents for a specific tenant only, you use that tenant's alias.
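A rough sketch of what the alias setup could look like, written as Python that builds the request body for the _aliases endpoint. The index name tickets, the alias naming scheme tenant_<id>, and the field name tenant_id are assumptions, not something the answer specifies.

```python
import json


def tenant_alias_action(index, tenant_id):
    """One 'add' action: an alias that filters and routes by tenant."""
    return {
        "add": {
            "index": index,
            "alias": "tenant_" + tenant_id,  # hypothetical naming scheme
            # Only this tenant's documents are visible through the alias.
            "filter": {"term": {"tenant_id": tenant_id}},
            # Index and search requests via the alias use this routing value.
            "routing": tenant_id,
        }
    }


def aliases_request(index, tenant_ids):
    """Body for POST /_aliases covering several tenants at once."""
    return {"actions": [tenant_alias_action(index, t) for t in tenant_ids]}


req = aliases_request("tickets", ["id1", "id2"])
print(json.dumps(req, indent=2))
```

Searching tenant_id1/_search would then touch only the shard that the routing value maps to and return only documents matching the filter.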

This way you use one index and one type. When sizing the index, take into account its current size and the number of nodes, and choose a number of shards such that they are distributed more or less evenly across all data nodes and performance in your tests is acceptable. If, in the future, the index grows too large and the shards become too big to maintain the same performance, consider creating a new index with more primary shards and reindexing everything into it. This is not an unheard-of or unusual approach.
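The sizing and reindexing steps can be sketched as follows. The target shard size, the index names tickets_v1 and tickets_v2, and the helper names are all assumptions for illustration; the right shard count still has to be confirmed by your own tests.

```python
import math


def shards_for(data_nodes, index_gb, target_shard_gb):
    """Pick a primary shard count that is a multiple of the data node count.

    target_shard_gb is an assumed per-shard size budget, not an official limit.
    """
    needed = max(1, math.ceil(index_gb / target_shard_gb))
    # Round up so shards can spread evenly over the data nodes.
    return math.ceil(needed / data_nodes) * data_nodes


def new_index_body(primary_shards, replicas=1):
    """Body for PUT /<new_index> with an explicit shard count."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def reindex_body(source_index, dest_index):
    """Body for POST /_reindex, copying all documents into the new index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


shards = shards_for(data_nodes=3, index_gb=120, target_shard_gb=30)
create = new_index_body(shards)
copy = reindex_body("tickets_v1", "tickets_v2")
print(shards, create, copy)
```

After the copy finishes, the tenant aliases would be moved to the new index so that clients do not need to change.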

1000-2000 aliases are nothing the cluster cannot handle. If you have about 10 nodes or more, I also recommend dedicated master nodes with a heap size of 4-6 GB and at least 4 CPU cores.

+4

Indexes created in Elasticsearch 6.0.0 or later may contain only a single mapping type, which means that doc_type (_type) is deprecated.

You can find the full explanation in the Elasticsearch documentation ("Removal of mapping types"), but in short there are two solutions:

Index per document type

This approach has two advantages:

  • Data is more likely to be dense and so benefits from the compression techniques used in Lucene.
  • The term statistics used for scoring in full-text search are more likely to be accurate, because all documents in the same index represent a single entity.

Custom type field

Of course, there is a limit to how many primary shards can exist in a cluster, so you may not want to waste an entire shard on a collection of only a few thousand documents. In this case, you can implement your own custom type field, which works in a similar way to the old _type.

 PUT twitter
 {
   "mappings": {
     "_doc": {
       "properties": {
         "type": { "type": "keyword" },
         "name": { "type": "text" },
         "user_name": { "type": "keyword" },
         "email": { "type": "keyword" },
         "content": { "type": "text" },
         "tweeted_at": { "type": "date" }
       }
     }
   }
 }

You are using an older version of Elasticsearch, but the same logic applies, and it will make it easier to move to a newer version when you decide to upgrade. So I think you should go with the separate-index structure, in other words 3 huge indexes and many types, but with the type as a field in the mapping rather than as _type.

+1

I think both strategies have pros and cons:

Multiple Indexes:

Pros:
- Each tenant's data is isolated from the others, and no query returns results from more than one tenant.
- If the total number of documents is very large, several smaller indexes can give better performance.

Cons:
- Harder to manage.
- If each index holds only a few documents, you can waste a lot of resources.
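To put a rough number on the resource cost, the sketch below counts shards for the two layouts from the question. It assumes every index keeps the Elasticsearch 5.x defaults of 5 primary shards and 1 replica, which the question does not state; the helper name total_shards is hypothetical.

```python
def total_shards(indices, primaries_per_index=5, replicas=1):
    """Total shard copies in the cluster: primaries plus their replicas."""
    return indices * primaries_per_index * (1 + replicas)


# One index per tenant, 2000 tenants, default settings (assumed).
print(total_shards(2000))  # 20000

# One index per data set (tickets, sw_inventory, hw_inventory).
print(total_shards(3))  # 30
```

Each shard is a Lucene index with its own overhead, so the first layout pays that overhead thousands of times even when most tenants hold little data.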

EDITED: Avoid multiple types in the same index, as noted in the comments, because of performance and the deprecation of the feature.

-1