Why do I need "store": "yes" in Elasticsearch?

I really don't understand why the core types documentation says the following in its attribute descriptions (for a number, for example):

  • store - Set to yes to store the actual field in the index, no to not store it. Defaults to no (note that the JSON document itself is stored, and the field can be retrieved from it)
  • index - Set to no if the value should not be indexed. In this case, store should be set to yes, since if the field is neither indexed nor stored, there is nothing to do with it

The parenthetical note under store and the last sentence under index seem to contradict each other. With "index":"no" and "store":"no", I could still get the value from _source. That could be useful for a field containing a URL, for example. No?

I ran a small experiment with two mappings: in one the field was set to "store":"yes", and in the other to "store":"no".

In both cases, I can specify in my query:

 {"query":{"match_all":{}}, "fields":["my_test_field"]} 

and I get the same response, with the field returned.
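For reference, the experiment can be sketched as follows. This is only an illustration built with plain Python dictionaries: the type name `my_type` and the `integer` field type are assumptions, since the question does not state them, and the "yes"/"no" values follow the legacy syntax quoted above.

```python
import json

def number_mapping(store):
    """Build a legacy-style mapping for one field with the given store flag."""
    return {
        "my_type": {  # hypothetical type name, not given in the question
            "properties": {
                # "integer" is an assumed field type
                "my_test_field": {"type": "integer", "store": store}
            }
        }
    }

stored_mapping = number_mapping("yes")
unstored_mapping = number_mapping("no")

# The same search body is sent against both indices.
search_body = {"query": {"match_all": {}}, "fields": ["my_test_field"]}

print(json.dumps(stored_mapping, indent=2))
print(json.dumps(unstored_mapping, indent=2))
print(json.dumps(search_body))
```

The two mappings differ only in the store flag, which is why identical responses look surprising at first.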

I thought that if "store" were set to "no", I would not be able to retrieve a specific field and would have to get the whole _source and parse it on the client side.

So what is the use of setting "store" to "yes"? Is it only relevant if I explicitly exclude the field from "_source"?

+57
elasticsearch
Jun 14 '13 at 7:13
2 answers

I thought that if "store" were set to "no", I would not be able to retrieve a specific field and would have to get the whole _source and parse it on the client side.

That is exactly what Elasticsearch does for you when the field is not stored (the default) and the _source field is enabled (also the default).
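Conceptually, that server-side step looks something like the sketch below. It is a simplified model of the idea, not Elasticsearch's actual implementation; `extract_fields` and the sample document are hypothetical.

```python
import json

def extract_fields(source_json, requested):
    """Parse a stored _source string and keep only the requested top-level fields."""
    doc = json.loads(source_json)
    return {name: doc[name] for name in requested if name in doc}

# Hypothetical document as it might be kept in _source.
stored_source = '{"my_test_field": 42, "url": "http://example.com/page"}'

result = extract_fields(stored_source, ["my_test_field"])
print(result)
```

The client receives only the requested field, even though the whole document was loaded and parsed to produce it.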

Usually you send a field to Elasticsearch because you want either to search on it or to retrieve it. But it is true that if you do not store the field explicitly and you do not disable _source, you can still retrieve the field from _source. This means that in some cases it can make sense to have a field that is neither indexed nor stored.

When you store a field, that happens in the underlying Lucene. Lucene is an inverted index that allows fast full-text search and returns the identifiers of the documents matching a text query. Besides the inverted index, Lucene has a kind of storage where field values can be kept so that they can be retrieved given a document identifier. You usually store in Lucene the fields that you want to return in search results. Elasticsearch does not require you to store every field that you want to return, because by default it always stores every document you send to it, so it can always return everything you sent as a search result.

In a few cases it can still be useful to store fields explicitly in Lucene: when the _source field is disabled, or when you want to avoid parsing it, even though Elasticsearch does the parsing automatically. Keep in mind, though, that retrieving many stored fields from Lucene may require one disk seek per field, whereas retrieving only _source and parsing it to get the needed fields is a single disk seek and is simply faster in most cases.
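A mapping for the first of those cases might look like the sketch below, again written as a Python dictionary in the legacy syntax used in the question. The type name `my_type`, the field name `url`, and the `string` type are assumptions made for illustration.

```python
import json

# _source is disabled, so the field must be stored explicitly to be retrievable.
mapping_without_source = {
    "my_type": {  # hypothetical type name
        "_source": {"enabled": False},
        "properties": {
            # Not searchable ("index": "no") but still retrievable ("store": "yes").
            "url": {"type": "string", "index": "no", "store": "yes"}
        }
    }
}

print(json.dumps(mapping_without_source, indent=2))
```

With _source disabled, a field that is neither indexed nor stored would be unusable, which is the situation the documentation's note on index warns about.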

+101
Jun 14 '13 at 14:31

By default, _source (the document as it was indexed) is stored. This means that when you search, you can get the actual document source back. Moreover, Elasticsearch will automatically extract fields/objects from _source and return them if you explicitly ask for them (and may also use it in other components, such as highlighting).

You can specify that a particular field is also stored. This means that the data for that field will be stored on its own. If you then ask for field1 (which is stored), Elasticsearch will see that it is stored and load it from the index rather than from _source (assuming _source is enabled).

When do you want to enable storing specific fields? Most of the time, you don't. Fetching _source is fast, and extracting fields from it is fast as well. If you have very large documents, where the cost of storing _source or of parsing it is high, you can explicitly map some fields to be stored instead.

Note that there is a cost to retrieving each stored field. So, for example, if you have a JSON document with 10 reasonably sized fields, you map all of them as stored, and you ask for all of them, each one has to be loaded separately (more disk seeks), compared to just loading _source (which is a single field, possibly compressed).
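The trade-off can be made concrete with a toy count of separate loads. This is only a model of the reasoning in the answer, with hypothetical names; it does not measure real Lucene behavior.

```python
def loads_needed(requested, stored_fields, source_enabled=True):
    """Count loads: one per requested stored field, plus one _source load for the rest."""
    from_store = [f for f in requested if f in stored_fields]
    from_source = [f for f in requested if f not in stored_fields]
    loads = len(from_store)
    if from_source and source_enabled:
        loads += 1  # a single _source load covers all remaining fields
    return loads

fields = ["field%d" % i for i in range(1, 11)]  # 10 hypothetical field names

all_stored = loads_needed(fields, stored_fields=set(fields))
none_stored = loads_needed(fields, stored_fields=set())
print(all_stored, none_stored)
```

Under this model, requesting all ten stored fields costs ten loads, while serving the same fields from _source costs one.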

I took this answer from a thread in which shay.banon responded. The whole thread is worth reading to get a good picture of the topic.

+3
Aug 05 '16 at 10:57