Is it possible to search within specific scopes using Elasticsearch?

I need to search for text in documents, restricted to one of the following scopes:

  • Whole document
  • Chapters
  • Paragraphs
  • Sentences

Is it possible to index a document so that I can restrict a query to one of these scopes?

Edit in response to the answers

I have now created the following index:

 {
   "settings": {
     "analysis": {
       "analyzer": {
         "folding": { "tokenizer": "standard", "filter": [ "lowercase", "asciifolding" ] }
       }
     }
   },
   "mappings": {
     "books": {
       "properties": {
         "content": {
           "type": "string",
           "fields": {
             "english": { "type": "string", "analyzer": "english" },
             "folded": { "type": "string", "analyzer": "folding" }
           }
         },
         "author": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "language": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "source": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "title": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "fileType": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } }
       }
     },
     "sections": {
       "_parent": { "type": "books" },
       "properties": {
         "content": {
           "type": "string",
           "fields": {
             "english": { "type": "string", "analyzer": "english" },
             "folded": { "type": "string", "analyzer": "folding" }
           }
         },
         "paragraphs": {
           "type": "nested",
           "properties": {
             "paragraph": {
               "properties": {
                 "page": { "type": "integer" },
                 "number": { "type": "integer" },
                 "html_tag": { "type": "string" },
                 "content": { "type": "string" }
               }
             }
           }
         },
         "author": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "language": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "source": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "title": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "fileType": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } }
       }
     },
     "messages": {
       "properties": {
         "content": {
           "type": "string",
           "fields": {
             "english": { "type": "string", "analyzer": "english" },
             "folded": { "type": "string", "analyzer": "folding" }
           }
         },
         "paragraphs": {
           "type": "nested",
           "properties": {
             "paragraph": {
               "properties": {
                 "page": { "type": "integer" },
                 "number": { "type": "integer" },
                 "html_tag": { "type": "string" },
                 "content": { "type": "string" }
               }
             }
           }
         },
         "author": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "language": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "source": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "title": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
         "fileType": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } }
       }
     }
   }
 }

This gives me the following types: books, sections (children of books) and messages. Sections and messages have a nested field, paragraphs, and I skipped the sentence level.

Now I can search the content at the book level and at the section level, which lets me find words that are spread across paragraphs. I can also search directly at the paragraph level, which is useful if I want both words to occur in the same paragraph.

Example: let's say I have the following document

 paragraph 1: It is a beautiful warm day.
 paragraph 2: The cloud is clear.

Now I can search for beautiful AND cloud at the content level and get the document back. However, the document is not returned if I search for beautiful AND cloud at the paragraph level using a nested query, and this is what I wanted.
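The two searches described above could be built roughly as follows. This is only a sketch of the request bodies as Python dictionaries; the field path `paragraphs.paragraph.content` is derived from the mapping in the question, and the helper names `content_level_query` and `paragraph_level_query` are made up for illustration.

```python
import json


def content_level_query(text):
    """All words must occur somewhere in the full `content` field."""
    return {"query": {"match": {"content": {"query": text, "operator": "and"}}}}


def paragraph_level_query(text):
    """All words must occur inside one and the same nested paragraph."""
    return {
        "query": {
            "nested": {
                # `paragraphs` is the nested field from the mapping above.
                "path": "paragraphs",
                "query": {
                    "match": {
                        "paragraphs.paragraph.content": {
                            "query": text,
                            "operator": "and",
                        }
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body that would be sent to the _search endpoint.
    print(json.dumps(paragraph_level_query("beautiful cloud"), indent=2))
```

Because each nested paragraph is matched on its own, the second query only succeeds when a single paragraph contains every word.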

I see the following drawbacks with this solution:

  • I need to index the same paragraph three times: once at the paragraph level, once in the section content and once in the book content.
  • I do not understand what benefit I get from the parent-child relationship between books and sections. I have not found a way to search both at the same time with highlighting.
  • I need a separate messages type that is exactly the same as the sections type, just without a parent. Is there no way to have a child type without a parent, so that I can avoid the extra type?
2 answers

To do this, you can index every sentence and, together with the words of the sentence, include information about its enclosing context, that is, the paragraph, chapter and book in which the sentence appears.

A term query will then return sentences, and with them the chapter and book information. With this information, you know which sentence, paragraph, chapter, or book matched.

Then you just use the scope that interests you.
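As a rough illustration of "using the scope that interests you", the sketch below works purely on the client side. It assumes one term query was run per search word and that each returned sentence document carries `book`, `chapter`, `paragraph` and `sentence` fields; the function name `scopes_containing_all` and the sample hits are hypothetical.

```python
SCOPE_KEYS = {
    "book": ("book",),
    "chapter": ("book", "chapter"),
    "paragraph": ("book", "chapter", "paragraph"),
    "sentence": ("book", "chapter", "paragraph", "sentence"),
}


def scopes_containing_all(hits_per_term, scope):
    """Return the IDs of the given scope in which every term has a hit.

    hits_per_term maps each search term to the list of sentence
    documents that matched it.
    """
    keys = SCOPE_KEYS[scope]
    result = None
    for hits in hits_per_term.values():
        ids = {tuple(hit[k] for k in keys) for hit in hits}
        # Intersect across terms: a scope must contain all of them.
        result = ids if result is None else result & ids
    return result or set()


# Hypothetical hits for the words "beautiful" and "cloud".
sample_hits = {
    "beautiful": [{"book": 1, "chapter": 1, "paragraph": 1, "sentence": 1}],
    "cloud": [{"book": 1, "chapter": 1, "paragraph": 2, "sentence": 1}],
}

if __name__ == "__main__":
    print(scopes_containing_all(sample_hits, "chapter"))
    print(scopes_containing_all(sample_hits, "paragraph"))
```

With these sample hits the two words share a chapter but not a paragraph, so the chapter scope yields one ID and the paragraph scope yields none.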

Example document for indexing:

 { "book": <book-id>, "chapter": <chapter-id>, "paragraph": <paragraph-id>, "sentence": <sentence-id>, "sentence_text": "Here comes the text from a sentence in the indexed book" } 
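Documents of this shape could be produced with something like the sketch below. It assumes the book is already available as a list of chapters, each a list of paragraph strings, uses 1-based positions as IDs, and splits sentences with a deliberately naive regular expression; `sentence_documents` is a made-up helper name.

```python
import re


def sentence_documents(book_id, chapters):
    """Yield one indexable document per sentence, with its context IDs."""
    for c_no, paragraphs in enumerate(chapters, start=1):
        for p_no, paragraph in enumerate(paragraphs, start=1):
            # Naive split: break after ., ! or ? followed by whitespace.
            parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
            sentences = [s for s in parts if s]
            for s_no, sentence in enumerate(sentences, start=1):
                yield {
                    "book": book_id,
                    "chapter": c_no,
                    "paragraph": p_no,
                    "sentence": s_no,
                    "sentence_text": sentence,
                }


if __name__ == "__main__":
    book = [["It is a beautiful warm day. The cloud is clear."]]
    for doc in sentence_documents(1, book):
        print(doc)
```

A real pipeline would likely need a better sentence splitter, since abbreviations and decimal numbers break this one.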

Addition to the answer after the question was edited

For this, you can use different document types stored in the same index. A single query will then return documents of possibly different types (paragraphs, books, etc.), and by filtering the results by type you get what you want. Here is an example:

Whole book:

 POST /books/book/1
 { "text": "It is a beautiful warm day. The cloud is clear." }

1st paragraph:

 POST /books/para/1
 { "text": "It is a beautiful warm day." }

2nd paragraph:

 POST /books/para/2
 { "text": "The cloud is clear." }

Request for documents:

 POST /books/_search
 { "query": { "match": { "text": { "query": "beautiful cloud", "operator": "and" } } } }
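Filtering the result by type could then be done on the client, for instance as sketched below. The response is hand-written in the usual shape of a search result (`hits.hits[]` with `_type` and `_source`), not captured from a real cluster, and `hits_of_type` is a hypothetical helper. With the `and` operator, only the whole-book document should match, because neither paragraph contains both words in its `text` field.

```python
def hits_of_type(response, doc_type):
    """Keep only the hits whose `_type` equals doc_type."""
    return [hit for hit in response["hits"]["hits"] if hit["_type"] == doc_type]


# Hand-written sample response (an assumption, not real server output).
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "books",
                "_type": "book",
                "_id": "1",
                "_source": {
                    "text": "It is a beautiful warm day. The cloud is clear."
                },
            }
        ],
    }
}

if __name__ == "__main__":
    print(len(hits_of_type(sample_response, "book")))
    print(len(hits_of_type(sample_response, "para")))
```

The same restriction can also be expressed in the request itself by searching a single type, for example `/books/para/_search`, on Elasticsearch versions that still support mapping types.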

Does this solve your problem?


Another option is to have one document per book, with many nested documents inside it, so that they all share the same book context at the root level. It is up to you whether you use one hierarchy level (all sentences as nested documents) or more (chapter => paragraph => sentence). A single level makes for simpler queries.

 {
   "book": 123,
   "author": "Harry",
   "written": 1995,
   "sentences": [
     { "chapter": 1, "paragraph": 2, "sentence": 3, "text": "abc def" },
     { "chapter": 2, "paragraph": 3, "sentence": 4, "text": "ghi jkl" },
     { ... }
   ]
 }
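Querying such a document could look roughly like the sketch below. It assumes `sentences` is mapped with `"type": "nested"` (otherwise the sentences are flattened into the root document and per-sentence matching is lost) and uses the `string` field type from the question's mapping. The names `SENTENCES_MAPPING` and `nested_sentence_query` are made up, and `inner_hits` is only available on versions that support it.

```python
# Assumed mapping fragment for the single-level variant.
SENTENCES_MAPPING = {
    "properties": {
        "sentences": {
            "type": "nested",
            "properties": {
                "chapter": {"type": "integer"},
                "paragraph": {"type": "integer"},
                "sentence": {"type": "integer"},
                "text": {"type": "string"},
            },
        }
    }
}


def nested_sentence_query(text):
    """Find books that have one sentence containing all the given words."""
    return {
        "query": {
            "nested": {
                "path": "sentences",
                "query": {
                    "match": {
                        "sentences.text": {"query": text, "operator": "and"}
                    }
                },
                # Ask for the matching nested sentences, not just the book.
                "inner_hits": {},
            }
        }
    }
```

The hit is still the whole book; the chapter, paragraph and sentence numbers of the matching entries come back through the inner hits.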