I need to search for text in documents based on the following areas:
- Whole document
- Chapters
- Paragraphs
- Sentences
Is it possible to index a document so that I can restrict a query to one of these areas?
Edit based on the answers
I have now created the following index:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  },
  "mappings": {
    "books": {
      "properties": {
        "content": {
          "type": "string",
          "fields": {
            "english": { "type": "string", "analyzer": "english" },
            "folded": { "type": "string", "analyzer": "folding" }
          }
        },
        "author": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "language": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "source": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "title": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "fileType": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } }
      }
    },
    "sections": {
      "_parent": { "type": "books" },
      "properties": {
        "content": {
          "type": "string",
          "fields": {
            "english": { "type": "string", "analyzer": "english" },
            "folded": { "type": "string", "analyzer": "folding" }
          }
        },
        "paragraphs": {
          "type": "nested",
          "properties": {
            "paragraph": {
              "properties": {
                "page": { "type": "integer" },
                "number": { "type": "integer" },
                "html_tag": { "type": "string" },
                "content": { "type": "string" }
              }
            }
          }
        },
        "author": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "language": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "source": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "title": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "fileType": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } }
      }
    },
    "messages": {
      "properties": {
        "content": {
          "type": "string",
          "fields": {
            "english": { "type": "string", "analyzer": "english" },
            "folded": { "type": "string", "analyzer": "folding" }
          }
        },
        "paragraphs": {
          "type": "nested",
          "properties": {
            "paragraph": {
              "properties": {
                "page": { "type": "integer" },
                "number": { "type": "integer" },
                "html_tag": { "type": "string" },
                "content": { "type": "string" }
              }
            }
          }
        },
        "author": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "language": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "source": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "title": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "fileType": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } }
      }
    }
  }
}
This gives me the following types: books, sections (with books as parent) and messages. Sections and messages have a nested type, paragraphs, and I skipped the sentence level.
Now I can search the content at the book level and at the section level, which lets me find words that are spread across several paragraphs. I can also search directly at the paragraph level, which is useful if I want both words to occur in the same paragraph.
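To make the shape of a document under this mapping concrete, here is a minimal Python sketch that builds one section. The helper name `build_section_doc`, the title and the paragraph texts are hypothetical; only the field names come from the mapping above. It assumes the section-level `content` is simply the paragraph texts joined with spaces.

```python
import json


def build_section_doc(title, paragraphs):
    """Build a 'sections' document from (page, text) pairs.

    Hypothetical helper: field names follow the mapping, everything
    else (joining with spaces, the "p" tag) is an assumption.
    """
    nested = [
        {
            "paragraph": {
                "page": page,
                "number": number,
                "html_tag": "p",
                "content": text,
            }
        }
        for number, (page, text) in enumerate(paragraphs, start=1)
    ]
    return {
        "title": title,
        # Section-level copy of the text, used for cross-paragraph search.
        "content": " ".join(text for _, text in paragraphs),
        # Paragraph-level copy, indexed as nested documents.
        "paragraphs": nested,
    }


section = build_section_doc(
    "Example section",  # hypothetical title
    [(1, "First paragraph text."), (1, "Second paragraph text.")],
)
print(json.dumps(section, indent=2))
```

The same text ends up in both `content` and `paragraphs`, which is the duplication this layout implies.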
Example: let's say I have the following document:
paragraph 1: It is a beautiful warm day.
paragraph 2: The cloud is clear.
If I search for beautiful AND cloud at the content level, the document is returned. However, the document is not returned if I search for beautiful AND cloud at the paragraph level using a nested query, and this is what I wanted.
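The two searches can be sketched as request bodies built in Python. This is only an illustration under assumptions: the function names are hypothetical, the nested field path `paragraphs.paragraph.content` is inferred from the mapping, and a `match` query with `"operator": "and"` is one of several ways to require both words.

```python
import json


def content_level_query(words, field="content"):
    """Require all words somewhere in the section or book content."""
    return {
        "query": {
            "match": {field: {"query": " ".join(words), "operator": "and"}}
        }
    }


def paragraph_level_query(words, path="paragraphs",
                          field="paragraphs.paragraph.content"):
    """Require all words inside one and the same nested paragraph."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "match": {
                        field: {"query": " ".join(words), "operator": "and"}
                    }
                },
            }
        }
    }


words = ["beautiful", "cloud"]
doc_level = content_level_query(words)
para_level = paragraph_level_query(words)
print(json.dumps(doc_level, indent=2))
print(json.dumps(para_level, indent=2))
```

With the example document, the first body should match because both words occur in the combined content, while the second should not because no single paragraph contains both.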
I see the following drawbacks of this solution:
- I need to index the same paragraph 3 times: once at the paragraph level, once at the section content level and once at the book content level.
- I do not understand what benefit I get from the parent/child relationship between books and sections. I did not find a way to search both at the same time with highlighting.
- I need a separate messages type that is exactly the same as the sections type but without a parent. Is there no way to have a child type without a parent, so that I can avoid the extra type?
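For reference, the one thing I can see the parent/child relationship being used for is restricting section hits by fields of the parent book with a `has_parent` query. The sketch below builds such a request body in Python; the function name and the example values are hypothetical, and it does not address the highlighting problem.

```python
import json


def sections_in_books_by_author(text, author):
    """Match sections by content, limited to books by a given author.

    Hypothetical helper; assumes 'sections' has 'books' as _parent
    and that 'author.raw' holds the exact, not analyzed author name.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"content": text}},
                    {
                        "has_parent": {
                            "parent_type": "books",
                            "query": {"term": {"author.raw": author}},
                        }
                    },
                ]
            }
        }
    }


body = sections_in_books_by_author("beautiful cloud", "Jane Doe")
print(json.dumps(body, indent=2))
```

Since the sections type duplicates the author field anyway, the same filter could be written without the parent, which is part of why the benefit is unclear to me.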