I need to index a patent directory that has the following data structure:
"cpc": [ { "class": "61", "section": "A", "sequence": "1", "subclass": "K", "subgroup": "06", "main-group": "45", "classification-value": "I" }, { "class": "61", "section": "A", "sequence": "2", "subclass": "K", "subgroup": "506", "main-group": "31", "classification-value": "I" } ]
I was wondering what the right approach is here. I could use cpc.class and combine it with multiValued = "true".
I would like to find documents that match a specific CPC code, which may be partial. Right now, my solution is simply to give each nested key its own field with multiValued="true". Is there a better way to do this?
<field name="cpc.class" type="int" indexed="true" stored="true" multiValued="true" />
<field name="cpc.section" type="string" indexed="true" stored="true" multiValued="true" />
<field name="cpc.sequence" type="int" indexed="true" stored="true" multiValued="true" />
<field name="cpc.subclass" type="string" indexed="true" stored="true" multiValued="true" />
<field name="cpc.subgroup" type="int" indexed="true" stored="true" multiValued="true" />
<field name="cpc.main-group" type="int" indexed="true" stored="true" multiValued="true" />
<field name="cpc.classification-value" type="string" indexed="true" stored="true" multiValued="true" />
The problem with this implementation is that it returns documents that do not meet the search criteria. Example:
"cpc.section:A", "cpc.class:61", "cpc.subclass:Q", "cpc.main-group:8"
I get documents that do not have this combination. I think the current approach treats each field as an independent list and matches values from those lists in any combination. I need to narrow it down so that a document is returned only when the right combination occurs within a single CPC entry.
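To make the behaviour I am describing concrete, here is a small Python sketch using the sample data above. It is not Solr code: `flatten` and `flat_match` only imitate what I believe the multiValued fields are doing, and `cpc_code` / `code_match` are hypothetical helpers showing one idea I am considering, namely storing each CPC entry as a single combined string (for example `A61K45/06`) and matching on a prefix.

```python
# Sample data copied from the question; all function names below are illustrative.
cpc = [
    {"class": "61", "section": "A", "sequence": "1", "subclass": "K",
     "subgroup": "06", "main-group": "45", "classification-value": "I"},
    {"class": "61", "section": "A", "sequence": "2", "subclass": "K",
     "subgroup": "506", "main-group": "31", "classification-value": "I"},
]


def flatten(entries):
    """Imitate the multiValued schema: one independent value list per field."""
    flat = {}
    for entry in entries:
        for key, value in entry.items():
            flat.setdefault("cpc." + key, []).append(value)
    return flat


def flat_match(flat, criteria):
    """Check each criterion against its own list, ignoring which entry a value came from."""
    return all(value in flat.get(field, []) for field, value in criteria.items())


def cpc_code(entry):
    """Hypothetical combined key for one CPC entry, e.g. 'A61K45/06'."""
    return (entry["section"] + entry["class"] + entry["subclass"]
            + entry["main-group"] + "/" + entry["subgroup"])


def code_match(entries, prefix):
    """True if any single entry's combined code starts with the (possibly partial) prefix."""
    return any(cpc_code(entry).startswith(prefix) for entry in entries)


flat = flatten(cpc)

# Main group 45 comes from the first entry, subgroup 506 from the second,
# yet the flattened lists still report a match.
mixed = {"cpc.main-group": "45", "cpc.subgroup": "506"}
print(flat_match(flat, mixed))        # True (false positive)

# The combined key keeps the values of one entry together.
print(code_match(cpc, "A61K45/506"))  # False
print(code_match(cpc, "A61K"))        # True (partial code)
```

One caveat with the combined-string idea: a prefix that ends in the middle of the main group, such as `A61K3`, would match main group 31 as well as main group 3, so partial codes would probably have to stop at a level boundary.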
json indexing solr
Isstvan