MarkLogic cts:element-query false positives?

Question


Given this document:

    <items>
      <item><type>T1</type><value>V1</value></item>
      <item><type>T2</type><value>V2</value></item>
    </items>

Unsurprisingly, I found that this query returns the document when passed to cts:uris():

    cts:and-query((
      cts:element-query(xs:QName('item'),
        cts:element-value-query(xs:QName('type'), 'T1')
      ),
      cts:element-query(xs:QName('item'),
        cts:element-value-query(xs:QName('value'), 'V2')
      )
    ))

But somewhat surprisingly (at least to me), I found that this query returns it as well:

    cts:element-query(xs:QName('item'),
      cts:and-query((
        cts:element-value-query(xs:QName('type'), 'T1'),
        cts:element-value-query(xs:QName('value'), 'V2')
      ))
    )

This does not seem correct, since there is no single item element with type = T1 and value = V2. It looks like a false positive to me.

Have I misunderstood how cts:element-query works? (I have to say that the documentation is not particularly clear in this area.)

Or is this a case where MarkLogic makes a best effort to give me the result I expect, and with more or better indexes I would be less likely to get a false-positive match?

+6

xquery false-positive marklogic

Andy key May 23 '16 at 18:35


2 answers

Yes, I think this is a slight misunderstanding of how queries are resolved. cts:search uses the filtered option by default. In that mode, MarkLogic first evaluates the query using only the indexes and then, once the candidate documents have been selected, loads them into memory, inspects them, and filters out false positives. This takes more time but is more accurate.

cts:uris is a lexicon function, so queries passed to it are resolved using indexes only, and there is no way to filter out false positives.
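
If accurate URIs matter more than speed, one option is to run the same query through cts:search and take the URI of each result. This is a minimal sketch, assuming the sample document from the question is loaded; every candidate document is fetched and checked, so it will be slower than cts:uris on large result sets.

```xquery
xquery version "1.0-ml";

let $query :=
  cts:element-query(xs:QName("item"),
    cts:and-query((
      cts:element-value-query(xs:QName("type"), "T1"),
      cts:element-value-query(xs:QName("value"), "V2")
    ))
  )
(: "filtered" is the default option; it is spelled out here for clarity. :)
for $doc in cts:search(fn:doc(), $query, "filtered")
return xdmp:node-uri($doc)
```

For the sample document this should return nothing, because the filtering step rejects the candidate that the indexes let through.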

An easy way to resolve this query from the indexes alone is to change your document model so that each document is rooted at <item> instead of <items>. Each item then gets its own index entries, and values from different items are no longer combined into a single candidate before filtering.

Another way, which does not require updating documents, is to wrap the queries you expect to match within a single element in a cts:near-query. This prevents the <type> in one <item> from matching with the <value> in another <item>. I suggest reading the documentation, because you may need to enable one or more position indexes for cts:near-query to be accurate.
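
A rough sketch of the per-item document model follows. The URIs /items/1.xml and /items/2.xml are made up for illustration, and the cts:uris call assumes the URI lexicon is enabled on the database.

```xquery
xquery version "1.0-ml";

(: Hypothetical URIs; each item becomes its own document. :)
xdmp:document-insert("/items/1.xml",
  <item><type>T1</type><value>V1</value></item>),
xdmp:document-insert("/items/2.xml",
  <item><type>T2</type><value>V2</value></item>)
;

xquery version "1.0-ml";

(: No single document contains both values, so the index
   intersection has no candidate to report. :)
cts:uris((), (),
  cts:and-query((
    cts:element-value-query(xs:QName("type"), "T1"),
    cts:element-value-query(xs:QName("value"), "V2")
  ))
)
```

With one item per document, the element-query wrapper is no longer needed: the document boundary itself keeps values from different items apart.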
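
A sketch of the cts:near-query variant is shown here. The distance of 1 is an assumption that suits the sample document, where <type> and <value> are adjacent single-word values; real data will likely need a different distance, and the result is only reliable once the relevant position indexes are enabled.

```xquery
xquery version "1.0-ml";

(: Assumed distance of 1 word; tune this to the shape of your own items. :)
cts:uris((), (),
  cts:element-query(xs:QName("item"),
    cts:near-query((
      cts:element-value-query(xs:QName("type"), "T1"),
      cts:element-value-query(xs:QName("value"), "V2")
    ), 1)
  )
)
```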

+4

wst May 23 '16 at 19:08


grtjn · Accepted Answer · 2016-05-24T09:23:14+0000

In addition to @wst's answer, you need to enable element value positions in order to get accurate results from unfiltered searches. Here is an example that shows this:

    xdmp:document-insert("/items.xml",
      <items>
        <item><type>T1</type><value>V1</value></item>
        <item><type>T2</type><value>V2</value></item>
      </items>
    );

    cts:search(collection(),
      cts:element-query(xs:QName('item'),
        cts:and-query((
          cts:element-value-query(xs:QName('type'), 'T1'),
          cts:element-value-query(xs:QName('value'), 'V2')
        ))
      ),
      'unfiltered'
    )

Without element value positions enabled, this returns the test document. After enabling positions, the query returns nothing.

As @wst says, cts:search() runs filtered by default, while cts:uris() (and, for example, xdmp:estimate()) always runs unfiltered.
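
Element value positions can be switched on in the Admin UI or through the Admin API. The following is a minimal sketch using the Admin API; the database name "Documents" is an assumption, so substitute your own. Changing this setting causes the database to reindex if reindexing is enabled.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is an assumed database name. :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")
let $config :=
  admin:database-set-element-value-positions($config, $db, fn:true())
return admin:save-configuration($config)
```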
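
One way to see the difference is to compare the index-only estimate with a filtered count for the same query. This is an illustrative sketch, assuming the test document above is loaded:

```xquery
xquery version "1.0-ml";

let $query :=
  cts:element-query(xs:QName("item"),
    cts:and-query((
      cts:element-value-query(xs:QName("type"), "T1"),
      cts:element-value-query(xs:QName("value"), "V2")
    ))
  )
return (
  (: Resolved from the indexes only, so it may count false positives. :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: Filtered by default, so each candidate document is verified. :)
  fn:count(cts:search(fn:doc(), $query))
)
```

If the two numbers differ, the indexes are admitting candidates that filtering later rejects.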

HTH!
