JOINS in Lucene

Is there a way to implement JOINS in Lucene?

+6
join lucene
8 answers

You can implement a generic join by hand: run two queries, fetch all the results (not just the top N), sort them by the join key, and intersect the two ordered lists. But this will thrash your heap badly (if the lists even fit into it).
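
A minimal sketch of that manual join, in plain Java with no Lucene dependency. The `Hit` class and the sample keys are hypothetical stand-ins for whatever you collect from the two queries (join key plus doc id); it only illustrates the sort-and-merge step, not how the hits are fetched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortMergeJoin {
    /** Hypothetical query hit: the join key value plus the document id. */
    static final class Hit {
        final String key;
        final int docId;
        Hit(String key, int docId) { this.key = key; this.docId = docId; }
    }

    /** Returns {leftDocId, rightDocId} pairs for hits that share a join key. */
    static List<int[]> join(List<Hit> left, List<Hit> right) {
        // Copy and sort both sides by join key (both full lists live on the heap).
        List<Hit> l = new ArrayList<>(left);
        List<Hit> r = new ArrayList<>(right);
        l.sort(Comparator.comparing((Hit h) -> h.key));
        r.sort(Comparator.comparing((Hit h) -> h.key));

        List<int[]> pairs = new ArrayList<>();
        int i = 0, j = 0;
        while (i < l.size() && j < r.size()) {
            int cmp = l.get(i).key.compareTo(r.get(j).key);
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                String key = l.get(i).key;
                // Find the run of equal keys on the right side.
                int jEnd = j;
                while (jEnd < r.size() && r.get(jEnd).key.equals(key)) jEnd++;
                // Pair every left hit in the run with every right hit in the run.
                while (i < l.size() && l.get(i).key.equals(key)) {
                    for (int k = j; k < jEnd; k++) {
                        pairs.add(new int[] {l.get(i).docId, r.get(k).docId});
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Hit> orders = Arrays.asList(new Hit("c2", 10), new Hit("c1", 11), new Hit("c2", 12));
        List<Hit> customers = Arrays.asList(new Hit("c1", 1), new Hit("c2", 2), new Hit("c3", 3));
        for (int[] p : join(orders, customers)) {
            System.out.println(p[0] + " joins " + p[1]);
        }
    }
}
```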

Optimizations are possible, but only under very specific conditions: you are doing a self-join and you filter only with (random-access) Filters, no Queries. Then you can manually iterate over the terms of your two join fields (in parallel), intersect the docId lists for each term, filter them, and there is your join.
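
One possible reading of that term walk, sketched in plain Java rather than against the Lucene API. The sorted maps stand in for the term dictionaries of the two join fields (term to docId list) and the `BitSet`s stand in for random-access filters; all of these names are assumptions made for illustration. For every term that occurs in both fields, the sketch pairs up the docIds that pass their filter.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class TermWalkJoin {
    /**
     * Walks two term dictionaries in parallel. Both maps are assumed to use
     * natural String ordering (the TreeMap default).
     */
    static List<int[]> join(SortedMap<String, int[]> leftTerms,
                            SortedMap<String, int[]> rightTerms,
                            BitSet leftFilter,
                            BitSet rightFilter) {
        List<int[]> pairs = new ArrayList<>();
        Iterator<Map.Entry<String, int[]>> li = leftTerms.entrySet().iterator();
        Iterator<Map.Entry<String, int[]>> ri = rightTerms.entrySet().iterator();
        Map.Entry<String, int[]> l = li.hasNext() ? li.next() : null;
        Map.Entry<String, int[]> r = ri.hasNext() ? ri.next() : null;

        while (l != null && r != null) {
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp < 0) {
                l = li.hasNext() ? li.next() : null;      // term only on the left
            } else if (cmp > 0) {
                r = ri.hasNext() ? ri.next() : null;      // term only on the right
            } else {
                // Same term in both fields: keep docIds that pass their filter.
                for (int leftDoc : l.getValue()) {
                    if (!leftFilter.get(leftDoc)) continue;
                    for (int rightDoc : r.getValue()) {
                        if (rightFilter.get(rightDoc)) {
                            pairs.add(new int[] {leftDoc, rightDoc});
                        }
                    }
                }
                l = li.hasNext() ? li.next() : null;
                r = ri.hasNext() ? ri.next() : null;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> left = new TreeMap<>();
        left.put("a", new int[] {0, 1});
        left.put("b", new int[] {2});
        SortedMap<String, int[]> right = new TreeMap<>();
        right.put("b", new int[] {3, 4});
        right.put("c", new int[] {5});

        BitSet leftFilter = new BitSet();
        leftFilter.set(0, 6);          // accept every left doc
        BitSet rightFilter = new BitSet();
        rightFilter.set(4);            // accept only doc 4 on the right

        for (int[] p : join(left, right, leftFilter, rightFilter)) {
            System.out.println(p[0] + " joins " + p[1]);
        }
    }
}
```

Only one term's docId lists are held at a time, which is where the saving over the sort-everything approach would come from.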

There is an approach that handles the popular use case of simple parent-child relationships with a relatively small number of children per document: https://issues.apache.org/jira/browse/LUCENE-2454
Unlike the flattening method mentioned by @ntziolis, this approach correctly handles cases like this one: you have a number of resumes, each with several work_experience children, and you try to find someone who worked at company NNN in year YYY. If you simply flatten, you will also get back resumes of people who worked at NNN in any year and worked somewhere else in year YYY.

An alternative for the simple parent-child cases is to flatten your document after all, but make sure that the values of different children are separated by a large posIncrement gap, and then use a SpanNear query to prevent your subqueries from matching across children. There was a LinkedIn presentation about this a few years ago, but I could not find it.
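
To see why the gap helps, here is a small simulation in plain Java. It does not use Lucene: the `GAP` value, the `index` and `near` helpers, and the sample resumes are all made up for illustration, and `near` is only a simplified stand-in for what a SpanNear query with a slop smaller than the gap would check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PositionGapSketch {
    /** Hypothetical position gap inserted between two children. */
    static final int GAP = 1000;

    /** Flattens the children into one token stream and records token positions. */
    static Map<String, List<Integer>> index(List<List<String>> children) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = 0;
        for (List<String> child : children) {
            for (String token : child) {
                positions.computeIfAbsent(token, t -> new ArrayList<>()).add(pos);
                pos++;
            }
            pos += GAP; // push the next child far away from this one
        }
        return positions;
    }

    /** True if some occurrence of a lies within slop positions of some occurrence of b. */
    static boolean near(Map<String, List<Integer>> positions, String a, String b, int slop) {
        List<Integer> as = positions.getOrDefault(a, Collections.<Integer>emptyList());
        List<Integer> bs = positions.getOrDefault(b, Collections.<Integer>emptyList());
        for (int pa : as) {
            for (int pb : bs) {
                if (Math.abs(pa - pb) <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Resume 1: worked at NNN in 2005 and at MMM in 2010.
        Map<String, List<Integer>> resume1 = index(Arrays.asList(
                Arrays.asList("NNN", "2005"), Arrays.asList("MMM", "2010")));
        // Resume 2: worked at NNN in 2010.
        Map<String, List<Integer>> resume2 = index(Arrays.asList(
                Arrays.asList("NNN", "2010")));

        // A plain AND on the flattened document matches resume 1 by accident.
        boolean flatAnd = resume1.containsKey("NNN") && resume1.containsKey("2010");
        System.out.println("flat AND matches resume 1: " + flatAnd);
        // The proximity check with slop < GAP does not.
        System.out.println("near matches resume 1: " + near(resume1, "NNN", "2010", 10));
        System.out.println("near matches resume 2: " + near(resume2, "NNN", "2010", 10));
    }
}
```

The slop has to stay below the gap and at least as large as the longest child, otherwise the check either leaks across children or misses matches within one child.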

+8

You can also use the new BlockJoinQuery; I described this on my blog here:

http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

+12

Lucene does not support relationships between documents, but a join is nothing more than a specific combination of several ANDs in parentheses. You do need to flatten the relationship first, though.

Example (SQL => Lucene):

SQL:

 SELECT Order.* FROM Order JOIN Customer ON Order.CustomerID = Customer.ID WHERE Customer.Name = 'SomeName' AND Order.Nr = 400 

Lucene:
Make sure that every document contains all the required fields and their values: Customer.Name => "Customer_Name" and Order.Nr => "Order_Nr"

Then the query will look like this:

 ( Customer_Name:"SomeName" AND Order_Nr:"400" ) 
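
A rough sketch of the flattening step in plain Java, with maps standing in for Lucene documents. The `flatten` and `search` helpers and the sample data are hypothetical; in a real index you would add these values as fields of one document per order instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlattenSketch {
    /**
     * Builds one flat document per order by copying the customer name into it.
     * Each order is given as {orderNr, customerId}.
     */
    static List<Map<String, String>> flatten(List<String[]> orders,
                                             Map<String, String> customerNamesById) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String[] order : orders) {
            Map<String, String> doc = new HashMap<>();
            doc.put("Order_Nr", order[0]);
            doc.put("Customer_Name", customerNamesById.get(order[1]));
            docs.add(doc);
        }
        return docs;
    }

    /** Equivalent of ( Customer_Name:"..." AND Order_Nr:"..." ) on the flat documents. */
    static List<Map<String, String>> search(List<Map<String, String>> docs,
                                            String customerName, String orderNr) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (customerName.equals(doc.get("Customer_Name"))
                    && orderNr.equals(doc.get("Order_Nr"))) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> customers = new HashMap<>();
        customers.put("c1", "SomeName");
        customers.put("c2", "OtherName");
        List<String[]> orders = Arrays.asList(
                new String[] {"400", "c1"},
                new String[] {"400", "c2"},
                new String[] {"401", "c1"});

        List<Map<String, String>> docs = flatten(orders, customers);
        System.out.println(search(docs, "SomeName", "400"));
    }
}
```

The cost of this design is that customer data is duplicated into every order document, so a customer rename means reindexing all of that customer's orders.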
+2

There are some implementations on top of Lucene that make such joins possible across several different indexes. Numere ( http://numere.stela.org.br/ ) does this and lets you get the results back as an RDBMS-style result set.

+1

Here is an example. Numere provides an easy way to extract analytical data from Lucene indexes:

 select a.type, sum(a.value) as "sales", b.category, count(distinct b.product_id) as "total"
 from a (index) inner join b (index) on (a.seq_id = b.seq_id)
 group by a.type, b.category
 order by a.type asc, b.category asc

 Join join = RequestFactory.newJoin();
 // inner join a.seq_id = b.seq_id
 join.on("seq_id", Type.INTEGER).equal("seq_id", Type.INTEGER);
 // left
 {
     Request left = join.left();
     left.repository(UtilTest.getPath("indexes/md/master"));
     left.addColumn("type").textType().asc();
     left.addMeasure("value").alias("sales").intType().sum();
 }
 // right
 {
     Request right = join.right();
     right.repository(UtilTest.getPath("indexes/md/detail"));
     right.addColumn("category").textType().asc();
     right.addMeasure("product_id").intType().alias("total").count_distinct();
 }
 Processor processor = ProcessorFactory.newProcessor();
 try {
     ResultPacket result = processor.execute(join);
     System.out.println(result);
 } finally {
     processor.close();
 }

Result:

 <?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
 <DATAPACKET Version="2.0">
   <METADATA>
     <FIELDS>
       <FIELD attrname="type" fieldtype="string" WIDTH="20" />
       <FIELD attrname="category" fieldtype="string" WIDTH="20" />
       <FIELD attrname="sales" fieldtype="i8" />
       <FIELD attrname="total" fieldtype="i4" />
     </FIELDS>
     <PARAMS />
   </METADATA>
   <ROWDATA>
     <ROW type="Book" category="stand" sales="127003304" total="2" />
     <ROW type="Computer" category="eletronic" sales="44765715835" total="896" />
     <ROW type="Meat" category="food" sales="3193526428" total="110" />

... (output continues)

0

A bit late, but you can use the org.apache.lucene.search.join package: https://lucene.apache.org/core/6_3_0/join/org/apache/lucene/search/join/package-summary.html

From their documentation:

The index-time joining support joins while searching, where joined documents are indexed as a single document block using IndexWriter.addDocuments().

The same package also offers query-time joins through JoinUtil:

  String fromField = "from"; // Name of the from field
  boolean multipleValuesPerDocument = false; // Set only to true in the case when your fromField has multiple values per document in your index
  String toField = "to"; // Name of the to field
  ScoreMode scoreMode = ScoreMode.Max; // Defines how the scores are translated into the other side of the join.
  Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values

  Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
  TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher
  // Render topDocs...
0