Lucene cannot find documents after update

It seems that whenever I update an existing document in the index (same behavior for a delete followed by an add), it can no longer be found with a TermQuery. Here is a short snippet:

    iw = new IndexWriter(directory, config);

    Document doc = new Document();
    doc.add(new StringField("string", "a", Store.YES));
    doc.add(new IntField("int", 1, Store.YES));
    iw.addDocument(doc);

    Query query = new TermQuery(new Term("string", "a"));
    Document[] hits = search(query);
    doc = hits[0];
    print(doc);

    doc.removeField("int");
    doc.add(new IntField("int", 2, Store.YES));
    iw.updateDocument(new Term("string", "a"), doc);

    hits = search(query);
    System.out.println(hits.length);
    System.out.println("_________________");
    for (Document hit : search(new MatchAllDocsQuery())) {
        print(hit);
    }

The result is the following console output:

    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<string:a>
    stored<int:1>
    ________________
    0
    _________________
    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<string:a>
    stored<int:2>
    ________________

It seems that after the update, the document (or rather a new document) is indexed and is returned by MatchAllDocsQuery, but it can no longer be found with the TermQuery.

A complete sample code is available at http://pastebin.com/sP2Vav9v

Furthermore, this only happens (i.e., the second search fails) when the StringField value contains special characters (for example, file:/F:/).

java lucene

2 answers

The code you posted on pastebin does not find anything because your StringField value is nothing more than a stop word ("a"). Replacing "a" with something that is not a stop word (for example, "ax") makes both queries return 1 document.

You would also get the correct result if you constructed the StandardAnalyzer with an empty stop-word set (CharArraySet.EMPTY_SET) but still used "a" for the StringField. This does not help for file:/F:/, though.
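For reference, a minimal sketch of that analyzer configuration (assuming Lucene 4.6, matching the Version constants used elsewhere in this thread):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// StandardAnalyzer with no stop words: "a" survives re-indexing as a token.
// It still tokenizes on punctuation, so "file:/F:/" is split up regardless,
// which is why this variant only fixes the stop-word case.
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_46, CharArraySet.EMPTY_SET);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_46, analyzer);
```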

However, the best solution in this case is to replace StandardAnalyzer with KeywordAnalyzer.
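To illustrate, here is a self-contained sketch of the failing pattern from the question with KeywordAnalyzer swapped in (assuming Lucene 4.6 on the classpath; class and method names are my own, and an in-memory RAMDirectory stands in for the questioner's directory):

```java
import java.io.IOException;

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class KeywordAnalyzerUpdate {

    // Index a document, re-index its stored copy via updateDocument (as in
    // the question), and return the number of TermQuery hits afterwards.
    static int reindexAndSearch() throws IOException {
        RAMDirectory dir = new RAMDirectory();
        // KeywordAnalyzer emits the whole field value as one token, so
        // re-analyzed values like "a" or "file:/F:/" are neither split
        // nor discarded as stop words.
        IndexWriter iw = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_46, new KeywordAnalyzer()));

        Document doc = new Document();
        doc.add(new StringField("string", "a", Store.YES));
        iw.addDocument(doc);
        iw.commit();

        TermQuery query = new TermQuery(new Term("string", "a"));
        DirectoryReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        // A retrieved document loses its StringField type (note the
        // "tokenized" flag in the question's output), so its fields are
        // re-analyzed on the next add - with StandardAnalyzer that is
        // where the stop word "a" disappears.
        Document stored = searcher.doc(searcher.search(query, 1).scoreDocs[0].doc);
        reader.close();

        iw.updateDocument(new Term("string", "a"), stored);
        iw.commit();
        iw.close();

        reader = DirectoryReader.open(dir);
        int hits = new IndexSearcher(reader).search(query, 1).totalHits;
        reader.close();
        return hits;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(reindexAndSearch()); // expect 1 with KeywordAnalyzer
    }
}
```

With StandardAnalyzer in the same config, the final search should come back empty, matching the behavior in the question.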


I was able to work around this by re-creating my index directory after an update: keep a directory used only for these indexing operations (named path_dir here, for example). Whenever you update, run the following to wipe the index, then redo all the previous indexing work.

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
    FSDirectory dir;
    try {
        // delete the index files
        dir = FSDirectory.open(new File(path_dir));
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_46, analyzer);
        IndexWriter writer = new IndexWriter(dir, config);
        writer.deleteAll();
        writer.close();
    } catch (IOException e) {
        e.printStackTrace();
    }

However, note that this approach will be very slow if you are working with a large amount of data.

