Can term frequencies / term vectors be set directly?

I would like to use Lucene.NET to store and query term vectors. However, I do not want the term vectors to be created from documents. Instead, I want to be able to write and update the term vectors directly, without the positions or offsets of the terms / tokens.

A workaround would be to generate text from a term vector, i.e. from the term vector

foo: 3; bar: 1

generate text

foo, foo, foo, bar

and let Lucene index this text. If I want to update the frequency of the term bar to 2, I could fetch the stored text (or re-create it from the old term vector, if I do not store it), change it to

foo, foo, foo, bar, bar

and update the corresponding document in the index.
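
The text-generation step of this workaround can be sketched in a few lines of plain Java. This is only an illustration under assumptions: `TermVectorText` and `toText` are hypothetical names, not part of Lucene or Lucene.NET, and the term vector is assumed to be a simple map from term to frequency.

```java
import java.util.Collections;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene: turns a term-frequency map into
// indexable text by repeating each term as often as its frequency says.
final class TermVectorText {
    private TermVectorText() {}

    static String toText(Map<String, Integer> termVector) {
        // Same separator as the example text in the question.
        StringJoiner text = new StringJoiner(", ");
        for (Map.Entry<String, Integer> entry : termVector.entrySet()) {
            // nCopies yields the term repeated "frequency" times.
            for (String term : Collections.nCopies(entry.getValue(), entry.getKey())) {
                text.add(term);
            }
        }
        return text.toString();
    }
}
```

With an insertion-ordered map holding foo: 3 and bar: 1, `toText` produces the text shown above; setting bar to 2 and calling it again produces the updated text that would then be re-indexed.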

This is quite expensive for such a simple task. Obviously, this is not the use case Lucene was built for. However, I would like to be able to use Lucene's facilities for querying, etc.

Is there a way to write term vectors for a document directly, or do you have other good ideas?

1 answer

As I said in my question, Lucene is not designed to store and manipulate term vectors directly. The approach from the question is more or less the right one, at least for the process of updating a term vector:

  • Get the document that represents the corresponding term vector
  • Update the corresponding document field
  • Update the document in Lucene (delete, then add)
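
The three steps can be sketched as follows. This is a minimal illustration, not Lucene code: `TermVectorUpdateSketch` is a hypothetical class in which a plain in-memory map stands in for the index, and the field text is assumed to be whitespace-separated terms. In a real index the delete-then-add step would go through the index writer, which offers it as a single update-document operation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical in-memory stand-in for the index: document id -> field text.
final class TermVectorUpdateSketch {
    private final Map<String, String> index = new HashMap<>();

    void add(String docId, String text) { index.put(docId, text); }

    String get(String docId) { return index.get(docId); }

    // Rebuild the term frequencies by counting whitespace-separated tokens.
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Turn the frequencies back into text by repeating each term.
    static String render(Map<String, Integer> counts) {
        StringJoiner text = new StringJoiner(" ");
        counts.forEach((term, n) -> Collections.nCopies(n, term).forEach(text::add));
        return text.toString();
    }

    void setFrequency(String docId, String term, int frequency) {
        String stored = index.get(docId);              // 1. get the document
        if (stored == null) {
            throw new IllegalArgumentException("unknown document: " + docId);
        }
        Map<String, Integer> counts = parse(stored);
        if (frequency > 0) {                           // 2. update the field
            counts.put(term, frequency);
        } else {
            counts.remove(term);
        }
        index.remove(docId);                           // 3. delete, then add
        index.put(docId, render(counts));
    }
}
```

The sketch keeps the terms in their original order, so changing the frequency of bar from 1 to 2 turns the stored text foo foo foo bar into foo foo foo bar bar.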

.

, , -:

foo foo foo bar

foo: 3; bar: 1;

TokenFilter, , n . , . , , , , .
