I am writing an internal application that holds a number of pieces of text, along with a set of additional data (metadata) about each piece. All of this will be stored in a database (SQL Server for now, although that may change) in the order it is entered.
I would like to be able to search these pieces of text and get the most relevant ones back, with the most relevant at the top. I originally looked at SQL Server full-text search, but it is not as flexible for my other needs as I had hoped, so it seems I need to develop my own solution.
From what I understand, I need an inverted index, and then the results retrieved from that index need to be ranked and adjusted based on the additional data. That part can wait for now, though, since at this stage I just want an inverted index over the body text stored in a database table's rows.
I have had a go at writing this in Java, using a Hashtable with words as keys and a list of each word's occurrences as values, but to be honest I'm still fairly new to C# and have only really used things like DataSet and DataTable when handling data. If needed, I will post the Java code soon, once I have cleaned the viruses off this laptop.
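For reference, a rough C# equivalent of that Java Hashtable approach might look like the sketch below. It is only an assumption of what the Java version does: `Dictionary<string, List<Posting>>` stands in for the Hashtable, each occurrence records a document ID and a word position, and the tokenizer (lower-cased runs of regex word characters) is deliberately naive. `InvertedIndex`, `Posting`, `AddDocument` and `Lookup` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One occurrence of a term: the document it appears in and its word position there.
public readonly record struct Posting(int DocId, int Position);

public class InvertedIndex
{
    // Term -> every occurrence of that term, in the order documents were added.
    private readonly Dictionary<string, List<Posting>> _index =
        new Dictionary<string, List<Posting>>();

    // Naive tokenizer (an assumption): runs of regex word characters, lower-cased.
    public static IEnumerable<string> Tokenize(string text)
    {
        foreach (Match m in Regex.Matches(text, @"\w+"))
            yield return m.Value.ToLowerInvariant();
    }

    // Adds every word of a document's body to the index.
    public void AddDocument(int docId, string body)
    {
        int position = 0;
        foreach (string term in Tokenize(body))
        {
            if (!_index.TryGetValue(term, out List<Posting> postings))
            {
                postings = new List<Posting>();
                _index[term] = postings;
            }
            postings.Add(new Posting(docId, position++));
        }
    }

    // Returns all occurrences of a term, or an empty list if it was never seen.
    public IReadOnlyList<Posting> Lookup(string term)
    {
        if (_index.TryGetValue(term.ToLowerInvariant(), out List<Posting> postings))
            return postings;
        return Array.Empty<Posting>();
    }
}
```

Storing positions as well as document IDs costs more memory, but would allow phrase or proximity queries later; if only "which rows contain this word" matters, a `List<int>` of IDs per term would be enough.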
Given a set of records from a table, or a list of rows, how can I build an inverted index in C#, preferably one stored in a DataSet/DataTable?
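To make the DataTable part of the question concrete, one possible layout is a "postings" table with one row per (term, document, position), looked up through a `DataView` sorted on the term. This is a sketch under assumptions: the source table is assumed to have an `int` column named `Id` and a string column named `Body` (hypothetical names), and `DataTableIndexer`, `BuildIndex` and `FindDocuments` are illustrative, not an existing API.

```csharp
using System.Data;
using System.Linq;
using System.Text.RegularExpressions;

public static class DataTableIndexer
{
    // Builds one row per (term, document, position) occurrence.
    // Assumes hypothetical source columns "Id" (int) and "Body" (string).
    public static DataTable BuildIndex(DataTable source)
    {
        var index = new DataTable("InvertedIndex");
        index.Columns.Add("Term", typeof(string));
        index.Columns.Add("DocId", typeof(int));
        index.Columns.Add("Position", typeof(int));

        foreach (DataRow row in source.Rows)
        {
            int docId = (int)row["Id"];
            string body = row["Body"] as string ?? string.Empty; // DBNull -> empty text
            int position = 0;
            foreach (Match m in Regex.Matches(body, @"\w+"))
                index.Rows.Add(m.Value.ToLowerInvariant(), docId, position++);
        }
        return index;
    }

    // Returns the distinct, ascending DocIds that contain the term.
    // A DataView sorted on "Term" supports FindRows lookups; in real code
    // you would build the view once and reuse it rather than per query.
    public static int[] FindDocuments(DataTable index, string term)
    {
        var view = new DataView(index) { Sort = "Term" };
        DataRowView[] hits = view.FindRows(term.ToLowerInvariant());
        return hits.Select(h => (int)h["DocId"]).Distinct().OrderBy(id => id).ToArray();
    }
}
```

Because this is just an ordinary DataTable, it could presumably also be written back to a SQL Server table (for example with `SqlBulkCopy.WriteToServer`) so the index survives restarts, although for lookups an in-memory `Dictionary` is likely to be faster than a `DataView`.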
EDIT: I forgot to mention that I have already tried Lucene and Nutch, but I need my own solution, because modifying Lucene to meet my needs would take much longer than writing an inverted index myself. I will be handling a lot of metadata that also needs to be processed once the basic inverted index is complete, so all I need for now is a basic full-text search over a single field using the inverted index. Finally, working on an inverted index is not something I get to do every day, so it would be great to have a crack at it.
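For the "basic full-text search over a single field, most relevant first" requirement, a minimal sketch could keep per-document term counts and rank matches with TF-IDF. TF-IDF is a swapped-in scoring choice for illustration, not something decided above, and `RankedIndex`, `Add` and `Search` are hypothetical names. Each matching document scores the sum over query terms of `tf * ln(N / df)`, where `tf` is the term's count in that document, `N` the number of documents, and `df` the number of documents containing the term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Minimal ranked search over one text field, scored with TF-IDF (an assumed choice).
public class RankedIndex
{
    // term -> (docId -> number of times the term occurs in that document)
    private readonly Dictionary<string, Dictionary<int, int>> _postings =
        new Dictionary<string, Dictionary<int, int>>();
    private readonly HashSet<int> _docs = new HashSet<int>();

    // Naive tokenizer (an assumption): lower-cased runs of regex word characters.
    private static IEnumerable<string> Tokenize(string text) =>
        Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value.ToLowerInvariant());

    public void Add(int docId, string body)
    {
        _docs.Add(docId);
        foreach (string term in Tokenize(body))
        {
            if (!_postings.TryGetValue(term, out Dictionary<int, int> counts))
            {
                counts = new Dictionary<int, int>();
                _postings[term] = counts;
            }
            counts.TryGetValue(docId, out int tf); // tf is 0 if not present yet
            counts[docId] = tf + 1;
        }
    }

    // Returns matching documents, highest score first (ties broken by DocId).
    public List<(int DocId, double Score)> Search(string query)
    {
        var scores = new Dictionary<int, double>();
        foreach (string term in Tokenize(query).Distinct())
        {
            if (!_postings.TryGetValue(term, out Dictionary<int, int> counts)) continue;
            // Rarer terms weigh more: idf = ln(N / df).
            double idf = Math.Log((double)_docs.Count / counts.Count);
            foreach (KeyValuePair<int, int> kv in counts)
            {
                scores.TryGetValue(kv.Key, out double s);
                scores[kv.Key] = s + kv.Value * idf;
            }
        }
        return scores.OrderByDescending(kv => kv.Value)
                     .ThenBy(kv => kv.Key)
                     .Select(kv => (DocId: kv.Key, Score: kv.Value))
                     .ToList();
    }
}
```

The later step of adjusting results based on the metadata could then, for example, multiply or add to each document's score before sorting, but how to weight that is left open here.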
Mike b