I need to create a full-text search form for a database of emails / support tickets (in C#), and I'm looking for tips and articles on how to approach this. In particular, I would like to know how to handle the classic full-text search problems, for example:
- Making sure that matches are sensible: if someone enters "big head" and a document contains "big hairy head", that document should be returned by the search.
- Ordering results by relevance.
- How best to display matches, for example by highlighting the matched terms.
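To make those three problems concrete, here is a minimal sketch of the naive baseline I have in mind. The names (`NaiveSearch`, `Tokenize`, `Score`, `Highlight`) are made up for illustration, the scoring formula is an arbitrary assumption, and none of this is meant to reflect how Lucene.Net actually works:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class NaiveSearch
{
    // Split text into lowercase word tokens.
    public static List<string> Tokenize(string text) =>
        Regex.Matches(text.ToLowerInvariant(), @"\w+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();

    // Returns 0 when any query term is missing from the document.
    // Otherwise the score grows with the number of term occurrences and
    // with how close together the terms appear (assumed, ad hoc formula).
    public static double Score(string query, string document)
    {
        var terms = Tokenize(query).Distinct().ToList();
        var words = Tokenize(document);
        if (terms.Count == 0 || terms.Any(t => !words.Contains(t)))
            return 0.0;

        int occurrences = words.Count(w => terms.Contains(w));

        // Find the smallest window of words that contains every query term.
        int bestWindow = int.MaxValue;
        for (int start = 0; start < words.Count; start++)
        {
            var seen = new HashSet<string>();
            for (int end = start; end < words.Count; end++)
            {
                if (terms.Contains(words[end])) seen.Add(words[end]);
                if (seen.Count == terms.Count)
                {
                    bestWindow = Math.Min(bestWindow, end - start + 1);
                    break;
                }
            }
        }
        return occurrences + (double)terms.Count / bestWindow;
    }

    // Wrap every word that matches a query term in square brackets.
    public static string Highlight(string query, string document)
    {
        var terms = new HashSet<string>(Tokenize(query));
        return Regex.Replace(document, @"\w+",
            m => terms.Contains(m.Value.ToLowerInvariant())
                ? "[" + m.Value + "]"
                : m.Value);
    }
}
```

With this, "big head" still matches "a big hairy head" (every term is present), but the exact phrase scores higher because the terms sit closer together. It obviously ignores stemming, stop words and document length, which is the kind of thing I'd like advice on.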
I know that full-text search is a pretty mammoth subject area in itself; I'm just looking for simple articles and tips on how to create something that is at least usable and useful.
I have used things like Lucene.Net before, and obviously some kind of full-text index is required. The difficult bit is taking the list of documents that Lucene returns and presenting it in a useful way.
UPDATE: I want to clarify a bit what I mean. There are hundreds of common full-text search forms that all perform a very similar function, for example:
- Search button on every online forum
- Search button on each wiki
- Search in Windows / Google Desktop
- Google
Each of these searches takes its information from a different source and displays the results by different means (HTML, Windows Forms, etc.), but all of them solve the same problems, using methods of varying complexity, and for the most part (with the possible exception of desktop search) the input data has the same format: HTML or text.
I am looking for tips and general strategies on how to do things like rank search results in ways that can be useful to the user.
As an alternative, one strategy I was considering was to take some existing wiki software, export the entire dataset to text pages on that wiki, and simply use the wiki's own search. The kind of search I'm after is, for all intents and purposes, functionally identical to 99% of the searches that already exist; I just want to give it a different data source and format the output in a slightly different way (both of which I already know how to do).
Surely there must be some advice out there on how these searches are implemented?