Articles and guidelines for implementing a full-text search form

I need to create a full-text search form for a database of emails / support tickets (in C#), and I'm looking for tips and articles on how to approach this. In particular, I would like to know how to approach the classic problems of full-text search, for example:

  • Making sure that matches are sensible: for example, if someone enters "big head" and a document contains "big hairy head", that document should be returned by the search.
  • Ordering results by relevance.
  • How best to display matches, for example, highlighting the relevant terms.

I know that full-text search is a pretty mammoth subject area in itself; I'm just looking for simple articles and tips on how to create something that is at least usable and useful.

I have used things like Lucene.Net before. Obviously some kind of full-text index is required; the difficult bit is taking the list of documents that Lucene returns and presenting it in a useful way.

UPDATE: I want to clarify a bit what I mean. There are hundreds of common full-text search forms that perform a very similar function, for example:

  • Search button on every online forum
  • Search button on each wiki
  • Search in Windows / Google Desktop
  • Google

Each of these searches takes its information from different sources and displays the results by different means (HTML, Windows Forms, etc.), but each of them solves the same problems, with varying degrees of sophistication, and for the most part (with the possible exception of desktop search) the input data has the same format: HTML or text.

I am looking for tips and general strategies on how to do things like rank search results in ways that can be useful to the user.

As an alternative, one strategy I was considering was to take some existing wiki software, export the entire data set to text in that wiki, and simply use the wiki's search. The kind of search I am after is, for all intents and purposes, functionally identical to 99% of the searches that already exist; I just want to give it a different data source and format the output in a slightly different way (both of which I already know how to do).

Surely there must be some advice out there on how these searches are done?

+4
5 answers

You can use the great Lucene.Net library from Apache; the LINQ to Lucene extensions can also simplify your work.

+2

SQL Server (including the Express edition with Advanced Services) has full-text search. It can search text inside columns, and it can also use IFilters to search inside embedded documents. You can use the FREETEXTTABLE function in T-SQL to search the content intelligently and return results with a relevance ranking:

"Returns a table of zero, one, or more rows for those columns containing character-based data types for values that match the meaning, but not the exact wording, of the text in the specified freetext_string. FREETEXTTABLE can only be referenced in the FROM clause of a SELECT statement, like a regular table name.

Queries using FREETEXTTABLE specify freetext-type full-text queries that return a relevance ranking value (RANK) and full-text key (KEY) for each row."

For example:

    SELECT FT_TBL.CategoryName,
           FT_TBL.Description,
           KEY_TBL.RANK
    FROM dbo.Categories AS FT_TBL
    INNER JOIN FREETEXTTABLE(dbo.Categories, Description,
                             'sweetest candy bread and dry meat') AS KEY_TBL
        ON FT_TBL.CategoryID = KEY_TBL.[KEY]
    ORDER BY KEY_TBL.RANK DESC;  -- most relevant rows first

For more information, read Understanding SQL Server Full-Text Indexing.

+2

Your question is really a database question. You need to decide which database you will use; you can then hand the search keywords to the database and let it do the searching, rather than searching in your own program.

0

Take a look at CONTAINSTABLE as well, as it supports wildcards, weighting, etc.

http://msdn.microsoft.com/en-us/library/ms189760.aspx
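
As a rough sketch of what that can look like (written for illustration, not copied from the linked page): the first query reuses the dbo.Categories table from the FREETEXTTABLE example in the other answer and combines a prefix ("wildcard") term with ISABOUT weights. The second assumes a hypothetical dbo.Tickets table whose Body column is full-text indexed with TicketID as the full-text key, and uses NEAR so that a search for "big head" can also match "big hairy head". The search terms and weight values are made up.

```sql
-- Weighted terms plus a prefix (wildcard) term.
-- "sweet*" matches sweet, sweets, sweetest, ...; weights range from 0.0 to 1.0.
SELECT FT_TBL.CategoryName,
       FT_TBL.Description,
       KEY_TBL.RANK
FROM dbo.Categories AS FT_TBL
INNER JOIN CONTAINSTABLE(dbo.Categories, Description,
        'ISABOUT ("sweet*" WEIGHT (0.8), candy WEIGHT (0.4), bread WEIGHT (0.2))'
    ) AS KEY_TBL
    ON FT_TBL.CategoryID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC;

-- Proximity search: rows where "big" and "head" occur near each other,
-- ranked higher the closer together they are.
-- dbo.Tickets, TicketID, Subject and Body are hypothetical names.
SELECT T.TicketID,
       T.Subject,
       KEY_TBL.RANK
FROM dbo.Tickets AS T
INNER JOIN CONTAINSTABLE(dbo.Tickets, Body, 'big NEAR head') AS KEY_TBL
    ON T.TicketID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC;
```

The RANK value is only meaningful for ordering rows within a single query, so it is probably best used for sorting rather than shown to the user as an absolute score.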

0

If you do not want to go the SQL route, then also consider Microsoft Search Server 2008 Express: it is free, powerful, and easy to use. It meets all your requirements and automatically handles things like ranking.

0
