Is full-text search suitable for finding names of people?

I have a user table with one column for a person's name:

 CREATE TABLE [dbo].[Users] (
     Id bigint NOT NULL,
     Name nvarchar(80) NOT NULL,
     PRIMARY KEY CLUSTERED (Id ASC)
 )

The Name column can contain a full name, just a first name, or really anything (separated by spaces). To implement search on Name, I would like to use SQL Server full-text search, but I'm not sure whether it is suitable for searching names and aliases rather than actual words. Another question: which language should I choose when creating the full-text index on Name?

Any other considerations?

Thanks.

2 answers

It seems that if you want to search multi-part names, full-text search is the easiest and most suitable approach (please correct me if I am wrong). The alternative is LIKE '%query%', but it has too many disadvantages:

  • Awful performance, because a leading wildcard forces an index scan.
  • The order of terms matters. For example, searches for "John Smith" and "Smith John" will return different results.
  • It ignores word boundaries. For example, a search for "Ann" will also retrieve "Joanna" and "Danny", which are not useful matches.
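
The last two drawbacks can be reproduced with a couple of queries. This is only a sketch against the Users table from the question; the sample names mentioned in the comments are hypothetical rows, and a case-insensitive collation is assumed.

```sql
-- Assumes a case-insensitive collation and hypothetical rows such as
-- 'John Smith', 'Smith John', 'Ann Lee', 'Joanna Doe' and 'Danny Roe'.

-- Term order matters: this matches 'John Smith' but not 'Smith John'.
SELECT Id, Name
FROM [dbo].[Users]
WHERE Name LIKE '%John Smith%';

-- Word boundaries are ignored: besides 'Ann Lee', this also matches
-- 'Joanna Doe' and 'Danny Roe', because both contain the substring 'ann'.
SELECT Id, Name
FROM [dbo].[Users]
WHERE Name LIKE '%Ann%';
```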

So, I went ahead and implemented a full-text search. My queries look something like this:

 SELECT * FROM Users WHERE CONTAINS(Name, '"John*"') 

The only minor difficulty was that I had to convert the user's query (John) into a CONTAINS query ("John*"). To do this, I added this method to my UserRepository:

 /// <summary>
 /// Converts a user-entered search query into a query that can be consumed by the CONTAINS keyword of SQL Server.
 /// </summary>
 /// <example>If query is "John S Ju", the result will be "\"John*\" AND \"S*\" AND \"Ju*\"".</example>
 /// <param name="query">Query entered by user.</param>
 /// <returns>String instance.</returns>
 public static string GetContainsQuery(string query)
 {
     string containsQuery = string.Empty;

     // RemoveEmptyEntries keeps repeated spaces from producing empty "*" terms.
     var terms = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

     if (terms.Length > 1)
     {
         for (int i = 0; i < terms.Length; i++)
         {
             string term = terms[i].Trim();

             // Add a wildcard term, e.g. "term*". The wildcard allows searching by
             // partially entered name parts (partial first name and/or partial
             // last name, etc.).
             containsQuery += "\"" + term + "*\"";

             // If it is not the last term.
             if (i < terms.Length - 1)
             {
                 // We want all terms in the user query to match.
                 containsQuery += " AND ";
             }
         }

         containsQuery = containsQuery.Trim();
     }
     else
     {
         // Single term: trim stray spaces so they do not end up inside the quotes.
         containsQuery = "\"" + query.Trim() + "*\"";
     }

     return containsQuery;
 }
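
For illustration, the string produced by GetContainsQuery could be passed to SQL Server as a parameter rather than concatenated into the statement. This is a minimal sketch, assuming a full-text index already exists on Name; the variable name @search is hypothetical.

```sql
-- @search stands in for the value returned by GetContainsQuery("John S Ju").
DECLARE @search nvarchar(4000) = N'"John*" AND "S*" AND "Ju*"';

-- Each quoted term ending in * is a prefix term; AND requires all of them to match.
SELECT Id, Name
FROM [dbo].[Users]
WHERE CONTAINS(Name, @search);
```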

Hope this helps someone who stumbles into the same issue.

PS - I wrote a blogpost documenting this.


At first glance, I would recommend using the LIKE operator rather than a full-text query.

Make sure the comparison is case insensitive, and maybe even accent insensitive. This can be achieved by setting the right collation on the server, the database, the table column, or in the query. In a query, it is done as follows:

 SELECT * FROM [dbo].[Users] WHERE Name LIKE '%niaher%' COLLATE SQL_Latin1_General_CP1_CI_AI 

If you use a full-text index, you get all sorts of features, such as verb conjugation (stemming) and a thesaurus (see "Linguistic Components and Language Support in Full-Text Search"), which you do not need when searching a list of names. These features are language dependent, by the way, which is why you specify a language for a full-text index.
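
As a rough sketch of where the language is specified, a full-text index on Name could be created along these lines. The catalog name UsersCatalog and the key index name PK_Users are assumptions: the table definition in the question leaves the primary key unnamed, so the real index name would have to be looked up first. LANGUAGE 0 (the neutral language) is just one possible choice for name data, not a recommendation from the answers here.

```sql
-- UsersCatalog is a hypothetical catalog name.
CREATE FULLTEXT CATALOG UsersCatalog;

-- PK_Users is a hypothetical name for the unique index behind the primary key.
-- LANGUAGE 0 selects the neutral language (LCID 0) for the Name column.
CREATE FULLTEXT INDEX ON [dbo].[Users] (Name LANGUAGE 0)
    KEY INDEX PK_Users
    ON UsersCatalog
    WITH CHANGE_TRACKING AUTO;
```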

Full-text search also uses a stoplist, which you might want to avoid. At least I would, since many Dutch surnames begin with articles and/or prepositions, as in "Rembrandt van Rijn". "van" will most likely be in the Dutch stoplist and will prevent any match on a search term containing "van".
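
One way to check this, and to switch the stoplist off, might look like the sketch below. It assumes a full-text index already exists on the table and that the caller has the permissions sys.dm_fts_parser requires; whether "van" is actually flagged depends on the installed stoplist, so no output is shown.

```sql
-- Inspect how the Dutch word breaker (LCID 1043) and the system stoplist
-- (stoplist_id 0) treat a name. Terms reported as 'Noise Word' in the
-- special_term column are stopwords.
SELECT display_term, special_term
FROM sys.dm_fts_parser(N'"Rembrandt van Rijn"', 1043, 0, 0);

-- Turn the stoplist off for the existing full-text index on the table.
ALTER FULLTEXT INDEX ON [dbo].[Users] SET STOPLIST = OFF;
```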

If you run into performance issues, it might be worth trying a full-text index and searching with CONTAINS and a simple term.

 SELECT * FROM [dbo].[Users] WHERE CONTAINS(Name, 'niaher') 

Please note that full-text indexes are updated asynchronously.
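
Because of that delay, a freshly inserted name may not be searchable right away. A small sketch of how the population state of the table's full-text index could be checked:

```sql
-- 0 is documented as idle; other values indicate states such as a
-- population or background index update in progress.
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.Users'), 'TableFulltextPopulateStatus') AS PopulateStatus;
```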
