How can I tell whether any names from a list appear in a paragraph of text in ColdFusion?

Suppose I have a list of employee names from a database (in the near future, thousands, perhaps tens of thousands). To simplify the problem, assume that each first name / last name combination is unique (a big if, but a tangent).

I also have a feed of business-related news items (again, perhaps hundreds of items per day).

What I would like to do is determine whether an employee's name appears in a news item of several paragraphs and, if so, "tag" the item with the person it mentions.

A single news item can mention more than one employee, so breaking out of the loop after the first positive match is not an option.

I can, of course, use brute force: for each news item, loop over every employee name and, if a regular expression returns a match, make a note of it.
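For reference, the brute-force version described above might look roughly like the sketch below. The sample names and news items are made up for illustration; in practice they would come from the database and the feed.

```cfml
<cfscript>
// Hypothetical sample data standing in for the database query and the feed.
employees = ["Vanessa Johnson", "Vanessa Fort", "Matthew Reed"];
newsItems = [
    "Shares rose after Vanessa Fort joined the board.",
    "No employees are mentioned in this item."
];

for (i = 1; i <= arrayLen(newsItems); i++) {
    tags = [];
    for (j = 1; j <= arrayLen(employees); j++) {
        // \b keeps a name from matching inside a longer word; \s+ tolerates
        // line breaks or double spaces between first and last name.
        // Names containing regex metacharacters would need escaping first.
        pattern = "\b" & replace(employees[j], " ", "\s+", "all") & "\b";
        if (reFindNoCase(pattern, newsItems[i])) {
            arrayAppend(tags, employees[j]); // no break: keep collecting matches
        }
    }
    writeOutput("Item #i#: " & arrayToList(tags) & "<br>");
}
</cfscript>
```

The cost is one regex evaluation per employee per news item, which is what makes this approach unattractive at tens of thousands of names.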

Is there an easier way to do this in ColdFusion, or should I just write my nested loops?

+4
3 answers

Just throwing this out there as something you could do ...

It sounds like you will almost always have significantly more employee names than words per news item. Here is how I might handle this:

You will presumably have a CF application that pulls in the feeds anyway, so in onApplicationStart:

  • Grab all the employees from your db
  • Create a nested lookup structure in the application scope, with first names as keys and a structure of last names as values (if you like, you could also add a third level for further name parts).

So one of the keys might be Vanessa, with a structure containing two keys (Johnson and Fort) as its value.
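A minimal sketch of building that nested structure follows. The function name buildNameLookup and the firstName / lastName column names are assumptions for illustration, not something from the question.

```cfml
<cfscript>
// Hypothetical helper: qEmployees is assumed to be a query with
// firstName and lastName columns.
function buildNameLookup(qEmployees) {
    var lookup = {};
    var row = 0;
    var first = "";
    for (row = 1; row <= qEmployees.recordCount; row++) {
        first = qEmployees.firstName[row];
        if (!structKeyExists(lookup, first)) {
            lookup[first] = {};
        }
        // Struct keys are case-insensitive in CFML, so "vanessa" finds "Vanessa".
        lookup[first][qEmployees.lastName[row]] = true;
    }
    return lookup;
}

// In Application.cfc's onApplicationStart(), something like:
// application.nameLookup = buildNameLookup(qEmployees);
</cfscript>
```

Building the structure once at application start means each news item only pays for key lookups, not for a database query.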

Then, for each article you parse, simply run listToArray on it with a space as the delimiter and loop through the array, doing a simple structKeyExists on each token. For each match, check the next element in the array as a last name.
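The scan could be sketched as below. The function name findEmployees is hypothetical, and the delimiter list is extended beyond a plain space so that a name followed by punctuation still matches; that extension is my assumption, not part of the suggestion above.

```cfml
<cfscript>
// Hypothetical helper: returns an array of "First Last" names found in the text.
function findEmployees(text, lookup) {
    var found = [];
    // Whitespace and common punctuation act as delimiters, so "Fort." still matches.
    // Apostrophes and hyphens are left alone so names like O'Brien stay in one token.
    var tokens = listToArray(text, " ,.;:!?()" & chr(9) & chr(10) & chr(13));
    var i = 0;
    // Stop one short of the end: every first name needs a following token.
    for (i = 1; i < arrayLen(tokens); i++) {
        if (structKeyExists(lookup, tokens[i])
                && structKeyExists(lookup[tokens[i]], tokens[i + 1])) {
            arrayAppend(found, tokens[i] & " " & tokens[i + 1]);
        }
    }
    return found;
}

// Small hand-built lookup for demonstration.
lookup = {};
lookup["Vanessa"] = {};
lookup["Vanessa"]["Johnson"] = true;
lookup["Vanessa"]["Fort"] = true;

names = findEmployees("Vanessa Fort and Vanessa Johnson met today.", lookup);
writeOutput(arrayToList(names)); // Vanessa Fort,Vanessa Johnson
</cfscript>
```

Two caveats with this sketch: a name mentioned twice is reported twice, and first or last names that span several words would not be found by a two-token check.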

I would guess that this will be far more efficient than running many searches, it takes almost no time to code, and you can simply plug in any future source (your checker takes a single argument: any text at all).

I would be interested to hear which route you take and whether your experiments show anything new about performance in CF.

+7

Matthew, that is a tall order, and there really are many parts to the challenge and its solution. But purely in terms of comparing a list of values against a given block of text to see whether any of them occur in it, you will find that no built-in CF function does this. Because of that, I created a new one, findList, available at cflib:

http://cflib.org/index.cfm?event=page.udfbyid&udfid=1908

It is not perfect, nor as optimized as it could be, but it may be a useful first step, either as it stands or by giving you some ideas. It does meet my needs (determining whether a blog comment refers to any of a list of blacklisted words). I demonstrate it by comparing against a list of URLs, but it could be any words in general. Hope this helps.
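To illustrate the general idea of checking a list against a block of text, here is a minimal sketch. It is not the findList UDF from cflib; the function name itemsFoundIn and the sample values are made up.

```cfml
<cfscript>
// Hypothetical helper (not the cflib UDF): returns the list items that occur in the text.
function itemsFoundIn(text, items) {
    var hits = [];
    var arr = listToArray(items);
    var i = 0;
    for (i = 1; i <= arrayLen(arr); i++) {
        // Plain case-insensitive substring test, so "Ann" would also match "Annual".
        if (findNoCase(arr[i], text)) {
            arrayAppend(hits, arr[i]);
        }
    }
    return hits;
}

hits = itemsFoundIn("Visit spam.example for deals", "spam.example,junk.example");
writeOutput(arrayToList(hits));
</cfscript>
```

Because it uses a substring test rather than word boundaries, this suits URLs and blacklisted terms better than personal names.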

0
source

Another option worth exploring is the Solr engine, which now ships with CF. It will do the heavy lifting for you, and you can focus on dynamically updating your collections and optimizing as new feed items arrive.
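As a rough sketch of what that could look like with the built-in tags: the collection name newsfeed, the collection path, and a query qNews with id, title and body columns are all assumptions for illustration.

```cfml
<!--- One-time setup: create a Solr collection (name and path are hypothetical). --->
<cfcollection action="create" collection="newsfeed" engine="solr"
              path="#expandPath('./collections')#">

<!--- Index a batch of feed items; qNews is assumed to have id, title and body columns. --->
<cfindex action="update" collection="newsfeed" type="custom"
         query="qNews" key="id" title="title" body="title,body">

<!--- Phrase search for one employee; the matching item id comes back in the key column. --->
<cfsearch collection="newsfeed" criteria='"Vanessa Johnson"' name="hits">
<cfoutput query="hits">#hits.key#: #hits.title#<br></cfoutput>
```

This still needs one search per employee name, but each search runs against the index rather than scanning every item's text.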

Good luck

0
