It could be faster. You can use Regex Groups as follows:
    public List<string> Keyword_Search(HtmlNode nSearch)
    {
        var wordFound = new List<string>();

        // cache inner HTML
        string innerHtml = nSearch.InnerHtml;

        // one capturing group per keyword: (\bword1\b)|(\bword2\b)|...
        string pattern = "(\\b" + string.Join("\\b)|(\\b", _keywordList) + "\\b)";
        Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
        MatchCollection myMatches = myRegex.Matches(innerHtml);

        foreach (Match myMatch in myMatches)
        {
            // Group 0 represents the entire match so we skip that one
            for (int i = 1; i < myMatch.Groups.Count; i++)
            {
                if (myMatch.Groups[i].Success)
                    wordFound.Add(_keywordList[i - 1]);
            }
        }

        return wordFound;
    }
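This way you use only one regular expression. The group indices correspond to the positions in _keywordList with an offset of 1 (group 0 is the whole match), hence the line wordFound.Add(_keywordList[i-1]); Note that this mapping only holds as long as the keywords contain no parentheses of their own.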
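To make the index mapping concrete, here is a minimal standalone sketch of the same idea without HtmlAgilityPack. The class name GroupIndexDemo and the methods BuildPattern and Find are hypothetical names made up for illustration, and the keyword list is passed in as a parameter rather than read from the _keywordList field.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the original answer.
public static class GroupIndexDemo
{
    // Builds one capturing group per keyword.
    // For { "cat", "dog" } the result is (\bcat\b)|(\bdog\b)
    public static string BuildPattern(IList<string> keywords)
    {
        return "(\\b" + string.Join("\\b)|(\\b", keywords) + "\\b)";
    }

    // Returns the keyword from the list for every match, in text order.
    public static List<string> Find(string text, IList<string> keywords)
    {
        var found = new List<string>();
        var regex = new Regex(BuildPattern(keywords), RegexOptions.IgnoreCase);

        foreach (Match match in regex.Matches(text))
        {
            // Group 0 is the whole match; group i belongs to keywords[i - 1].
            for (int i = 1; i < match.Groups.Count; i++)
            {
                if (match.Groups[i].Success)
                    found.Add(keywords[i - 1]);
            }
        }

        return found;
    }
}
```

With the keywords cat and dog, calling GroupIndexDemo.Find("A Dog chased a CAT", new[] { "cat", "dog" }) yields dog followed by cat: "Dog" succeeds in group 2 and "CAT" in group 1, and each is reported as spelled in the keyword list.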
UPDATE:
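After looking at my code again, I realized that putting the matches into groups really wasn't necessary, and regex groups have some overhead. Instead, you can remove the capturing parentheses from the pattern and simply add the matches themselves to the wordFound list. This finds the same matches and should be faster. One difference: because of RegexOptions.IgnoreCase, the list now holds the text as it appears in the HTML rather than the entry from _keywordList, so the casing may differ.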
It would look something like this:
    public List<string> Keyword_Search(HtmlNode nSearch)
    {
        var wordFound = new List<string>();

        // cache inner HTML
        string innerHtml = nSearch.InnerHtml;

        // single non-capturing alternation: \b(?:word1|word2|...)\b
        string pattern = "\\b(?:" + string.Join("|", _keywordList) + ")\\b";
        Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
        MatchCollection myMatches = myRegex.Matches(innerHtml);

        foreach (Match myMatch in myMatches)
        {
            wordFound.Add(myMatch.Value);
        }

        return wordFound;
    }
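If the keywords can contain regex metacharacters such as a dot, it may be worth escaping them before joining. The following is a rough sketch of that variation, not part of the code above: the class name EscapedKeywordDemo and the method FindKeywords are hypothetical, and the keywords are passed in as a parameter. Regex.Escape is the standard .NET method for this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the original answer.
public static class EscapedKeywordDemo
{
    // Returns every keyword occurrence exactly as it is written in the text.
    public static List<string> FindKeywords(string text, IEnumerable<string> keywords)
    {
        // Escape each keyword so characters like '.' are matched literally.
        string alternation = string.Join("|", keywords.Select(k => Regex.Escape(k)));
        string pattern = "\\b(?:" + alternation + ")\\b";
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);

        var found = new List<string>();
        foreach (Match match in regex.Matches(text))
        {
            // match.Value keeps the casing found in the text, not the keyword's.
            found.Add(match.Value);
        }

        return found;
    }
}
```

For example, EscapedKeywordDemo.FindKeywords("Regex in ASP.NET, not aspxnet", new[] { "regex", "asp.net" }) yields Regex and ASP.NET. Without the escaping, the unescaped dot would also let "aspxnet" match. Keywords that begin or end with a non-word character would need a different boundary check than \b, which this sketch does not attempt.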
Steve wortham