Find and replace a few words without affecting future replacements

What I want to do is some "forbidden words" highlighting.

Here are the values that I have:

I have a list of forbidden words in an array

{ "word1", "word2", "word3", "word4" } 

I have a string representing a comment

 "i want to word1ban this word3 stupidword4 comment" 

I want to highlight them with HTML bold tags ( <b> </b> ). For example, the comment above should end up looking like this:

 "i want to <b>word1</b>ban this <b>word3</b> stupid<b>word4</b> comment" 

Currently I do this with a regular expression replacement, and it works very well, except for one thing that annoys me.

    foreach (var word in words)
    {
        value = Regex.Replace(
            value,
            string.Format(@"{0}", Regex.Escape(HttpUtility.HtmlEncode(word))),
            "<b>" + word + "</b>",
            RegexOptions.IgnoreCase);
    }

The problem, which also depends on the order of the words in the array, is that a forbidden word can match the markup inserted by an earlier replacement ( <b> or </b> ).

For example, suppose you add this to the forbidden words: <b

Following the code, the result after the first iterations will be:

 "i want to <b>word1</b>ban this <b>word3</b> stupid<b>word4</b> comment" 

Then, after the iteration that replaces <b, it becomes:

 "i want to <b><b</b>>word1</b>ban this <b><b</b>>word3</b> stupid<b><b</b>>word4</b> comment" 

I do not want later iterations to touch the markup I have already inserted, and I am wondering how to achieve that. I tried adding exceptions to my regex to exclude <b> and </b> from the replacement, without success.

2 answers

Ignoring the entire "HTML" aspect of the problem and simply approaching it from the angle of

I want to find and replace several words, but I don't want a replacement I have already made to affect later replacements

there is one thing you can do: make all the replacements at once!

    // e.g. (word1|word2|word3|word4)
    var pattern = "(" + String.Join("|", words.Select(w => Regex.Escape(w))) + ")";
    value = Regex.Replace(value, pattern, "<b>$1</b>", RegexOptions.IgnoreCase);
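To check this against the case from the question, here is a minimal self-contained sketch that adds `<b` to the word list and runs the single-pass replacement. The variable names and the `OrderByDescending` step are my own additions, not part of the snippet above: since the regex engine tries alternatives from left to right, putting longer words first is one way to let the longer word win when two forbidden words start at the same position.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Forbidden words from the question, plus the problematic "<b".
var words = new[] { "word1", "word2", "word3", "word4", "<b" };
var value = "i want to word1ban this word3 stupidword4 comment";

// One alternation for all words. Sorting longest-first is an assumption
// added here so that a longer word is preferred over a shorter one
// starting at the same position.
var pattern = "(" + string.Join("|",
    words.OrderByDescending(w => w.Length).Select(w => Regex.Escape(w))) + ")";

// The input is scanned once, so the inserted <b> tags are never re-examined.
// "$1" re-inserts the text exactly as it was matched.
var result = Regex.Replace(value, pattern, "<b>$1</b>", RegexOptions.IgnoreCase);

Console.WriteLine(result);
// prints: i want to <b>word1</b>ban this <b>word3</b> stupid<b>word4</b> comment
```

Because the regex only ever looks at the original input, `<b` is highlighted only where it appears in the comment itself, never inside the markup produced by the replacement.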

In general, you need to replace terms found in the input, but not in the output that has been produced so far. This is not too difficult to do by hand, but first you need to decide which term takes priority when more than one could be replaced.

Let's say that you have a dictionary of terms and replacements, and the strategy for choosing which term to replace is: "replace the term closest to the start of the input; if several terms appear at the same position, replace the longest one." Here is one way to do this:

    // Requires: using System; using System.Collections.Generic;
    //           using System.Linq; using System.Text;
    string ReplaceWithoutOverlap(string input, IDictionary<string, string> replacements)
    {
        var processedCharCount = 0;
        var sb = new StringBuilder();

        while (processedCharCount < input.Length)
        {
            // Find the next term to replace: earliest position first,
            // longest term first when several start at the same position.
            var replacement = replacements
                .Select(r => Tuple.Create(r.Key, input.IndexOf(r.Key, processedCharCount)))
                .Where(t => t.Item2 != -1)
                .OrderBy(t => t.Item2)
                .ThenByDescending(t => t.Item1.Length)
                .FirstOrDefault();

            if (replacement == null)
            {
                break;
            }

            // Copy the untouched text before the match, then the replacement.
            sb.Append(input, processedCharCount, replacement.Item2 - processedCharCount);
            sb.Append(replacements[replacement.Item1]);
            processedCharCount = replacement.Item2 + replacement.Item1.Length;
        }

        // Copy whatever is left after the last match.
        sb.Append(input.Substring(processedCharCount));
        return sb.ToString();
    }

See it in action.
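As a quick illustration of the priority rule, here is a small self-contained sketch. The terms `ab`, `abc` and `X` are made up for the example, and the function body is a copy of the one above with one change that is my own assumption: `IndexOf` is given `StringComparison.Ordinal` so the search does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical terms: "ab" is replaced by "X", and "X" is itself a term.
var replacements = new Dictionary<string, string>
{
    ["ab"] = "X",
    ["abc"] = "Y",
    ["X"] = "Z",
};

var output = ReplaceWithoutOverlap("abc ab X", replacements);
Console.WriteLine(output); // prints "Y X Z"

string ReplaceWithoutOverlap(string input, IDictionary<string, string> terms)
{
    var processed = 0;
    var sb = new StringBuilder();
    while (processed < input.Length)
    {
        // Earliest match wins; on a tie, the longest term wins.
        var next = terms
            .Select(t => Tuple.Create(t.Key,
                input.IndexOf(t.Key, processed, StringComparison.Ordinal)))
            .Where(t => t.Item2 != -1)
            .OrderBy(t => t.Item2)
            .ThenByDescending(t => t.Item1.Length)
            .FirstOrDefault();
        if (next == null) break;

        sb.Append(input, processed, next.Item2 - processed); // text before the match
        sb.Append(terms[next.Item1]);                        // the replacement
        processed = next.Item2 + next.Item1.Length;          // skip past the match
    }
    sb.Append(input.Substring(processed));
    return sb.ToString();
}
```

At position 0 both `ab` and `abc` match, so the longer `abc` is chosen. The `X` produced for the second `ab` is left alone, because only the input is searched, while the `X` that was already in the input is replaced by `Z`.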

Of course, this is not quite what you want to do here (in fact, replacing everything at once with one regular expression is probably the most convenient), but you can see how this will work.
