Regular expressions are fine as long as the keywords really are words; you can simply use the RegExp constructor instead of a literal to create one from a variable:
    var re = new RegExp('(' + word + ')', 'gi');
    return s.replace(re, '<b>$1</b>');
The difficulty arises if "keywords" can contain punctuation, since punctuation tends to have special meaning in regular expressions. Unfortunately, unlike most other languages and libraries with regexp support, JavaScript has no standard function for escaping punctuation for use in a regular expression.
And you cannot be completely sure which characters need escaping, because not every browser's regexp implementation is guaranteed to be exactly the same. (In particular, newer browsers may add new functionality.) And backslash-escaping characters that are not special is not guaranteed to keep working, although in practice it does.
So about the best you can do is one of:
- attempt to catch every special character in common browser use today [add: see Sebastian's recipe]
- backslash-escape all non-alphanumeric characters. Take care: \W also matches non-ASCII Unicode characters, which you don't really want to escape.
- just make sure there are no non-alphanumeric characters in the keyword before searching
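The first and third options can be sketched roughly as follows. The helper names `escapeRegExp`, `isPlainWord` and `highlight` are made up for illustration, and the character list is an assumption covering the characters that are special in common regexp syntax today; it is not guaranteed to be complete for every engine. If the engines you target provide a built-in such as `RegExp.escape`, that is probably the better choice.

```javascript
// Hypothetical helper (option 1): backslash-escape the characters that are
// special in common regexp syntax. The list is an assumption, not a standard.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper (option 3): accept only ASCII letters and digits,
// so nothing in the keyword can have special meaning.
function isPlainWord(word) {
  return /^[A-Za-z0-9]+$/.test(word);
}

// Wrap every case-insensitive occurrence of `word` in <b>...</b>.
// Assumes `s` is plain text, not HTML that already contains markup.
function highlight(s, word) {
  var re = new RegExp('(' + escapeRegExp(word) + ')', 'gi');
  return s.replace(re, '<b>$1</b>');
}
```

For example, `highlight('Use C++ and c++ daily', 'C++')` returns `'Use <b>C++</b> and <b>c++</b> daily'`, whereas passing the unescaped `'C++'` straight to the RegExp constructor throws a syntax error.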
If you are using this to highlight words in HTML that already contains markup, though, you have trouble. Your "word" might appear in an element name or attribute value, in which case attempting to wrap a <b> around it will break the markup. In more complicated scenarios it could even allow HTML injection, leading to an XSS security hole. If you have to cope with markup you will need a more complicated approach, splitting out the '<...>' markup before processing each stretch of text on its own.
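A minimal sketch of that splitting idea, under some loud assumptions: the names `escapeRegExp` and `highlightInHtml` are hypothetical, and a tag is taken to be anything from '<' to the next '>'. That simplification will misfire on attribute values containing '>', on comments, and on script or style content, and it does not stop a keyword from matching inside an entity such as `&amp;`. Treat it as an illustration of the technique, not a robust HTML processor.

```javascript
// Hypothetical helper: escape characters that are special in common regexp syntax.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Highlight `word` only in the text between tags.
// Assumption: a tag is '<', then any run of non-'>' characters, then '>'.
function highlightInHtml(html, word) {
  var re = new RegExp('(' + escapeRegExp(word) + ')', 'gi');
  // Splitting on a capturing group keeps the tags in the result:
  // even indexes hold text, odd indexes hold the '<...>' pieces.
  var parts = html.split(/(<[^>]*>)/);
  for (var i = 0; i < parts.length; i += 2) {
    parts[i] = parts[i].replace(re, '<b>$1</b>');
  }
  return parts.join('');
}
```

With this, `highlightInHtml('<a class="bold">bold move</a>', 'bold')` returns `'<a class="bold"><b>bold</b> move</a>'`: the attribute value is left alone and only the text is wrapped.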
bobince Nov 11 '08 at 13:15