Regular expression using word boundary to match alphanumeric and non-alphanumeric characters in JavaScript

Question

I am trying to highlight a set of keywords using JavaScript and a regular expression. I ran into one problem: my keywords may contain both letters and special characters, as in @text, #number, etc. I use the word boundary to match and replace the whole word and not a partial word (contained in another word).

var pattern = new RegExp('\\b(' + keyword + ')\\b', 'gi');

This expression matches and highlights most keywords; however, a keyword such as "number:" is not highlighted.

I know that \bword\b matches at word boundaries, and that special characters are not alphanumeric, so a keyword that starts or ends with one does not match the above expression. Can you tell me which regular expression I can use to accomplish the above?
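
The failure described above can be reproduced with a small sketch. The sample text and the helper name naivePattern are made up for illustration and do not come from the question; the pattern is the one shown above.

```javascript
// Hypothetical sample text, not taken from the question.
var text = 'Enter your social security number: here, or mention @text.';

function naivePattern(keyword) {
  // \b is doubled because the pattern is built from a string literal.
  return new RegExp('\\b(' + keyword + ')\\b', 'gi');
}

// Alphanumeric at both ends: \b finds a boundary on each side.
var plainHits = text.match(naivePattern('security'));

// Ends with ':' followed by a space: both are non-word characters,
// so there is no boundary after the colon and nothing matches.
var colonHits = text.match(naivePattern('number:'));

// Starts with '@' preceded by a space: no boundary before the '@'.
var atHits = text.match(naivePattern('@text'));
```

Here plainHits holds one match, while colonHits and atHits are both null, because \b only matches between a word character and a non-word character.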

== Update ==

For the above, I tried Tim Pietzcker's suggestion, the following regular expression:

 expr: (?:^|\\b|\\s)(" + keyword + ")(?:$|\\b|\\s)

The above seems to match whole words containing both alphanumeric and non-alphanumeric characters. However, whenever a keyword has an HTML tag immediately before or after it, with no space in between, the keyword is not highlighted (e.g. social security *number:<br>*). I tried the following regular expression, but it also replaces the HTML tag adjacent to the keyword:

 expr: (?:^|\b|\s|<[^>]+>)number:(?:$|\b|\s|<[^>]+>)

Here, for the keyword number:, which has < br > (spaces deliberately added inside the br tag so the browser does not interpret it) directly after it with no space in between, the tag is highlighted along with the keyword.

Can you suggest an expression that matches the whole word, with alphanumeric and non-alphanumeric characters, while ignoring any adjacent HTML tag?

javascript regex alphanumeric

Bhupen Nov 18 '10 at 13:54

6 answers

Tim Pietzcker · Answer 1 · 2010-11-18T12:00:14+0000

OK, so you have two problems: JavaScript does not support lookbehind, and \b detects the boundaries between alphanumeric and non-alphanumeric characters.

First question: what exactly counts as a word boundary for your keywords? I assume it should be either a \b boundary or whitespace. If so, you can search for

 "(?:^|\\b|\\s)(" + keyword + ")(?:$|\\b|\\s)"

Of course, whitespace characters around keywords such as @number# will then also become part of the match, but highlighting them may not be a problem. In other cases, i.e. if there is an actual word boundary that can match, the whitespace will not be part of the match, so this should work fine in most cases.

The actual keyword you are interested in will be in backreference $1 (capture group 1), so if you can highlight that separately, even better.

EDIT: If characters other than whitespace can appear directly before or after the keyword, then I think the only thing you can do (if you are stuck with JavaScript) is:

Check whether your keyword starts with an alnum character.
If so, prepend \b to your regular expression.
Check whether your keyword ends with an alnum character.
If so, append \b to your regular expression.

So for keyword use \bkeyword\b ; for number: use \bnumber: ; for @twitter use @twitter\b .
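
The steps above can be sketched as follows. The function names escapeRegExp and buildKeywordPattern and the sample string are assumptions made for illustration; the answer itself does not name them. The sketch treats "alnum" as \w (which also includes the underscore) and escapes the keyword so that characters such as . or + are matched literally.

```javascript
// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Add \b only on the sides where the keyword has a word (\w) character.
function buildKeywordPattern(keyword) {
  var prefix = /^\w/.test(keyword) ? '\\b' : '';
  var suffix = /\w$/.test(keyword) ? '\\b' : '';
  return new RegExp(prefix + '(' + escapeRegExp(keyword) + ')' + suffix, 'gi');
}

// Hypothetical sample text with an HTML tag directly after "number:".
var sample = 'social security number:<br> ask @twitter, not numbering';
var marked = sample
  .replace(buildKeywordPattern('number:'), '<b>$1</b>')
  .replace(buildKeywordPattern('@twitter'), '<b>$1</b>');
```

Because no whitespace is consumed, an adjacent tag such as <br> stays outside the match, and "numbering" is left alone because the colon is part of the keyword.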

PleaseStand · Answer 2 · 2010-11-18T11:58:40+0000

We need to find the keyword as a substring with a whitespace character (or the start or end of the string) on both sides. If JavaScript supported lookbehind, it would look like this:

 var re = new RegExp('(?<!\\S)' + keyword + '(?!\\S)', 'gi');

This will not work in JavaScript (but would in Perl and other scripting languages). Instead, we need to include the leading whitespace character (or the beginning of the line) as the first part of the match (and, possibly, capture what we are really looking for in $1):

 var re = new RegExp('(?:^|\\s)(' + keyword + ')(?!\\S)', 'gi');

Just keep in mind that the actual keyword may start one character after the position given by the .index property of the result returned by re.exec(string), and that if you access the matched string, you either need to remove the leading whitespace character with .slice(1) or simply use what was captured.
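
A minimal sketch of that index adjustment, using made-up sample data. It computes the offset from the lengths of the full match and the captured group, so it also works when the match begins at the start of the string and there is no leading whitespace to skip.

```javascript
// Hypothetical sample data; the keyword is assumed to contain no regex metacharacters.
var keyword = '@text';
var input = '@text and more @text, but not some@text or @text';
var re = new RegExp('(?:^|\\s)(' + keyword + ')(?!\\S)', 'gi');

var hits = [];
var m;
while ((m = re.exec(input)) !== null) {
  // m[0] may begin with one whitespace character; m[1] is the keyword itself.
  var start = m.index + (m[0].length - m[1].length);
  hits.push({ start: start, text: m[1] });
}
```

With this input, only the first and last occurrences qualify: "@text," is followed by a comma and "some@text" has no whitespace before the keyword.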

fcalderan · Answer 3 · 2010-11-18T11:33:58+0000

Perhaps what you are trying to do is

 '\\b\\W*(' + keyword + ')\\W*\\b'

Nathan MacInnes · Answer 4 · 2010-11-18T11:35:04+0000

Lookahead and lookbehind are your answer: "(?<=^|\\s)" + keyword + "(?=\\s|$)" . The lookaround groups are not included in the match, so put any characters that are not allowed in keywords there.

tchrist · Answer 5 · 2010-11-18T13:58:15+0000

As Tim correctly points out, \b is a tricky thing that works differently from how people often think it works. Read this answer for more details on this and what you can do about it.

In short, this is the border on the left:

 (?(?=\w)(?<!\w)|(?<!\W))

and this is the border on the right:

 (?(?<=\w)(?!\w)|(?!\W))

People always think that whitespace is involved, but it isn't. However, now that you know these definitions, it is easy to build on them. You can swap \w and \W for \s and \S in the two patterns above. Or you can add whitespace awareness to the else blocks.
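
JavaScript regular expressions have no (?(cond)yes|no) conditional syntax, so the two patterns above cannot be used verbatim. For a keyword that begins and ends with a non-whitespace character, the \s and \S variant reduces to (?<!\S) on the left and (?!\S) on the right. The sketch below assumes a runtime with lookbehind support (added to the language in ES2018); the helper names and sample line are hypothetical.

```javascript
// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Whitespace-based boundaries: no non-whitespace character directly
// before or after the keyword. Requires lookbehind support (ES2018+).
function whitespaceBounded(keyword) {
  return new RegExp('(?<!\\S)' + escapeRegExp(keyword) + '(?!\\S)', 'gi');
}

// Hypothetical sample line.
var line = 'number: 42, phone number:, #number and x#number';
var found = line.match(whitespaceBounded('number:'));
var hashFound = line.match(whitespaceBounded('#number'));
```

Each call finds exactly one occurrence here: "number:," is rejected because of the trailing comma, and "x#number" because of the leading x. Note that this variant does not match a keyword that directly touches an HTML tag.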

sumair · Answer 6 · 2011-09-09T20:31:46+0000

Try this, it should work ...

 var pattern = new RegExp("\\b" + keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") + "\\b", "gi");
