Matching word in String in Java

I am trying to match strings containing the word "#SP" (without quotes, case insensitive) in Java. However, I am finding regular expressions very difficult to use!

The lines I need to match are:

    "This is a sample #sp string"
    "#SP string text..."
    "String text #Sp"

Lines I don't want to match:

    "Anything with #Spider"
    "#Spin #Spoon #SPORK"

Here is what I have so far: http://ideone.com/B7hHkR. Can someone guide me through creating my regex?

I also tried "\\w*\\s*#sp\\w*\\s*", to no avail.

Edit: here is the code from IDEone:

    java.util.regex.Pattern p = java.util.regex.Pattern.compile("\\b#SP\\b", java.util.regex.Pattern.CASE_INSENSITIVE);
    java.util.regex.Matcher m = p.matcher("s #SP s");
    if (m.find()) {
        System.out.println("Match!");
    }
3 answers

You are close, but the \b before # is the problem. \b is a word boundary, and # is not a word character (that is, it is not in the set [0-9A-Za-z_]). A word boundary needs a word character on exactly one side, so the position between a space and # is not a word boundary. Change the pattern to:

    java.util.regex.Pattern p = java.util.regex.Pattern.compile("(^|\\s)#SP\\b", java.util.regex.Pattern.CASE_INSENSITIVE);

The group (^|\s) means "either ^ or \s", where ^ matches the beginning of the input (as in "#SP String") and \s matches a whitespace character.
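To check this against the strings from the question, here is a small sketch that runs both patterns over the five sample lines. The class name BoundaryDemo and the helper found are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Original attempt: \b before '#' can only match if a word character precedes '#'.
    static final Pattern WITH_BOUNDARY =
            Pattern.compile("\\b#SP\\b", Pattern.CASE_INSENSITIVE);
    // Suggested fix: require the start of the input or whitespace before '#'.
    static final Pattern WITH_ALTERNATION =
            Pattern.compile("(^|\\s)#SP\\b", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "This is a sample #sp string",
            "#SP string text...",
            "String text #Sp",
            "Anything with #Spider",
            "#Spin #Spoon #SPORK"
        };
        for (String s : samples) {
            System.out.println(found(WITH_BOUNDARY, s) + " / "
                    + found(WITH_ALTERNATION, s) + " : " + s);
        }
    }
}
```

With the original pattern every line prints false, because none of the samples has a word character directly before #. With the alternation, the first three lines print true and the last two print false.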


(Edit: a positive lookbehind is not needed, since you are only matching, not replacing.)

You are another victim of Java's misnamed regex matching methods.

.matches(), unfortunately, tries to match the entire input, which plainly contradicts the usual meaning of "regex matching" (a regex can match anywhere in the input). The method you need to use is .find().

This is a braindead API, and unfortunately Java is not the only language with such misleading method names. Python is guilty as well.

Also, you have the problem that \\b matches at word boundaries, and # is not a word character. You need an alternation that matches either the beginning of the input or whitespace.
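A minimal sketch of the difference, using the pattern and input from this thread. The class name MatchesVsFind and the two helper methods are invented for the example; matches() and find() are the real Matcher methods.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    static final Pattern P =
            Pattern.compile("(^|\\s)#SP\\b", Pattern.CASE_INSENSITIVE);

    // matches() succeeds only if the pattern covers the whole input.
    static boolean wholeInput(String input) {
        return P.matcher(input).matches();
    }

    // find() succeeds if the pattern matches any part of the input.
    static boolean anywhere(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInput("s #SP s")); // false
        System.out.println(anywhere("s #SP s"));   // true
        System.out.println(wholeInput("#sp"));     // true
    }
}
```

The last call returns true only because the input consists of the tag and nothing else.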

Your code should look like this (class names not fully qualified):

    // needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
    Pattern p = Pattern.compile("(^|\\s)#SP\\b", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher("s #SP s");
    if (m.find()) {
        System.out.println("Match!");
    }

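The regular expression "\\w*\\s*#sp\\w*\\s*" matches zero or more word characters, followed by zero or more whitespace characters, followed by #sp, followed by zero or more word characters, followed by zero or more whitespace characters. Because the \w* after #sp can consume the rest of a longer word, it also matches #Spider. My suggestion is not to use \s* to separate the words in your expression; use \b instead.

    "(^|\\b)#sp(\\b|$)"

Note that the backslashes must be doubled in a Java string literal, and that the leading \b only matches when # directly follows a word character, so a tag preceded by a space still needs (^|\\s) in front.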
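To see how these two expressions behave, here is a rough sketch that runs them with find() over a few inputs. The class name WordBreakDemo, the helper found, and the extra input "text#sp" are made up for illustration, and compiling both patterns with CASE_INSENSITIVE is an assumption based on the question's requirement.

```java
import java.util.regex.Pattern;

public class WordBreakDemo {
    // The question's attempt; CASE_INSENSITIVE is assumed for this demo.
    static final Pattern ATTEMPT =
            Pattern.compile("\\w*\\s*#sp\\w*\\s*", Pattern.CASE_INSENSITIVE);
    // The \b-based pattern, with backslashes doubled for a Java string literal.
    static final Pattern BOUNDARY =
            Pattern.compile("(^|\\b)#sp(\\b|$)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "#SP string text...",
            "Anything with #Spider",
            "String text #Sp",
            "text#sp"
        };
        for (String s : samples) {
            System.out.println(found(ATTEMPT, s) + " / "
                    + found(BOUNDARY, s) + " : " + s);
        }
    }
}
```

The first pattern finds a match in all four inputs, including "Anything with #Spider". The \b-based pattern rejects #Spider and accepts "#SP string text...", but it also rejects "String text #Sp" (a space is not a word boundary next to #) and accepts "text#sp" (a letter directly before # is).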