Regex question: collapsing runs of spaces outside quoted text

I want to replace every run of two or more spaces with a single space, but leave the text between quotation marks untouched.

Is there a way to do this with a Java regex? If so, could you show me how or give me a hint?

+2
java regex quotes
7 answers

Here's another approach, which uses a lookahead to check that all quotes after the current position are paired:

text = text.replaceAll(" ++(?=(?:[^\"]*+\"[^\"]*+\")*+[^\"]*+$)", " "); 

If necessary, the lookahead can be adapted to handle escaped quotes within the quoted sections.
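As a rough illustration of how the lookahead behaves, here is a minimal sketch that wraps the same pattern in a helper. The class name `LookaheadCollapse` and the method name `collapse` are made up for this example, and it assumes quotes are balanced and never escaped.

```java
import java.util.regex.Pattern;

class LookaheadCollapse {
    // A run of spaces that is followed, up to the end of the input,
    // by an even number of quotes (i.e. the run is outside any quoted section).
    private static final Pattern SPACES_OUTSIDE_QUOTES =
            Pattern.compile(" ++(?=(?:[^\"]*+\"[^\"]*+\")*+[^\"]*+$)");

    // Hypothetical helper: collapses each matching run to one space.
    static String collapse(String text) {
        return SPACES_OUTSIDE_QUOTES.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "a   b \"c   d\"  e";
        System.out.println(collapse(input)); // prints: a b "c   d" e
    }
}
```

Spaces inside the quoted section are followed by an odd number of quotes, so the lookahead fails there and they are left alone. Note that the lookahead rescans the rest of the input for every run of spaces, which may be slow on very long texts.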

+4

When trying to match something that might be contained in something else, it can help to build a regular expression that matches both, for example:

    ("[^"\\]*(?:\\.[^"\\]*)*")|( {2,})

This matches either a quoted string or a run of two or more spaces. Because the quoted-string alternative comes first and consumes the whole string, the expression never matches spaces inside quotation marks. With this expression you need to examine each match to determine whether it is a quoted string or a run of spaces, and act accordingly:

    // requires java.util.regex.Matcher and java.util.regex.Pattern
    Pattern spaceOrStringRegex = Pattern.compile( "(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")|( {2,})" );
    StringBuffer replacementBuffer = new StringBuffer();
    Matcher spaceOrStringMatcher = spaceOrStringRegex.matcher( text );
    while ( spaceOrStringMatcher.find() ) {
        // if the space group is the match
        if ( spaceOrStringMatcher.group( 2 ) != null ) {
            // replace with a single space
            spaceOrStringMatcher.appendReplacement( replacementBuffer, " " );
        }
    }
    // copy whatever follows the last replaced match
    spaceOrStringMatcher.appendTail( replacementBuffer );

+2

Regarding the text between quotation marks: is each quoted section contained within a single line, or can it span several lines?

0

Tokenize it and emit a single space between tokens. A quick Google search for "java tokenizer that handles quotes" turned up: this link

YMMV

Edit: SO did not like that link. Here's the Google search link: google. It was the first result.
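Since the linked tokenizer is not available here, the following is only a sketch of the tokenize-and-rejoin idea using `java.util.regex`, not the tokenizer the link pointed to. The class and method names are invented for illustration. It assumes quotes are balanced and unescaped, and it has side effects the regex answers do not: leading and trailing whitespace is dropped, tabs and newlines count as separators, a space is inserted between a word and an adjacent quoted section, and an unpaired quote is silently discarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareTokenizer {
    // A token is either a complete double-quoted section or a run of
    // characters that are neither whitespace nor a quote.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|[^\\s\"]+");

    // Hypothetical helper: splits the text into quote-aware tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Hypothetical helper: rejoins the tokens with exactly one space.
    static String normalize(String text) {
        return String.join(" ", tokenize(text));
    }

    public static void main(String[] args) {
        System.out.println(normalize("a   b \"c   d\"  e")); // prints: a b "c   d" e
    }
}
```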

0

Personally, I don't use Java, but this RegExp could do the trick:

    ([^\" ])*(\".*?\")*

Testing the expression in RegexBuddy, it generates this code, which looks fine to me:

    try {
        Pattern regex = Pattern.compile("([^\" ])*(\\\".*?\\\")*",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher regexMatcher = regex.matcher(subjectString);
        while (regexMatcher.find()) {
            for (int i = 1; i <= regexMatcher.groupCount(); i++) {
                // matched text: regexMatcher.group(i)
                // match start: regexMatcher.start(i)
                // match end: regexMatcher.end(i)
                // I suppose here you must use something like
                // sstr += regexMatcher.group(i) + " "
            }
        }
    } catch (PatternSyntaxException ex) {
        // Syntax error in the regular expression
    }

At least in Python, it works fine:

    import re

    text = """ este es un texto de prueba "para ver como se comporta " la funcion sobre esto "para ver como se comporta " la funcion sobre esto "o sobre otro" lo q sea """
    ret = ""
    print(text)
    reobj = re.compile(r'([^\" ])*(\".*?\")*', re.IGNORECASE)
    for match in reobj.finditer(text):
        # skip the empty matches produced at each space
        if match.group() != "":
            ret = ret + match.group() + "|"
    print(ret)

0

After you have parsed out the quoted content, run this on the rest of the text, either all at once or piece by piece as needed:

    String text = "ABC   DEF  GHI    JKL";
    text = text.replaceAll("( )+", " ");
    // text: "ABC DEF GHI JKL"

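The "parse out the quoted content first" step is not shown above. One possible way to do it, sketched here under the assumption that quotes are balanced and never escaped, is to split the text on the quote character and collapse spaces only in the segments that fall outside quotes. The class and method names are hypothetical.

```java
class SplitOnQuotes {
    // Hypothetical helper: collapses runs of spaces outside quoted sections.
    static String collapseOutsideQuotes(String text) {
        // A limit of -1 keeps trailing empty segments, so text that ends
        // with a quote is reassembled unchanged.
        String[] parts = text.split("\"", -1);
        // With balanced quotes, even-indexed segments lie outside quotes
        // and odd-indexed segments lie inside them.
        for (int i = 0; i < parts.length; i += 2) {
            parts[i] = parts[i].replaceAll(" {2,}", " ");
        }
        return String.join("\"", parts);
    }

    public static void main(String[] args) {
        System.out.println(collapseOutsideQuotes("a   b \"c   d\"  e")); // prints: a b "c   d" e
    }
}
```

If the input contains an odd number of quotes, everything after the last quote is treated as quoted and left untouched.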
0

Jeff, you are on the right track, but there are a few mistakes in your code: (1) you forgot to escape the quotation marks inside the negated character classes; (2) the parentheses inside the first capturing group should be of the non-capturing variety; (3) if the second capturing group does not participate in the match, group(2) returns null, and you are not checking for that; and (4) if you test for two or more spaces in the regex instead of one or more, you won't need to check the length of the match later. Here's the revised code:

    import java.util.regex.*;

    public class Test {
        public static void main(String[] args) throws Exception {
            String text = "blah  blah   \"boo   boo boo\"  blah blah";
            Pattern p = Pattern.compile( "(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")|( {2,})" );
            StringBuffer sb = new StringBuffer();
            Matcher m = p.matcher( text );
            while ( m.find() ) {
                // group 2 is null when the match is a quoted string, which is copied through unchanged
                if ( m.group( 2 ) != null ) {
                    m.appendReplacement( sb, " " );
                }
            }
            m.appendTail( sb );
            System.out.println( sb.toString() );
        }
    }

0