Java: splitting on commas with a regular expression while ignoring commas inside quotes

This sample data is returned by the web service.

    200,6,"California, USA"

I want to split it using split(",") and checked the result with this simple code.

    String loc = "200,6,\"California, USA\"";
    String[] s = loc.split(",");
    for (String f : s)
        System.out.println(f);

Unfortunately, this is the result.

    200
    6
    "California
     USA"

The expected result is:

    200
    6
    "California, USA"

I tried different regular expressions with no luck. Is it possible to make the regular expression ignore commas inside ""?

UPDATE 1: Added C# code

UPDATE 2: Removed C# code

4 answers
 ,(?=(?:[^"]|"[^"]*")*$) 

This is the regular expression you want. (To pass it to split(), you need to escape the double quotes inside the Java string literal.)

Explanation

You need to find every comma that is not inside quotation marks. That is, you need a lookahead ( http://www.regular-expressions.info/lookaround.html ) to find out whether the comma currently being matched is inside or outside quotes.

To do this, we use a lookahead to ensure that the current match is followed by an EVEN number of " characters (which means that it lies outside the quotes).

So (?:[^"]|"[^"]*")*$ matches only if everything up to the end of the string is either a non-quote character OR a pair of quotes with anything except a quote in between.

(?=(?:[^"]|"[^"]*")*$) looks ahead for the above match without consuming it.

,(?=(?:[^"]|"[^"]*")*$) finally matches every ',' that satisfies the above lookahead.
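To make the "even number of quotes" reasoning concrete, here is a minimal sketch that lists each comma in the question's string, counts the quotes that follow it, and reports whether the pattern matches it. The class and method names (LookaheadDemo, quotesAfter, matchedCommaIndices) are made up for this illustration, and it assumes the input contains no escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // The pattern from this answer, with the double quotes escaped for a Java string literal.
    static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]|\"[^\"]*\")*$)");

    // Counts the double quotes in s from index 'from' to the end of the string.
    static int quotesAfter(String s, int from) {
        int count = 0;
        for (int i = from; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                count++;
            }
        }
        return count;
    }

    // Returns the indices of the commas that the pattern matches.
    static List<Integer> matchedCommaIndices(String s) {
        List<Integer> indices = new ArrayList<>();
        Matcher m = COMMA_OUTSIDE_QUOTES.matcher(s);
        while (m.find()) {
            indices.add(m.start());
        }
        return indices;
    }

    public static void main(String[] args) {
        String loc = "200,6,\"California, USA\"";
        List<Integer> matched = matchedCommaIndices(loc);
        for (int i = 0; i < loc.length(); i++) {
            if (loc.charAt(i) == ',') {
                System.out.println("comma at " + i
                        + ": quotes after = " + quotesAfter(loc, i + 1)
                        + ", matched = " + matched.contains(i));
            }
        }
    }
}
```

For the question's string, the commas at indices 3 and 5 are each followed by two quotes and are matched, while the comma at index 17 is followed by a single quote and is skipped.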


A simpler solution would be to use an existing library such as OpenCSV to parse your data. This can be done in two lines using this library:

    CSVParser parser = new CSVParser();
    String[] data = parser.parseLine(inputLine);

This will become especially important if you have more complex CSV values in the future (multi-line values, values with escaped quotes inside the element, etc.). If you do not want to add a dependency, you can always use its code as a reference (although it is not regex-based).
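If you would rather avoid both the dependency and the regex, a hand-written scanner is one option. The following is only a minimal sketch, not OpenCSV's implementation: QuoteAwareSplitter and its split method are hypothetical names, and it assumes single-line input with no escaped quotes. It keeps the surrounding quotes in each field, as in the question's expected output.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {
    // Splits one line on commas that are outside double quotes.
    // Assumptions: no escaped quotes ("") and no multi-line values.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Toggle the quoted state and keep the quote in the field.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                // A comma outside quotes ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Add the last field, which has no trailing comma.
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        for (String field : split("200,6,\"California, USA\"")) {
            System.out.println(field);
        }
    }
}
```

Unlike String.split, this sketch keeps trailing empty fields, so "a,b," yields three fields rather than two.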


If you have a good lexer/parser library for Java, you could define the grammar with something like the following pseudo-lexer code:

    Delimiter: ,
    Item: ([^,"]+) | ("[^"]+")
    Data: Item Delimiter Data | Item

The way this works: it starts from the top-level rule (in this case, Data) and tries to form tokens from the string until it no longer can or until the string is used up. So, for your string, the following happens:

  • I want to make Data from 200,6,"California, USA".
  • I can make Data from Item, Delimiter, and Data.
  • I look: 200 is an Item, followed by a Delimiter, so I can consume them and continue.
  • I want to make Data from 6,"California, USA".
  • I can make Data from Item, Delimiter, and Data.
  • I look: 6 is an Item, followed by a Delimiter, so I can consume them and continue.
  • I want to make Data from "California, USA".
  • I can make Data from Item, Delimiter, and Data.
  • I look: "California, USA" is an Item, but I see no Delimiter after it, so I try something else.
  • I can make Data from an Item.
  • I look: "California, USA" is an Item, so I can consume it and continue.
  • The string is empty. I'm done. Here are your tokens.

(I learned how lexers work from the manual for PLY, a Python lexer/parser: http://www.dabeaz.com/ply/ply.html )
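The walk-through above can be sketched in plain Java without a parser library. This is a hand-written recursive-descent reading of the rule Data: Item Delimiter Data | Item, not the output of any particular tool. DataParser and parseData are hypothetical names, and the quoted Item is taken to be "[^"]+" so that a comma may appear inside quotes. As in the pseudo-grammar, empty fields are not accepted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataParser {
    // Item: an unquoted run without commas or quotes, or a quoted run.
    private static final Pattern ITEM = Pattern.compile("[^,\"]+|\"[^\"]+\"");

    // Entry point: parses the whole input as Data and returns its Items.
    static List<String> parseData(String input) {
        List<String> items = new ArrayList<>();
        parseData(input, 0, items);
        return items;
    }

    // Data: Item Delimiter Data | Item
    private static void parseData(String input, int pos, List<String> items) {
        Matcher m = ITEM.matcher(input);
        m.region(pos, input.length());
        if (!m.lookingAt()) {
            throw new IllegalArgumentException("Expected an item at index " + pos);
        }
        items.add(m.group());
        int next = m.end();
        if (next == input.length()) {
            return; // Data: Item (the string is used up)
        }
        if (input.charAt(next) != ',') {
            throw new IllegalArgumentException("Expected ',' at index " + next);
        }
        parseData(input, next + 1, items); // Data: Item Delimiter Data
    }

    public static void main(String[] args) {
        for (String item : parseData("200,6,\"California, USA\"")) {
            System.out.println(item);
        }
    }
}
```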


Hey. Try this expression.

    public class Test {
        /**
         * @param args
         */
        public static void main(String[] args) {
            String loc = "200,6,\"Paris, France\"";
            String[] str1 = loc.split(",(?=(?:[^\"]|\"[^\"]*\")*$)");
            for (String tmp : str1) {
                System.out.println(tmp);
            }
        }
    }
