Parsing a table using regex - Java

I am parsing the following table of AWS instances:

    m1.small 1 1 1.7 1 x 160 $0.044 per Hour
    m1.medium 1 2 3.75 1 x 410 $0.087 per Hour
    m1.large 2 4 7.5 2 x 420 $0.175 per Hour
    m1.xlarge 4 8 15 4 x 420 $0.35 per Hour

The prices are stored in a file, which I read like this:

    input = new Scanner(file);
    String[] values;
    while (input.hasNextLine()) {
        String line = input.nextLine();
        values = line.split("\\s+"); // <-- not what I want...
        for (String v : values)
            System.out.println(v);
    }

However, this gives me:

    m1.small
    1
    1
    1.7
    1
    x
    160
    $0.044
    per
    Hour

which is not what I want. With the correct regular expression, the parsed values for a row should look like this:

 ['m1.small', '1', '1', '1.7', '1 x 160', '$0.044', 'per Hour'] 

What regex would give the right result? It can be assumed that the table always follows the same template.

+7
java string regex parsing
3 answers

Split on whitespace, but only where the whitespace appears in one of the two contexts below:

DIGIT - SPACES - NOT "x"

or

NOT "x" - SPACES - DIGIT

    values = line.split("(?<=\\d)\\s+(?=[^x])|(?<=[^x])\\s+(?=\\d)");
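To see the two contexts in action, here is a minimal sketch that runs the split on the sample rows from the question. The class name `LookaroundSplitDemo` and the helper `splitRow` are made up for illustration. It assumes the columns are separated by a single space, as in the sample; with runs of several spaces the lookarounds can match in the middle of `1 x 160`.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class LookaroundSplitDemo {

    // Split on whitespace that has a digit on one side and no "x" on the other.
    // Assumes exactly one space between columns, as in the sample table.
    static String[] splitRow(String line) {
        return line.trim().split("(?<=\\d)\\s+(?=[^x])|(?<=[^x])\\s+(?=\\d)");
    }

    public static void main(String[] args) {
        String row = "m1.small 1 1 1.7 1 x 160 $0.044 per Hour";
        // prints [m1.small, 1, 1, 1.7, 1 x 160, $0.044, per Hour]
        System.out.println(Arrays.toString(splitRow(row)));
    }
}
```

The space before `x` is rejected by both alternatives (it is followed by neither a non-`x` character nor a digit), and the space after `x` is rejected because it is preceded by `x`, so `1 x 160` stays together. The space in `per Hour` has no digit on either side, so it is kept too.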
+4

Try this fiddle https://regex101.com/r/sP6zW5/1

([^\s]+)\s+(\d+)\s+(\d+)\s+([\d\.]+)\s+(\d+ x \d+)\s+(\$\d+\.\d+)\s+(per \w+)

This matches the text, and the capture groups are your list.

I think using split is too complicated in your case. If the text always has the same format, matching it with capture groups works like string formatting in reverse.
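For reference, a minimal sketch of how the pattern above could be applied in Java, with the backslashes doubled for a Java string literal. The class name `GroupMatchDemo` and the helper `parseRow` are invented for this example, and returning `null` for a row that does not match is just one possible choice.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class GroupMatchDemo {

    // Same pattern as above, escaped for a Java string literal.
    private static final Pattern ROW = Pattern.compile(
            "([^\\s]+)\\s+(\\d+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(\\d+ x \\d+)\\s+(\\$\\d+\\.\\d+)\\s+(per \\w+)");

    // Returns the seven captured fields, or null if the row does not fit the template.
    static String[] parseRow(String line) {
        Matcher m = ROW.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the whole match
        }
        return fields;
    }

    public static void main(String[] args) {
        String row = "m1.small 1 1 1.7 1 x 160 $0.044 per Hour";
        // prints [m1.small, 1, 1, 1.7, 1 x 160, $0.044, per Hour]
        System.out.println(Arrays.toString(parseRow(row)));
    }
}
```

Unlike the split approach, a row that does not fit the template is reported (here as `null`) instead of silently producing a differently shaped array.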

+5

If you want to use regex, you can do it like this:

    // Needs java.util.Arrays, java.util.regex.Matcher and java.util.regex.Pattern.
    String s = "m1.small 1 1 1.7 1 x 160 $0.044 per Hour";

    // One named piece per kind of column.
    String spaces = "\\s+";
    String type = "(.*?)";
    String intNumber = "(\\d+)";
    String doubleNumber = "([0-9.]+)";
    String dollarNumber = "([$0-9.]+)";
    String aXb = "(\\d+ x \\d+)";
    String rest = "(.*)";

    Pattern pattern = Pattern.compile(type + spaces + intNumber + spaces + intNumber + spaces
            + doubleNumber + spaces + aXb + spaces + dollarNumber + spaces + rest);
    Matcher matcher = pattern.matcher(s);
    while (matcher.find()) {
        String[] fields = new String[] {
            matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4),
            matcher.group(5), matcher.group(6), matcher.group(7)
        };
        System.out.println(Arrays.toString(fields));
    }

Notice how I broke the regex up into named parts for readability (as one long line, it is difficult to read and maintain). There is another way to do this: since you know which fields get split apart, you can do a simple split and build a new array with the combined values:

    String[] allFields = s.split("\\s+");
    String[] result = new String[] {
        allFields[0],
        allFields[1],
        allFields[2],
        allFields[3],
        allFields[4] + " " + allFields[5] + " " + allFields[6],
        allFields[7],
        allFields[8] + " " + allFields[9]
    };
    System.out.println(Arrays.toString(result));
+4