Regular expression to identify all numbers in all localization formats

I scan text with a Scanner object, say lineScanner. Here are the declarations:

 String myText = "200,00/100,00/28/65.36/21/458,696/25.125/4.23/6.3/4,2/659845/4524/456,65/45/23.495.254,3";
 Scanner lineScanner = new Scanner(myText);

With this Scanner I would like to find the first BigDecimal, then the second, and so on. I have declared a BIG_DECIMAL_PATTERN that is supposed to match every case.

Here are the rules that I defined:

  • A thousands separator is always followed by exactly 3 digits.
  • After the decimal separator there are always exactly 1 or 2 digits.
  • If the thousands separator is a comma, the decimal separator is a dot, and vice versa.
  • The thousands separator is optional, as is the decimal part of the number.

 String nextBigDecimal = lineScanner.findInLine(BIG_DECIMAL_PATTERN); 

Now, here is the BIG_DECIMAL_PATTERN I declared:

 private final String BIG_DECIMAL_PATTERN = "\\d+(\\054\\d{3}+)?(\\056\\d{1,2}+)?|\\d+(\\056\\d{3}+)?(\\054\\d{1,2}+)?";

\\054 is the octal representation of ASCII ","

\\056 is the octal representation of ASCII "."

My problem is that it does not work: once the first alternative matches, the second alternative (after | ) is never tried, so in my example the first match is 200 , not 200,00 . So I could try the following instead:

 private final String BIG_DECIMAL_PATTERN = "\\d+([.,]\\d{3}+)?([,.]\\d{1,2}+)?";

But there is a new problem: the comma and the period are not mutually exclusive. I mean, if one of them is used as the thousands separator, then the decimal separator must be the other one.
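
Both problems can be reproduced with a short sketch. It uses the two patterns from the question (the first one written with balanced parentheses); the class name, the helper firstMatch and the test input "1,234,56" are made up for illustration only.

```java
import java.util.Scanner;

class QuestionPatternsDemo {
    // First pattern from the question (octal \054 is ',' and \056 is '.'),
    // written here with balanced parentheses.
    static final String ALTERNATION_PATTERN =
        "\\d+(\\054\\d{3}+)?(\\056\\d{1,2}+)?|\\d+(\\056\\d{3}+)?(\\054\\d{1,2}+)?";

    // Second pattern from the question: either separator is allowed in both positions.
    static final String CHAR_CLASS_PATTERN = "\\d+([.,]\\d{3}+)?([,.]\\d{1,2}+)?";

    // Hypothetical helper: returns the first match that findInLine reports.
    static String firstMatch(String text, String pattern) {
        try (Scanner scanner = new Scanner(text)) {
            return scanner.findInLine(pattern);
        }
    }

    public static void main(String[] args) {
        // The left alternative succeeds on "200" alone, so the right one is never tried.
        System.out.println(firstMatch("200,00/100,00", ALTERNATION_PATTERN)); // 200

        // The character-class version accepts the same separator in both roles.
        System.out.println("1,234,56".matches(CHAR_CLASS_PATTERN)); // true
    }
}
```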

Thanks for the help.

2 answers

Could you use a single regular expression with an "or" (alternation) inside it? For instance, something like:

 private final String BIG_DECIMAL_PATTERN = "\\d+((\\.\\d{3}+)?(,\\d{1,2}+)?|(,\\d{3}+)?(\\.\\d{1,2}+)?)";

Note: I have not tested whether your regular expression works, and I suspect this is not the best way to achieve what you are trying to do. All I am suggesting to get you up and running is that you try (regex1|regex2) , where regex1 is dots and then commas, and regex2 is commas and then dots.
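
One caveat, as far as I can trace it: both alternatives in a (regex1|regex2) group like this are entirely optional, so the first one can match the empty string, and an input such as 65.36 would then yield just 65 . A possible workaround, sketched below, is to append a negative lookahead so that a match may not stop right before another digit or separator, which forces the engine to try the second alternative. This is a variation, not the pattern above: the lookahead, the use of * to allow repeated thousands groups, and the names GroupedAlternationDemo , NUMBER and findAll are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class GroupedAlternationDemo {
    // Assumed variant: (dot thousands, comma decimal) | (comma thousands, dot decimal),
    // followed by a lookahead that rejects a match ending just before '.', ',' or a digit.
    static final Pattern NUMBER = Pattern.compile(
        "\\d+(?:(?:\\.\\d{3})*(?:,\\d{1,2})?|(?:,\\d{3})*(?:\\.\\d{1,2})?)(?![.,\\d])");

    // Hypothetical helper: collects every match that findInLine reports on one line.
    static List<String> findAll(String line) {
        List<String> found = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            String next;
            while ((next = scanner.findInLine(NUMBER)) != null) {
                found.add(next);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String myText = "200,00/100,00/28/65.36/21/458,696/25.125/4.23/6.3/4,2/"
            + "659845/4524/456,65/45/23.495.254,3";
        // Prints one number per line, in the order they appear in myText.
        for (String number : findAll(myText)) {
            System.out.println(number);
        }
    }
}
```

Note that an input like 25.125 is inherently ambiguous under the stated rules; this sketch simply returns it whole without deciding which role the dot plays.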

I believe a variant of your second regex will work for you. Consider this regular expression:

 ^\\d+(?:([.,])\\d{3})*(?:(?!\\1)[.,]\\d{1,2})?$ 

Live Demo: http://www.rubular.com/r/vHlEdBMhO9

Explanation: the regex first captures the comma or dot used as the thousands separator in capture group #1. It then uses a negative lookahead, (?!\\1) , to make sure that the same character is not used again as the decimal separator. In other words, if a comma appears first, then only a dot may appear later, and vice versa.
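
Here is a minimal sketch of how this regex could be used from Java. Because of the ^ and $ anchors it validates one whole token at a time, so the sketch assumes the line is first split on "/"; the class name, the constant WHOLE_NUMBER and the helper isValid are hypothetical.

```java
import java.util.regex.Pattern;

class SeparatorBackreferenceDemo {
    // The regex from this answer: group 1 remembers the thousands separator,
    // and (?!\1) forbids the same character as the decimal separator.
    static final Pattern WHOLE_NUMBER =
        Pattern.compile("^\\d+(?:([.,])\\d{3})*(?:(?!\\1)[.,]\\d{1,2})?$");

    // Hypothetical helper: true if the whole token is a well-formed number.
    static boolean isValid(String token) {
        return WHOLE_NUMBER.matcher(token).matches();
    }

    public static void main(String[] args) {
        String myText = "200,00/100,00/28/65.36/21/458,696/25.125/4.23/6.3/4,2/"
            + "659845/4524/456,65/45/23.495.254,3";
        // Assumption: tokens are separated by '/', as in the question's sample.
        for (String token : myText.split("/")) {
            System.out.println(token + " -> " + isValid(token));
        }
        // The same separator in both roles is rejected.
        System.out.println("1,234,56 -> " + isValid("1,234,56")); // false
    }
}
```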
