Regular expression for comma-separating a large number in the South Asian numbering system

I am trying to find a regular expression that inserts commas into a large number according to the South Asian numbering system.

A few examples:

  • 1,000,000 (Arabic) is 10,00,000 (Indian / Hindu / South Asian)
  • 1,000,000,000 (Arabic) is 100,00,00,000 (Indian / Hindu / South Asian)

The comma pattern repeats every 7 digits. For example, 1,00,00,000,00,00,000.

From Friedl's Mastering Regular Expressions, I have the following regular expression for the Arabic numbering system:

 r'(?<=\d)(?=(\d{3})+(?!\d))' 

For the Indian numbering system, I got the following expression, but it does not work for numbers with more than 8 digits:

 r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))' 

Using the above pattern, I get 100000000,00,00,000 .
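
For reference, a minimal reproduction of the problem might look like the sketch below. The names `BROKEN` and `digits` are only illustrative; the pattern itself is the one quoted above.

```python
import re

# The attempted pattern from above: a comma position must be followed by
# exactly 3, 5 or 7 digits and then a word boundary.
BROKEN = re.compile(r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))')

digits = '1' + '0' * 15          # a 16-digit number
result = BROKEN.sub(',', digits)
print(result)                    # 100000000,00,00,000
```

Because the lookahead only accepts 3, 5 or 7 remaining digits, no comma can land more than seven digits from the end of the number.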

I am using Python's re module (re.sub()). Any ideas?

2 answers

Try the following:

 (?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d)) 

For example:

 >>> import re
 >>> inp = ["1" + "0"*i for i in range(20)]
 >>> [re.sub(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))", ",", i) for i in inp]
 ['1', '10', '100', '1,000', '10,000', '1,00,000', '10,00,000', '1,00,00,000', '10,00,00,000', '100,00,00,000', '1,000,00,00,000', '10,000,00,00,000', '1,00,000,00,00,000', '10,00,000,00,00,000', '1,00,00,000,00,00,000', '10,00,00,000,00,00,000', '100,00,00,000,00,00,000', '1,000,00,00,000,00,00,000', '10,000,00,00,000,00,00,000', '1,00,000,00,00,000,00,00,000']

As a commented regular expression:

 result = re.sub(
     r"""(?x)          # Enable verbose mode (comments)
     (?<=\d)           # Assert that we're not at the start of the number.
     (?=               # Assert that it is possible to match:
      (\d{2}){0,2}     # 0, 2 or 4 digits,
      \d{3}            # followed by 3 digits,
      (\d{7})*         # followed by 0, 7, 14, 21 ... digits,
      (?!\d)           # and no more digits after that.
     )                 # End of lookahead assertion.""",
     ",", subject)
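
If you want to reuse the pattern, one way to wrap it is sketched below. The helper name `group_south_asian` is made up for illustration, and leaving the fractional part untouched is my own assumption rather than something the question asks for.

```python
import re

# Same pattern as above, compiled once for reuse.
_GROUPING = re.compile(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))")

def group_south_asian(number) -> str:
    """Insert commas into the integer part of an int or a digit string."""
    text = str(number)
    # Split off any fractional part so its digits are not grouped as well.
    integer_part, dot, fraction = text.partition(".")
    return _GROUPING.sub(",", integer_part) + dot + fraction

print(group_south_asian(10000000))     # 1,00,00,000
print(group_south_asian(-1000000))     # -10,00,000
print(group_south_asian("1234567.5"))  # 12,34,567.5
```

The lookbehind requires a preceding digit, so a leading minus sign never gets a comma after it.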

I know that Tim has answered your question, but assuming you start with numbers rather than strings, do you really need a regular expression? If the machine you are using supports an Indian locale, you can simply use the locale module:

 >>> import locale
 >>> locale.setlocale(locale.LC_NUMERIC, "en_IN")
 'en_IN'
 >>> locale.format_string("%d", 10000000, grouping=True)
 '1,00,00,000'

This interpreter session was copied from an Ubuntu system, but keep in mind that Windows systems may not support a suitable locale (mine does not, at least). So although this is in some ways a "cleaner" solution, whether you can use it depends on your environment.
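
If you cannot rely on the locale being installed, one option is to try it and fall back to the regular expression from the other answer. This is only a sketch: the function name `format_indian` and its `locale_name` parameter are hypothetical. Also be aware that the two paths are only certain to agree for numbers of up to nine digits, since a locale's grouping rules may keep grouping in pairs instead of restarting every seven digits.

```python
import locale
import re

# Regex from the other answer, used when no suitable locale is available.
_GROUPING = re.compile(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))")

def format_indian(n: int, locale_name: str = "en_IN") -> str:
    """Group digits via the locale if possible, else via the regex."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        # Some platforms accept unknown names but define no grouping.
        if locale.localeconv()["grouping"]:
            return locale.format_string("%d", n, grouping=True)
    except locale.Error:
        pass  # locale not installed on this system
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)  # restore global state
    return _GROUPING.sub(",", str(n))

print(format_indian(10000000))
```

Note that `setlocale()` changes process-wide state, which is why the sketch restores the previous setting before returning.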
