Convert currency to numbers in Python

I just found out from "Number Format as a Currency in Python" that the Python babel module provides babel.numbers.format_currency for formatting numbers as a currency. For instance,

    from babel.numbers import format_currency

    s = format_currency(123456.789, 'USD', locale='en_US')  # u'$123,456.79'
    s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'

How about the opposite, from currency to numbers, for example $123,456,789.00123456789? babel provides babel.numbers.parse_number to parse localized numbers, but I did not find anything like parse_currency. So, what is the ideal way to parse a localized currency string into a number?


I went through "Python: removing characters except numbers from a string".

    # Way 1 (Python 2 only: string.maketrans does not exist in Python 3)
    import string
    all_chars = string.maketrans('', '')
    nodigs = all_chars.translate(all_chars, string.digits)
    s = '$123,456.79'
    n = s.translate(all_chars, nodigs)  # 12345679, lost `.`

    # Way 2
    import re
    n = re.sub(r"\D", "", s)  # 12345679

This approach discards the decimal separator (.).
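Way 1 above relies on Python 2's string.maketrans. As a rough sketch of the same idea for Python 3 (the helper name digits_only is just an illustration, not something from the linked answer), filtering character by character gives the same result:

```python
import string


def digits_only(text):
    """Return only the ASCII digits 0-9 found in text (hypothetical helper)."""
    # Keep a character only if it is one of '0'..'9'.
    return ''.join(ch for ch in text if ch in string.digits)


print(digits_only('$123,456.79'))  # 12345679
```

Like the two ways above, it still loses the decimal separator.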


Remove all non-numeric characters except . from a string:

    import re

    # Way 1:
    s = '$123,456.79'
    # no `|` inside the character class: there it would be kept as a literal pipe
    n = re.sub(r"[^0-9.]", "", s)  # 123456.79

    # Way 2:
    non_decimal = re.compile(r'[^\d.]+')
    s = '$123,456.79'
    n = non_decimal.sub('', s)  # 123456.79

This keeps the decimal separator (.).


But the above solutions do not work for other locales, for example when using

    from babel.numbers import format_currency

    s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'
    new_s = s.encode('utf-8')  # 123 456,79 €

As you can see, the currency format varies with the locale. What is the ideal way to parse a currency string into a number?
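To make the failure concrete, here is a small check using only the standard library, with the fr_FR output from above pasted in as a literal (the variable names are just for illustration):

```python
import re

# Same pattern as "Way 2": drop everything except digits and '.'
non_decimal = re.compile(r'[^\d.]+')

us = '$123,456.79'
fr = '123\xa0456,79\xa0\u20ac'  # the fr_FR string shown above

us_clean = non_decimal.sub('', us)
fr_clean = non_decimal.sub('', fr)

print(us_clean)  # 123456.79
print(fr_clean)  # 12345679
```

For the French string the comma is the decimal mark, so stripping it silently turns 123456.79 into 12345679.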

1 answer

Using babel

The babel documentation notes that number parsing is not fully implemented, but the library does a good job of exposing currency information. You can use get_currency_name() and get_currency_symbol() to get the currency details, and the other get_... functions to get the usual number symbols (decimal point, minus sign, etc.).

Using this information, you can strip the currency details (name, symbol) and the grouping separator (for example, the comma in the US) from the currency string. Then you replace the minus sign and the decimal mark with those used in the C locale (- for minus and . for the decimal point).

This results in the following code (I added an object to hold some data that may come in handy in further processing):

    import os
    import re

    from babel import numbers as n
    from babel.core import default_locale


    class AmountInfo(object):
        def __init__(self, name, symbol, value):
            self.name = name
            self.symbol = symbol
            self.value = value


    def parse_currency(value, cur):
        decp = n.get_decimal_symbol()
        plus = n.get_plus_sign_symbol()
        minus = n.get_minus_sign_symbol()
        group = n.get_group_symbol()
        name = n.get_currency_name(cur)
        symbol = n.get_currency_symbol(cur)
        remove = [plus, name, symbol, group]
        for token in remove:
            # remove the pieces of information that shall be obvious
            value = re.sub(re.escape(token), '', value)
        # change the minus sign to a LOCALE=C minus
        value = re.sub(re.escape(minus), '-', value)
        # and change the decimal mark to a LOCALE=C decimal point
        value = re.sub(re.escape(decp), '.', value)
        # just in case remove extraneous spaces
        value = re.sub(r'\s+', '', value)
        return AmountInfo(name, symbol, value)


    #cur_loc = os.environ['LC_ALL']
    cur_loc = default_locale()
    print('locale:', cur_loc)
    test = [
        (n.format_currency(123456.789, 'USD', locale=cur_loc), 'USD'),
        (n.format_currency(-123456.78, 'PLN', locale=cur_loc), 'PLN'),
        (n.format_currency(123456.789, 'PLN', locale=cur_loc), 'PLN'),
        (n.format_currency(123456.789, 'IDR', locale=cur_loc), 'IDR'),
        (n.format_currency(123456.789, 'JPY', locale=cur_loc), 'JPY'),
        (n.format_currency(-123456.78, 'JPY', locale=cur_loc), 'JPY'),
        (n.format_currency(123456.789, 'CNY', locale=cur_loc), 'CNY'),
        (n.format_currency(-123456.78, 'CNY', locale=cur_loc), 'CNY'),
    ]
    for v, c in test:
        print('As currency :', c, ':', v.encode('utf-8'))
        info = parse_currency(v, c)
        print('As value :', c, ':', info.value)
        print('Extra info :', info.name.encode('utf-8'),
              info.symbol.encode('utf-8'))

The result looks promising (in the US locale):

    $ export LC_ALL=en_US
    $ ./cur.py
    locale: en_US
    As currency : USD : b'$123,456.79'
    As value : USD : 123456.79
    Extra info : b'US Dollar' b'$'
    As currency : PLN : b'-z\xc5\x82123,456.78'
    As value : PLN : -123456.78
    Extra info : b'Polish Zloty' b'z\xc5\x82'
    As currency : PLN : b'z\xc5\x82123,456.79'
    As value : PLN : 123456.79
    Extra info : b'Polish Zloty' b'z\xc5\x82'
    As currency : IDR : b'Rp123,457'
    As value : IDR : 123457
    Extra info : b'Indonesian Rupiah' b'Rp'
    As currency : JPY : b'\xc2\xa5123,457'
    As value : JPY : 123457
    Extra info : b'Japanese Yen' b'\xc2\xa5'
    As currency : JPY : b'-\xc2\xa5123,457'
    As value : JPY : -123457
    Extra info : b'Japanese Yen' b'\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123,456.79'
    As value : CNY : 123456.79
    Extra info : b'Chinese Yuan' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123,456.78'
    As value : CNY : -123456.78
    Extra info : b'Chinese Yuan' b'CN\xc2\xa5'

And it still works in different locales (Brazil is notable for using a comma as a decimal mark):

    $ export LC_ALL=pt_BR
    $ ./cur.py
    locale: pt_BR
    As currency : USD : b'US$123.456,79'
    As value : USD : 123456.79
    Extra info : b'D\xc3\xb3lar americano' b'US$'
    As currency : PLN : b'-PLN123.456,78'
    As value : PLN : -123456.78
    Extra info : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : PLN : b'PLN123.456,79'
    As value : PLN : 123456.79
    Extra info : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : IDR : b'IDR123.457'
    As value : IDR : 123457
    Extra info : b'Rupia indon\xc3\xa9sia' b'IDR'
    As currency : JPY : b'JP\xc2\xa5123.457'
    As value : JPY : 123457
    Extra info : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : JPY : b'-JP\xc2\xa5123.457'
    As value : JPY : -123457
    Extra info : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123.456,79'
    As value : CNY : 123456.79
    Extra info : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123.456,78'
    As value : CNY : -123456.78
    Extra info : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'

It should be noted that babel has some encoding issues. This is because the locale-data files (in locale-data) themselves use a different encoding. If you work with currencies that you are familiar with, this should not be a problem. But if you try unfamiliar currencies, you may run into problems (I just found out that Poland uses iso-8859-2, not iso-8859-1).
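Note that the value produced by parse_currency above is still a string. As a minimal sketch of the same normalization steps ending in a real number, the function below uses only the standard library and returns a decimal.Decimal. The name to_decimal and its parameters are assumptions for illustration; with babel the symbols would come from get_currency_symbol(), get_group_symbol(), get_decimal_symbol() and get_minus_sign_symbol() rather than being passed in by hand.

```python
from decimal import Decimal


def to_decimal(text, symbol, group, decimal_mark, minus='-'):
    """Normalize a formatted currency string and return a Decimal.

    Hypothetical helper: all symbols are supplied by the caller.
    """
    # Drop the currency symbol and the grouping separator.
    for token in (symbol, group):
        text = text.replace(token, '')
    # Map the locale's minus sign and decimal mark to the C-locale forms.
    text = text.replace(minus, '-').replace(decimal_mark, '.')
    # Remove remaining whitespace, including non-breaking spaces.
    text = ''.join(text.split())
    return Decimal(text)


# Strings taken from the outputs shown earlier on this page.
us_value = to_decimal('$123,456.79', '$', ',', '.')
br_value = to_decimal('-PLN123.456,78', 'PLN', '.', ',')
fr_value = to_decimal('123\xa0456,79\xa0\u20ac', '\u20ac', '\xa0', ',')
print(us_value, br_value, fr_value)
```

Decimal is used instead of float so that amounts such as 123456.79 are kept exactly as written, which usually matters for money.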
