I am trying to parse transaction letters from my (German) bank. I would like to extract all the numbers from the following text, which turned out to be more complicated than I thought. Option 2 does almost what I want, but I would also like it to capture plain integers such as 80.
My first attempt, option 1, returns only garbage. Why does it return so many empty strings? Each match should always contain at least the number from the first `\d+`, shouldn't it?
Option 3 works (or at least behaves as I expected), so in a way I have answered my own question. I guess I am mostly banging my head over why option 1 doesn't work.
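A likely explanation, following the documented behaviour of `re.findall`: when a pattern contains capturing groups, `findall` returns the text of the groups instead of the whole match. In option 1 the only group is the optional `(,\d+)?`, which matches nothing for integers like `18`, so those entries come back as empty strings. The minimal sketch below reproduces this on a short, made-up sample string (not taken from the bank letter) and contrasts it with `re.finditer`, which exposes the whole match:

```python
import re

# Made-up sample: one comma decimal and a date (not from the bank letter).
sample = "Steuer : 20,67 EUR am 18.04.2013"
pattern = r'\d+(,\d+)?'  # option 1: contains one capturing group

# findall returns the captured group for each match, not the whole match.
# For "18", "04" and "2013" the optional group did not take part in the
# match, so findall reports an empty string for it.
groups_only = re.findall(pattern, sample)
print(groups_only)  # [',67', '', '', '']

# The whole match (group 0) still contains the leading \d+ part.
whole = [m.group(0) for m in re.finditer(pattern, sample)]
print(whole)  # ['20,67', '18', '04', '2013']
```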
```python
# -*- coding: utf-8 -*-
import re

my_str = """
Dividendengutschrift für inländische Wertpapiere
Depotinhaber : ME
Extag : 18.04.2013 Bruttodividende
Zahlungstag : 18.04.2013 pro Stück : 0,9800 EUR
Valuta : 18.04.2013
Bruttodividende : 78,40 EUR
*Einbeh. Steuer : 20,67 EUR
Nettodividende : 78,40 EUR
Endbetrag : 57,73 EUR
"""

print(re.findall(r'\d+(,\d+)?', my_str))           # option 1
print(re.findall(r'\d+,\d+', my_str))              # option 2
print(re.findall(r'[-+]?\d*,\d+|\d+', my_str))     # option 3
```
Output:
```
['', '', '', '', '', '', ',98', '', '', '', '', ',40', ',67', ',40', ',73']
['0,9800', '78,40', '20,67', '78,40', '57,73']
['18', '04', '2013', '18', '04', '2013', '0,9800', '18', '04', '2013', '78,40', '20,67', '78,40', '57,73']
```
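One way to keep option 1's optional decimal part while getting whole matches back is a non-capturing group `(?:...)`, which groups without creating a capture, so `findall` returns the full match. This is a sketch, assuming the letter text is as given above (the line breaks inside the string are a guess and do not affect the result); on this text it returns the same list as option 3, and a bare integer such as `80` is matched as well:

```python
# -*- coding: utf-8 -*-
import re

# The letter text from the question; the line breaks are a guess.
my_str = """
Dividendengutschrift für inländische Wertpapiere
Depotinhaber : ME
Extag : 18.04.2013 Bruttodividende
Zahlungstag : 18.04.2013 pro Stück : 0,9800 EUR
Valuta : 18.04.2013
Bruttodividende : 78,40 EUR
*Einbeh. Steuer : 20,67 EUR
Nettodividende : 78,40 EUR
Endbetrag : 57,73 EUR
"""

# (?:,\d+)? groups the optional decimal part WITHOUT capturing it, so the
# pattern has no capturing groups and findall returns each whole match.
pattern = re.compile(r'\d+(?:,\d+)?')

numbers = pattern.findall(my_str)
print(numbers)  # integers and comma decimals, including the date parts

# A bare integer such as 80 is matched too.
print(pattern.findall("80 EUR"))  # ['80']
```

Like option 3, this also returns the day, month and year of each date as separate numbers, because the dots in `18.04.2013` are not part of the pattern.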