How to parse an HTML table with Python and BeautifulSoup and write it to CSV

I am trying to parse an HTML page, select the values for the currencies, and write them to a CSV file. I have the following code:

    #!/usr/bin/env python
    import urllib2
    from BeautifulSoup import BeautifulSoup

    contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
    soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

    table = soup.find('div', attrs={'class': 'content'})
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        for td in cols:
            text = td.find(text=True) + ';'
            print text,
        print

The problem is that I do not know how to get only the values for the currencies. I tried a regex such as '^[0-9]{3}' (a string starting with three digits), but that doesn't work.

1 answer

You will be much better off selecting specific cells in the table. The td cells with the cell_c class contain the data you are interested in, and the last cell in each row is always the exchange rate:

    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        if 'cell_c' in cols[0]['class']:
            # currency row
            digital_code, letter_code, units, name, rate = [c.text for c in cols]
            print digital_code, letter_code, units, name, rate

With the data in separate variables, you can now turn the text into decimal numbers, save them in a database, whatever.
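For the CSV part of the question, here is a minimal sketch of that last step. It is written for Python 3 and uses only the standard library (csv, decimal), so it differs from the Python 2 snippets above. The function name write_rates, the column headers, and the sample rows are made up for illustration; it also assumes the rate text uses a decimal point, which may not match what the page actually serves.

```python
import csv
import io
from decimal import Decimal


def write_rates(rows, out):
    """Write currency rows to the file-like object `out` as ';'-separated CSV.

    `rows` is an iterable of 5-tuples of strings, in the same order as the
    variables unpacked in the loop above (hypothetical helper, not part of
    BeautifulSoup).
    """
    writer = csv.writer(out, delimiter=';', lineterminator='\n')
    writer.writerow(['digital_code', 'letter_code', 'units', 'name', 'rate'])
    for digital_code, letter_code, units, name, rate in rows:
        writer.writerow([
            digital_code.strip(),   # keep as text so leading zeros survive
            letter_code.strip(),
            int(units),             # int() tolerates surrounding whitespace
            name.strip(),
            Decimal(rate.strip()),  # exact decimal, no float rounding
        ])


# Made-up sample rows standing in for the cells scraped from the table.
sample_rows = [
    ('036', 'AUD', '100', 'Australian Dollar', '1234.5600'),
    ('840', 'USD', '100', 'US Dollar', ' 2700.0000 '),
]

buffer = io.StringIO()
write_rates(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with open('rates.csv', 'w', newline='') as the second argument. If the page formats rates with a comma as the decimal separator, replace it with a point before calling Decimal.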
