How to parse an HTML table with Python and BeautifulSoup and write it to CSV

Question


I am trying to parse an HTML page, select the currency values, and write them to a CSV file. I have the following code:

```python
#!/usr/bin/env python
import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

# The rates table sits inside <div class="content">
table = soup.find('div', attrs={'class': 'content'})

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print
```

The problem is that I do not know how to get only the currency values. I tried a regex such as `^[0-9]{3}` (a value starting with three digits), but that doesn't work.


python beautifulsoup

user2140323 Mar 6 '13 at 14:50


1 answer

Martijn Pieters · Accepted Answer · 2013-03-06T14:59:18+0000

You are much better off selecting specific cells in the table. The rows whose `td` cells have the `cell_c` class contain the data you are interested in, and the last cell in such a row is always the exchange rate:

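```python
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    # Skip rows with no <td> cells (e.g. header rows) and rows whose
    # first cell does not carry the cell_c class
    if cols and 'cell_c' in cols[0].get('class', ''):
        # currency row: the last cell is the exchange rate
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate
```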
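For readers on Python 3, here is a minimal sketch of the same row selection with BeautifulSoup 4 (`bs4`). It assumes `bs4` is installed and, so that it runs offline, parses a small hypothetical HTML snippet that imitates the page's `div.content` table instead of fetching the live URL; the codes, names and rates in the snippet are made-up placeholders, not real data. One difference from BeautifulSoup 3: in `bs4`, `tag.get('class')` returns a list of class names rather than a string.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet imitating the page layout (assumption); values are placeholders
SAMPLE_HTML = """
<div class="content">
  <table>
    <tr><th>Code</th><th>Letter code</th><th>Units</th><th>Name</th><th>Rate</th></tr>
    <tr>
      <td class="cell_c">036</td><td class="cell_c">AUD</td>
      <td class="cell_c">100</td><td class="cell">Australian Dollar</td>
      <td class="cell_c">1.2345</td>
    </tr>
    <tr>
      <td class="cell_c">840</td><td class="cell_c">USD</td>
      <td class="cell_c">100</td><td class="cell">US Dollar</td>
      <td class="cell_c">2.5000</td>
    </tr>
  </table>
</div>
"""


def parse_currency_rows(html):
    """Return (digital_code, letter_code, units, name, rate) tuples as strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("div", attrs={"class": "content"})
    result = []
    for tr in table.find_all("tr"):
        cols = tr.find_all("td")
        # Header rows use <th>, so they have no <td> cells: skip them
        if not cols:
            continue
        # bs4 returns the class attribute as a list (or None if absent)
        if "cell_c" in (cols[0].get("class") or []):
            digital_code, letter_code, units, name, rate = [
                c.get_text(strip=True) for c in cols
            ]
            result.append((digital_code, letter_code, units, name, rate))
    return result


rows = parse_currency_rows(SAMPLE_HTML)
for row in rows:
    print(*row)
```

Against the real page you would pass the downloaded HTML (for example from `urllib.request.urlopen(contenturl).read()`) instead of `SAMPLE_HTML`; whether the live markup still matches this structure is not something the snippet can guarantee.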

With the data in separate variables, you can now convert the text to decimal numbers, save it to a database, or do whatever else you need.
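Since the question asks for CSV output, here is a sketch of that last step using the standard `csv` module, converting the rate text with `decimal.Decimal` and using `;` as the separator the question's code already prints. The `rows` list and the helper name `write_rates_csv` are hypothetical; the rows are placeholder tuples in the same five-field shape the loop above produces, and in practice you would collect those tuples inside the loop instead of printing them.

```python
import csv
import io
from decimal import Decimal

# Hypothetical placeholder rows: (digital_code, letter_code, units, name, rate)
rows = [
    ("036", "AUD", "100", "Australian Dollar", "1.2345"),
    ("840", "USD", "100", "US Dollar", "2.5000"),
]


def write_rates_csv(rows, out):
    """Write the currency rows to the file-like object `out` as ;-separated CSV."""
    writer = csv.writer(out, delimiter=";")
    writer.writerow(["digital_code", "letter_code", "units", "name", "rate"])
    for digital_code, letter_code, units, name, rate in rows:
        # Decimal keeps the rate exact instead of rounding it to a binary float
        writer.writerow([digital_code, letter_code, units, name, Decimal(rate)])


# Write to an in-memory buffer so the result can be inspected
buf = io.StringIO()
write_rates_csv(rows, buf)
print(buf.getvalue())
```

To write a real file instead, open it with `newline=""` as the `csv` documentation recommends, e.g. `with open("rates.csv", "w", newline="") as f: write_rates_csv(rows, f)` (the file name is only an example).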
