Parse a table using BeautifulSoup and write it to a text file

I need the data from a table written to a text file (output.txt) in this format: data1; data2; data3; data4; .....

For example:

Celkova podlahova plocha bytu; 33m; Vytah; Ano; Nadzemne podlazie; Prizemne podlazie; .....; Forma vlastnictva; Osobne

Everything should go on one line, with ";" as the separator (the file will later be exported to a CSV file).

I'm just getting started with this. Any help is appreciated, thanks. Here is my code so far:

    from BeautifulSoup import BeautifulSoup
    import urllib2
    import codecs

    response = urllib2.urlopen('http://www.reality.sk/zakazka/0747-003578/predaj/1-izb-byt/kosice-mestska-cast-sever-sladkovicova-kosice-sever/art-real-1-izb-byt-sladkovicova-ul-kosice-sever')
    html = response.read()
    soup = BeautifulSoup(html)

    tabulka = soup.find("table", {"class" : "detail-char"})

    for row in tabulka.findAll('tr'):
        col = row.findAll('td')
        prvy = col[0].string.strip()
        druhy = col[1].string.strip()
        record = ([prvy], [druhy])

    fl = codecs.open('output.txt', 'wb', 'utf8')
    for rec in record:
        line = ''
        for val in rec:
            line += val + u';'
        fl.write(line + u'\r\n')
    fl.close()
2 answers

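You are not saving each record as you read it: `record` is reassigned on every pass through the loop, so by the time you write the file only the last row is left. Try this instead, where every record is collected in the list `records`: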

    from BeautifulSoup import BeautifulSoup
    import urllib2
    import codecs

    response = urllib2.urlopen('http://www.reality.sk/zakazka/0747-003578/predaj/1-izb-byt/kosice-mestska-cast-sever-sladkovicova-kosice-sever/art-real-1-izb-byt-sladkovicova-ul-kosice-sever')
    html = response.read()
    soup = BeautifulSoup(html)

    tabulka = soup.find("table", {"class" : "detail-char"})

    records = [] # store all of the records in this list
    for row in tabulka.findAll('tr'):
        col = row.findAll('td')
        prvy = col[0].string.strip()
        druhy = col[1].string.strip()
        record = '%s;%s' % (prvy, druhy) # store the record with a ';' between prvy and druhy
        records.append(record)

    fl = codecs.open('output.txt', 'wb', 'utf8')
    line = ';'.join(records)
    fl.write(line + u'\r\n')
    fl.close()

It could be cleaned up further, but I think this is what you want.
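For Python 3, where `urllib2` and the BeautifulSoup 3 import used above no longer exist, here is a minimal sketch of the same collect-then-join idea. To keep it self-contained and runnable offline, it swaps BeautifulSoup for the standard library's `html.parser` and parses a small hypothetical `SAMPLE_HTML` snippet modeled on the `detail-char` table instead of downloading the page. It writes the single output row with `csv.writer(..., delimiter=';')`, so a value that itself contains `;` gets quoted. Only the class name `detail-char` comes from the question; the sample data and the names `DetailCharParser` and `table_to_line` are illustrative assumptions.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample modeled on the "detail-char" table from the question;
# in practice you would pass in the downloaded page instead.
SAMPLE_HTML = """
<table class="detail-char">
  <tr><td>Celkova podlahova plocha bytu</td><td>33 m2</td></tr>
  <tr><td>Vytah</td><td>Ano</td></tr>
  <tr><td>Forma vlastnictva</td><td>osobne</td></tr>
</table>
"""


class DetailCharParser(HTMLParser):
    """Collects the stripped text of every <td> inside <table class="detail-char">."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_td = False
        self.cells = []     # every cell text, in document order
        self._current = []  # text fragments of the <td> being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "detail-char" in classes:
            self.in_table = True
        elif tag == "td" and self.in_table:
            self.in_td = True
            self._current = []

    def handle_endtag(self, tag):
        if tag == "td" and self.in_td:
            self.cells.append("".join(self._current).strip())
            self.in_td = False
        elif tag == "table" and self.in_table:
            self.in_table = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <td><b>Ano</b></td>) is collected too.
        if self.in_td:
            self._current.append(data)


def table_to_line(html):
    """Return all cells of the detail-char table as one ';'-separated line."""
    parser = DetailCharParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    # One row, ';' as separator; csv quotes any value that contains ';'.
    csv.writer(out, delimiter=";", lineterminator="\r\n").writerow(parser.cells)
    return out.getvalue()


if __name__ == "__main__":
    print(table_to_line(SAMPLE_HTML), end="")
```

Collecting every text fragment inside a `<td>` also sidesteps a limitation of `.string`, which returns `None` when a tag has more than one child. To write `output.txt` instead of a string, you could pass `open('output.txt', 'w', encoding='utf-8', newline='')` to `csv.writer`; the `csv` documentation recommends `newline=''` so the `\r\n` terminator is written unchanged.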


Here's an alternative that doesn't use BeautifulSoup at all, just plain string splitting, which is enough for your task:

    import urllib2

    store = [] # to store your results
    url = "http://www.reality.sk/zakazka/0747-003578/predaj/1-izb-byt/kosice-mestska-cast-sever-sladkovicova-kosice-sever/art-real-1-izb-byt-sladkovicova-ul-kosice-sever"
    page = urllib2.urlopen(url)
    data = page.read()
    for table in data.split("</table>"):
        if "<table" in table and 'class="detail-char' in table:
            for item in table.split("</td>"):
                if "<td" in item:
                    store.append(item.split(">")[-1].strip())
    print ','.join(store)

Output:

    $ ./python.py
    Celková podlahová plocha bytu,33 m2,Výťah,Áno,Nadzemné podlažie,Prízemné podlažie,Stav,Čiastočná rekonštrukcia,Konštrukcia bytu,tehlová,Forma vlastníctva,osobné
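The same split-based approach in Python 3, as a minimal sketch: it takes the HTML as a string (a small hypothetical `SAMPLE_HTML` stands in for the downloaded page, and `split_cells` is an illustrative name) and joins the result with `;` as the question asks. Like the original, it depends on the markup staying exactly as it is (no `>` inside cell text, no nested tables), so it is more fragile than a real HTML parser.

```python
# Hypothetical sample standing in for the downloaded page.
SAMPLE_HTML = """<html><body>
<table class="menu"><tr><td>ignore me</td></tr></table>
<table class="detail-char">
<tr><td>Vytah</td><td>Ano</td></tr>
<tr><td>Stav</td><td>Ciastocna rekonstrukcia</td></tr>
</table>
</body></html>"""


def split_cells(html):
    """Return the text of every <td> in a table whose class starts with 'detail-char'."""
    store = []
    # Each chunk ends just before a "</table>", so (assuming no nested tables)
    # it holds the cells of at most one table.
    for chunk in html.split("</table>"):
        if "<table" in chunk and 'class="detail-char' in chunk:
            # Each piece ends just before a "</td>"; the cell text follows the last ">".
            for item in chunk.split("</td>"):
                if "<td" in item:
                    store.append(item.split(">")[-1].strip())
    return store


if __name__ == "__main__":
    print(";".join(split_cells(SAMPLE_HTML)))
```

The cell inside the `menu` table is skipped because its chunk never contains `class="detail-char`.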