I grab the HTML table with this code:
import csv
import urllib2
from bs4 import BeautifulSoup

with open('listing.csv', 'wb') as f:
    writer = csv.writer(f)
    for i in range(39):
        url = "file:///C:/projects/HTML/Export.htm".format(i)
        u = urllib2.urlopen(url)
        try:
            html = u.read()
        finally:
            u.close()
        soup = BeautifulSoup(html)
        for tr in soup.find_all('tr')[2:]:
            tds = tr.find_all('td')
            row = [elem.text.encode('utf-8') for elem in tds]
            writer.writerow(row)
Everything works fine, but I'm also trying to grab the URL from the href of the link in column 9. It currently gives me the text value, not the URL.
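To make the goal concrete, here is a minimal sketch of one possible approach. It assumes the link in that column is an `<a>` tag with an `href` attribute inside the `<td>`. The helper name `row_values`, the `link_col` parameter, the sample HTML, and the example.com URL are all made up for illustration and are not from my real file.

```python
from bs4 import BeautifulSoup


def row_values(tds, link_col=8):
    """Return cell texts, but use the href for the cell at link_col.

    link_col is zero-based, so 8 means the ninth column (an assumption
    about the layout). Falls back to the cell text if no link is found.
    """
    values = []
    for index, td in enumerate(tds):
        link = td.find('a', href=True) if index == link_col else None
        if link is not None:
            values.append(link['href'])
        else:
            values.append(td.get_text(strip=True))
    return values


# Hypothetical three-column row; the link sits in the third cell (index 2).
sample_html = """
<table>
<tr><td>Widget</td><td>42</td><td><a href="http://example.com/item/1">View</a></td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
tds = soup.find('tr').find_all('td')
row = row_values(tds, link_col=2)
```

In the loop above, this would presumably replace the list comprehension that builds `row`, with `link_col=8` for column 9.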
Also, I have two tables in my HTML. Is there any way to skip the first table and create the CSV file using only the second table?
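A rough sketch of what skipping the first table might look like, assuming the two tables are siblings rather than nested (`find_all('table')` returns nested tables too, in document order, which would shift the index). The sample HTML and variable names here are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical document with two top-level tables.
sample_html = """
<table><tr><td>first table, to be skipped</td></tr></table>
<table>
<tr><td>a</td><td>b</td></tr>
<tr><td>c</td><td>d</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# find_all returns tables in document order; index 1 is the second one.
tables = soup.find_all('table')
second_table = tables[1]

# Only look for rows inside the second table instead of the whole document.
rows = [
    [td.get_text(strip=True) for td in tr.find_all('td')]
    for tr in second_table.find_all('tr')
]
```

The idea is to call `find_all('tr')` on `second_table` rather than on `soup`, so rows from the first table never reach the CSV writer.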
Any help is greatly appreciated, as I am new to Python and need this for a project to automate the daily conversion.
Many thanks!