I am trying to scrape some simple dictionary data from an HTML page. So far, I can print all the words I need in my IDE. The next step was to collect the words in a list, and the last step was to save that list as a CSV file. When I run my code, it seems to stop receiving data after about the 1309th or 1311th word, although I believe there are more than 1 million on the web page. I am stuck and would be very grateful for any help. Thank you.
from bs4 import BeautifulSoup
from urllib import urlopen
import csv

html = urlopen('http://www.mso.anu.edu.au/~ralph/OPTED/v003/wb1913_a.html').read()
soup = BeautifulSoup(html, "lxml")

words = []
for section in soup.findAll('b'):
    words.append(section.renderContents())

print('success')
print(len(words))

myfile = open('A.csv', 'wb')
wr = csv.writer(myfile)
wr.writerow(words)
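
For reference, here is a minimal sketch of the saving step on its own. It assumes, without my having confirmed it, that words could go missing because the output file is never closed and its buffer is never flushed; the parser could just as well be the cause. The names `save_words`, `count_rows`, `sample_words` and the file name `A_sample.csv` are hypothetical, and the sketch is written for Python 3, whereas the code above uses Python 2.

```python
import csv
import os
import tempfile


def save_words(words, path):
    """Write one word per row; the with-block closes (and flushes) the file."""
    # newline='' is what the csv module expects on Python 3.
    # On Python 2 the file would be opened with mode 'wb' instead.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        for word in words:
            writer.writerow([word])


def count_rows(path):
    """Read the file back and count its rows as a sanity check."""
    with open(path, newline='') as f:
        return sum(1 for _ in csv.reader(f))


# Stand-in data (hypothetical); the real list would come from soup.findAll('b').
sample_words = ['word%d' % i for i in range(5000)]
out_path = os.path.join(tempfile.gettempdir(), 'A_sample.csv')

save_words(sample_words, out_path)
print(len(sample_words), count_rows(out_path))
```

Comparing `len(words)` with the row count read back from the file should show whether words are lost while parsing the page or while writing the CSV.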
