Python BeautifulSoup: scraping to CSV

I am trying to scrape some simple dictionary data from an HTML page. So far, I can print all the words I need in my IDE. The next step was to put the words into a list, and the last step was to save the list as a CSV file. When I run my code, it seems to stop collecting words after the 1309th or 1311th, although I believe the page contains more than a million. I am stuck and would be very grateful for any help. Thank you.

from bs4 import BeautifulSoup
from urllib import urlopen
import csv

html = urlopen('http://www.mso.anu.edu.au/~ralph/OPTED/v003/wb1913_a.html').read()

soup = BeautifulSoup(html,"lxml")

words = []

for section in soup.findAll('b'):

    words.append(section.renderContents())

print ('success')
print (len(words))

myfile = open('A.csv', 'wb')
wr = csv.writer(myfile)
wr.writerow(words)

2 answers

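I cannot reproduce the problem (I get 11616 words), so you may have outdated beautifulsoup4 or lxml packages installed. Try upgrading them: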

pip install --upgrade beautifulsoup4
pip install --upgrade lxml
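
Before upgrading, it may help to see which versions are actually installed. The following is only a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is made up for this illustration and is not part of either library:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used on the pip command line above.
    for name in ("beautifulsoup4", "lxml"):
        print(name, installed_version(name) or "not installed")
```

If the two machines report different versions, that would support the outdated-package explanation, though it does not prove it.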



Rather than building the whole list in memory, you could produce the words lazily with a generator, using yield.

def tokenize(soup_):
    for section in soup_.findAll('b'):
        yield section.renderContents()

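Then, instead of collecting the results in a list, you could write each section.renderContents() value to the csv file as it is produced.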
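
One way this could look, as a sketch rather than a definitive fix: the helper below (write_words, a name invented here) accepts any iterable of words, such as the tokenize generator above, and writes one word per row as each value arrives. It assumes Python 3, where the csv module expects a text-mode file opened with newline=''; the question's code uses Python 2 idioms ('wb' mode). It also assumes that any bytes values are UTF-8 encoded.

```python
import csv


def write_words(words, path):
    """Write each word on its own CSV row and return how many rows were written."""
    count = 0
    # newline="" lets the csv module control line endings (Python 3).
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for word in words:
            # Values such as renderContents() results may be bytes; assume UTF-8.
            if isinstance(word, bytes):
                word = word.decode("utf-8")
            writer.writerow([word])
            count += 1
    return count


if __name__ == "__main__":
    # Stand-in data; in the question this would be tokenize(soup).
    sample = (w for w in [b"A", b"Aam", "Aard-vark"])
    print(write_words(sample, "A_sample.csv"))
```

The with block closes the file when writing finishes, which flushes any buffered rows to disk. The question's code never closes myfile, so this may be worth checking if the saved file looks shorter than the list.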
