Beautifulsoup, maximum recursion depth reached

Question

This is a BeautifulSoup procedure that captures the content of all <p> HTML tags. After capturing content from some web pages, I get an error saying that the maximum recursion depth was exceeded.

def printText(tags):
    for tag in tags:
        if tag.__class__ == NavigableString:
            print tag,
        else:
            printText(tag)
    print ""
#loop over urls, send soup to printText procedure

Bottom of the traceback:

  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 13, in printText
    if tag.__class__ == NavigableString:
RuntimeError: maximum recursion depth exceeded in cmp

+5

python beautifulsoup

yayu Apr 12 '12 at 6:01

3 answers

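printText() is probably being called on a string-like object that is not exactly a NavigableString, for instance a subclass of NavigableString. The class comparison fails for such an object, so printText() iterates over it and recurses without end.

Try using isinstance() in the if statement instead: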

if isinstance(tag, basestring):

If that doesn't help, print what you are recursing on just before the recursive call:

print "recursing on", tag, type(tag)
printText(tag)
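
To illustrate why the isinstance() check matters, here is a minimal Python 3 sketch that does not need BeautifulSoup installed. TextNode and CommentNode are hypothetical stand-ins for NavigableString and one of its subclasses, nested lists stand in for tags, and collect_text is an illustrative name, not part of any library. In Python 3 the check is against str rather than basestring.

```python
class TextNode(str):
    """Hypothetical stand-in for NavigableString (a string subclass)."""


class CommentNode(TextNode):
    """Hypothetical stand-in for a NavigableString subclass."""


def collect_text(node, out=None):
    """Gather every string found in a nested list structure."""
    if out is None:
        out = []
    # isinstance() matches str and all of its subclasses, so the
    # recursion stops at any string instead of iterating over it.
    if isinstance(node, str):
        out.append(str(node))
        return out
    for child in node:
        collect_text(child, out)
    return out


comment = CommentNode("a note")
# An exact class comparison misses the subclass...
print(comment.__class__ == TextNode)   # False
# ...while isinstance() accepts it.
print(isinstance(comment, TextNode))   # True

tree = [TextNode("Hello"), [comment, [TextNode("world")]]]
print(" ".join(collect_text(tree)))    # Hello a note world
```

With the exact class comparison, the CommentNode would fall through to the else branch and be iterated character by character, which is the failure mode described above.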

+5

Leonard Richardson 12 . '12 13:58

I had the same problem. If you have tags nested about 480 levels deep and you try to convert the outer tag to a str/unicode string, you get RuntimeError: maximum recursion depth exceeded. Each level needs two nested method calls, so you soon hit Python's default limit of 1000 nested calls. You can raise this limit, or you can use this helper. It extracts all the text from the HTML and returns it wrapped in a <pre> element:

def beautiful_soup_tag_to_unicode(tag):
    try:
        return unicode(tag)
    except RuntimeError as e:
        if not str(e).startswith('maximum recursion'):
            raise
        # If you have more than 480 level of nested tags you can hit the maximum recursion level
        out=[]
        for mystring in tag.findAll(text=True):
            mystring=mystring.strip()
            if not mystring:
                continue
            out.append(mystring)
        return u'<pre>%s</pre>' % '\n'.join(out)
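
For the other option mentioned above, raising the limit, the standard library offers sys.getrecursionlimit() and sys.setrecursionlimit(). The following is only a sketch in Python 3: recursion_limit and nesting_depth are hypothetical helper names, and nested lists stand in for deeply nested tags. Raising the limit too far can crash the interpreter, so the sketch restores the old value afterwards.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, then restore the old value."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(old_limit)


def nesting_depth(node):
    """Recursively count how many list levels are nested inside node."""
    deepest = 0
    for child in node:
        deepest = max(deepest, 1 + nesting_depth(child))
    return deepest


# Build a structure nested 2000 levels deep without using recursion.
deep = []
for _ in range(2000):
    deep = [deep]

# Allow enough extra frames for the 2000-level walk.
with recursion_limit(sys.getrecursionlimit() + 5000):
    print(nesting_depth(deep))
```

The context manager keeps the higher limit scoped to the one call that needs it, so the rest of the program still gets the usual protection against runaway recursion.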

0

guettli Aug 28 '12 at 9:31

Ignacio Vazquez-Abrams · Accepted Answer · 2012-04-12T06:06:00+0000

You've probably hit a string. Iterating over a string yields 1-length strings. Iterating over that 1-length string yields a 1-length string. Iterating over that 1-length string ...
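
The behaviour is easy to reproduce in isolation. In this small Python 3 sketch, descend is a hypothetical helper that keeps taking the first element of whatever it is given, capped at a fixed number of steps so that it terminates:

```python
def descend(value, max_steps=5):
    """Repeatedly replace value with its first element, up to max_steps times."""
    seen = []
    for _ in range(max_steps):
        value = next(iter(value))
        seen.append(value)
    return seen


# After the first step the value is a 1-length string, and its only
# element is an equal 1-length string, so the descent never bottoms out.
print(descend("abc"))  # ['a', 'a', 'a', 'a', 'a']
```

A recursive function that iterates over anything it does not recognise as a string therefore never reaches a base case once it is handed one.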
