BeautifulSoup: maximum recursion depth exceeded

This is a BeautifulSoup procedure that captures the content of all <p> HTML tags. After capturing content from some web pages, I get an error saying that the maximum recursion depth has been exceeded.

def printText(tags):
    for tag in tags:
        if tag.__class__ == NavigableString:
            print tag,
        else:
            printText(tag)
    print ""
#loop over urls, send soup to printText procedure

Bottom of the traceback:

  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 13, in printText
    if tag.__class__ == NavigableString:
RuntimeError: maximum recursion depth exceeded in cmp
3 answers

You probably hit a string. Iterating over a string yields strings of length 1. Iterating over one of those 1-character strings yields a string of length 1 again. Iterating over that string ...
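To make that concrete, here is a minimal sketch that does not use BeautifulSoup at all. The names `walk` and `hits_recursion_limit` are made up for illustration, and `walk` only imitates the shape of printText: it recurses into anything iterable without checking for strings first.

```python
def walk(item):
    # Naive walker: recurse into every child, with no check for strings.
    for child in item:
        walk(child)


def hits_recursion_limit(item):
    # RecursionError (Python 3) is a subclass of RuntimeError,
    # so this except clause works on Python 2 and 3 alike.
    try:
        walk(item)
    except RuntimeError:
        return True
    return False


# Iterating over "a" yields "a" again, so the walk can never bottom out.
children_of_a = list("a")

string_blows_up = hits_recursion_limit("a")
lists_are_fine = hits_recursion_limit([[], [[]]])
```

Nested lists terminate because an empty list has no children, while a 1-character string always has exactly one child: itself.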


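printText() calls itself on every item whose class is not exactly NavigableString, and that includes subclasses of NavigableString such as Comment. Calling printText() on one of those iterates over its text, and the recursion never ends.

Use isinstance() in the if test instead: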

if isinstance(tag, basestring):

When diagnosing a recursion problem, it also helps to print what you are recursing on:

print "recursing on", tag, type(tag)
printText(tag)
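The difference between the two checks can be sketched without BeautifulSoup. In the example below, `Text` and `Remark` are hypothetical stand-ins for NavigableString and one of its subclasses, and `collect_text` is an illustrative rewrite of printText that returns the strings instead of printing them. It is written for Python 3, so it tests against `str` where the Python 2 code above uses `basestring`.

```python
class Text(str):
    """Stand-in for NavigableString, which is itself a string subclass."""


class Remark(Text):
    """Stand-in for a NavigableString subclass such as Comment."""


def collect_text(node, out):
    # isinstance() accepts Text, Remark, and plain str alike,
    # so every kind of string is treated as a leaf and never iterated.
    if isinstance(node, str):
        out.append(str(node))
    else:
        for child in node:
            collect_text(child, out)
    return out


remark = Remark("a comment")
exact_match = remark.__class__ == Text      # the class is Remark, not Text
subclass_match = isinstance(remark, Text)   # Remark derives from Text

# Nested lists play the role of tags here.
tree = [Text("hello"), [remark, [Text("world")]]]
words = collect_text(tree, [])
```

With the exact class comparison, `remark` would fall through to the else branch and be iterated character by character, which is the endless recursion described in the first answer.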

I had the same problem. If you have tags nested about 480 levels deep and you try to convert the tag to a str or unicode string, you get a RuntimeError saying the maximum recursion depth was exceeded. Each level needs two nested method calls, so you soon hit Python's default limit of 1000 nested calls. You can raise this limit, or you can use this helper. It extracts all the text from the HTML and returns it wrapped in a <pre> element:

def beautiful_soup_tag_to_unicode(tag):
    try:
        return unicode(tag)
    except RuntimeError as e:
        if not str(e).startswith('maximum recursion'):
            raise
        # With more than about 480 levels of nested tags, unicode(tag) can hit the maximum recursion depth
        out=[]
        for mystring in tag.findAll(text=True):
            mystring=mystring.strip()
            if not mystring:
                continue
            out.append(mystring)
        return u'<pre>%s</pre>' % '\n'.join(out)
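The other option mentioned above, raising the limit, might look like the sketch below. It uses nested lists rather than real tags, and `nested`, `depth_of`, and `depth_with_limit` are illustrative names, not part of BeautifulSoup. The depth of 1500 and the limit of 3000 are arbitrary values chosen for the demonstration.

```python
import sys


def nested(depth):
    """Build a list nested `depth` levels deep, e.g. [[[]]] for depth 3."""
    node = []
    for _ in range(depth - 1):
        node = [node]
    return node


def depth_of(node):
    # Recursive on purpose: one Python call per nesting level.
    if not node:
        return 1
    return 1 + depth_of(node[0])


def depth_with_limit(node, limit):
    """Run depth_of under a temporary recursion limit, then restore the old one."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        return depth_of(node)
    finally:
        sys.setrecursionlimit(old_limit)


deep = nested(1500)

try:
    depth_with_limit(deep, 1000)   # the usual default limit
    failed_at_default = False
except RuntimeError:               # RecursionError on Python 3
    failed_at_default = True

depth_at_raised_limit = depth_with_limit(deep, 3000)
```

Raising the limit only moves the ceiling. A limit set far too high can crash the interpreter on some builds instead of raising an exception, so restoring the previous value afterwards, as the helper does, is a reasonable precaution.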