HTMLParser is not designed to stop partway through its input. If you need that, use a streaming parser such as xml.sax or xml.etree.cElementTree.
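As a rough illustration of the streaming approach, here is a minimal sketch using `iterparse` from `xml.etree.ElementTree` (the Python 3 home of the cElementTree parser). It assumes the input is well-formed XML/XHTML, which ordinary HTML often is not; the `find_first` helper and the sample document are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

def find_first(source, wanted_tag):
    """Return the attributes of the first <wanted_tag> element, or None."""
    # iterparse reads the source incrementally and yields events as it goes.
    for event, elem in ET.iterparse(source, events=("start",)):
        if elem.tag == wanted_tag:
            # Returning here abandons the rest of the input.
            return dict(elem.attrib)
    return None

# Hypothetical sample document, only for demonstration.
doc = io.BytesIO(
    b"<html><head><meta name='demo' content='on'/></head>"
    b"<body><p>rest</p></body></html>"
)
attrs = find_first(doc, "meta")
```

Because the loop returns as soon as the element is seen, nothing after that point has to be read from the source.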
Is it really a problem to digest the whole HTML file? The expected use case is as follows:
extractor = Extractor()
# ... feed html to extractor using one or more .feed() calls ...
extractor.close()
if extractor.pre_resolved_dns_enabled:
    ...
else:
    ...
If this is really a problem, you can break the HTML input into pieces and pass them until you find your tag, for example:
html = ...the html to parse...
chunks = [ html[i:i+1024] for i in xrange(0, len(html), 1024) ]
extractor = Extractor()
for c in chunks:
    if extractor.pre_resolved_dns_enabled:
        break
    extractor.feed(c)
extractor.close()
# check extractor.pre_resolved_dns_enabled
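For reference, here is a self-contained version of the chunked approach that can be run as-is. It is only a sketch: the `Extractor` below is a hypothetical stand-in (the real class is not shown here), and the `<meta name="pre-resolved-dns">` marker it looks for is invented for the example. It is written for Python 3, where the module is `html.parser` and `range` replaces `xrange`.

```python
from html.parser import HTMLParser

class Extractor(HTMLParser):
    """Hypothetical stand-in: sets a flag when a marker <meta> tag is seen."""

    def __init__(self):
        super().__init__()
        self.pre_resolved_dns_enabled = False

    def handle_starttag(self, tag, attrs):
        # The tag and attribute names are made up for this sketch.
        if tag == "meta" and dict(attrs).get("name") == "pre-resolved-dns":
            self.pre_resolved_dns_enabled = True

def scan(html, chunk_size=1024):
    """Feed html in chunks; stop feeding once the flag is set."""
    extractor = Extractor()
    fed = 0  # number of chunks actually handed to the parser
    for start in range(0, len(html), chunk_size):
        if extractor.pre_resolved_dns_enabled:
            break
        extractor.feed(html[start:start + chunk_size])
        fed += 1
    extractor.close()
    return extractor.pre_resolved_dns_enabled, fed

# Sample page: the marker is near the top, followed by a long body.
page = ('<html><head><meta name="pre-resolved-dns"></head><body>'
        + "<p>x</p>" * 1000 + "</body></html>")
found, chunks_fed = scan(page)
```

With the marker in the first 1024 characters, only one chunk is fed and the rest of the page is never parsed.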