When using Python's HTMLParser class, is it possible to stop processing from inside a handle_* method? I get all the data I need near the beginning of the document, so continuing to parse the rest seems like wasted work. The following example extracts a document's description metadata.
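    from HTMLParser import HTMLParser  # Python 2; on Python 3: from html.parser import HTMLParser

    class MyParser(HTMLParser):
        # HTMLParser calls handle_starttag (not handle_start) for every opening tag.
        def handle_starttag(self, tag, attrs):
            if tag == 'meta':
                # attrs is a list of (name, value) pairs; names are already lowercased.
                attrs = dict(attrs)
                # Print the content only for <meta name="description" ...>,
                # regardless of the order in which the attributes appear.
                if (attrs.get('name') or '').lower() == 'description':
                    print(attrs.get('content'))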
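As far as I can tell, HTMLParser has no dedicated "stop" call, so one approach that might work is to raise an exception from the handler and catch it around feed(). The following is only a minimal sketch under that assumption: StopParsing, DescriptionParser, and extract_description are hypothetical names made up for illustration and are not part of the library.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2, as in the question


class StopParsing(Exception):
    """Hypothetical exception used only to unwind out of feed()."""


class DescriptionParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs = dict(attrs)
        if (attrs.get('name') or '').lower() == 'description':
            # Store the value, then abort: the exception propagates out of feed().
            self.description = attrs.get('content')
            raise StopParsing()


def extract_description(html):
    """Return the description meta content, or None if the document has none."""
    parser = DescriptionParser()
    try:
        parser.feed(html)
    except StopParsing:
        pass  # Expected: we already have what we need.
    return parser.description
```

Calling extract_description(page_source) would then return as soon as the description tag is seen. Because the parser is interrupted mid-document, it is probably safest to create a new parser (or call reset()) for each document rather than reuse an interrupted one.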