I want to remove all tags from an HTML file using Python's re module. For example, given the string <h1>Hello World!</h1>, I want to keep only "Hello World!". To remove the tags, I used re.sub('<.*>', '', string). The result is an empty string, because the regexp matches from the first opening angle bracket to the last closing one and removes everything in between. How do I solve this problem?
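
To make the behaviour concrete, here is a minimal sketch that reproduces the empty result and tries two common variations of the pattern. The variable names (greedy, lazy, negated) are only illustrative, and this assumes simple markup with no '>' characters inside attribute values or comments.

```python
import re

string = "<h1>Hello World!</h1>"

# Greedy: '.*' extends to the LAST '>' in the string,
# so the single match covers the whole input.
greedy = re.sub('<.*>', '', string)

# Non-greedy: '.*?' stops at the NEAREST '>',
# so each tag is matched and removed on its own.
lazy = re.sub('<.*?>', '', string)

# Negated character class: '[^>]*' can never cross a '>',
# which gives the same effect without relying on laziness.
negated = re.sub('<[^>]*>', '', string)

print(repr(greedy))   # ''
print(repr(lazy))     # 'Hello World!'
print(repr(negated))  # 'Hello World!'
```

Both variations are likely to break on real-world HTML (scripts, comments, attributes containing '>'), so for anything beyond simple strings a proper parser such as the standard library's html.parser is probably the safer route.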