I don’t know much about HTML. How can I extract only the text from a page? For example, if the HTML page reads like:
<meta name="title" content="How can I make money at home online? No gimmacks please? - Yahoo! Answers">
<title>How can I make money at home online? No gimmicks please? - Yahoo! Answers</title>
I just want to extract this:
How can I make money at home online? No gimmicks please? - Yahoo! Answers
I am using the re module:
    import re

    def striphtml(data):
        # Replace every tag (non-greedy match between < and >) with a space
        p = re.compile(r'<.*?>')
        return p.sub(' ', data)
but it still does not do what I intend.
The above function is called like this:
    for lines in filehandle.readlines():
        myFile.write(lines)          # keep a raw copy of each line
        lines = striphtml(lines)     # strip tags from this line only
        content.append(lines)
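For comparison, here is a minimal sketch of one possible alternative: instead of stripping tags line by line with a regex, feed the whole page to the standard library's `html.parser.HTMLParser` and keep only the text inside `<title>`. The names `TitleExtractor` and `extract_title` are hypothetical, made up for this illustration, and the sketch assumes Python 3 and that the title text is all that is wanted.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Hypothetical helper: collects the text found inside <title>."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Keep text only while we are between <title> and </title>
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html_text):
    """Return the stripped text of the <title> element, or '' if none."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.title_parts).strip()


sample = (
    '<meta name="title" content="How can I make money at home online?">\n'
    "<title>How can I make money at home online? "
    "No gimmicks please? - Yahoo! Answers</title>\n"
)
print(extract_title(sample))
```

With the regex approach, the `<meta>` line is reduced to a single space and every other line keeps whatever text sits between tags, so the title is mixed in with everything else on the page. The parser-based sketch ignores attribute values such as `content="..."` and returns only the title text.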