I am trying to replicate the Evernote Web Clipper's parsing capabilities in Python for my own web-clipping projects. I am interested in extracting only the main text, nothing more.
I used the Python port of Arc90's Readability:
https://github.com/buriy/python-readability
combined with aaronsw's great html2text library:
https://github.com/aaronsw/html2text
and it gives good results most of the time, but Evernote parses the main text much better.
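For reference, a minimal sketch of this kind of pipeline, assuming python-readability is installed (it exposes `readability.Document`) along with html2text. The function name `extract_main_text` and the `summarize` / `to_text` parameters are only illustrative placeholders, not part of either library; they exist so each stage can be swapped out when comparing approaches.

```python
def extract_main_text(html, summarize=None, to_text=None):
    """Return the main article text of an HTML page as plain text.

    `summarize` and `to_text` are hypothetical hooks (not library APIs):
    when omitted, python-readability and html2text are used.
    """
    if summarize is None:
        # python-readability: keeps only the main article markup
        from readability import Document
        summarize = lambda page: Document(page).summary()
    if to_text is None:
        # html2text: converts the remaining HTML to Markdown-style text
        import html2text
        to_text = html2text.html2text

    article_html = summarize(html)
    return to_text(article_html).strip()
```

Called as `extract_main_text(page_source)`, where `page_source` is the raw HTML string of the page, this runs both stages with the two libraries above.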
Can anyone recommend a better approach, or explain what Evernote does?
Thanks!