How does Evernote Web Clipper analyze web pages so well?

I am trying to replicate the parsing capabilities of the Evernote Web Clipper in Python for my own web-clipping projects. I only want to extract the main text, nothing more.

I used the Python port of Arc90's Readability:

https://github.com/buriy/python-readability

combined with aaronsw's great html2text library:

https://github.com/aaronsw/html2text

and it gives good results most of the time, but Evernote parses the main text much better.

Can anyone recommend a better approach, or explain what Evernote does?

Thanks!
