How does Evernote Web Clipper analyze web pages so well?

Question

I am trying to replicate the parsing capabilities of Evernote Web Clipper in Python for my own web-clipping projects. I only want to extract the main text, nothing more.

I used the Python port of Arc90's Readability:

https://github.com/buriy/python-readability

combined with aaronsw's great html2text library:

https://github.com/aaronsw/html2text

and it gives good results most of the time, but Evernote parses the main text much better.

Can anyone recommend a better approach, or explain what Evernote does?

Thanks!

python web-scraping screen-scraping evernote

vgoklani Feb 11 '13 at 10:10

No one has answered this question yet.
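
For reference, the kind of main-text extraction the question describes can be sketched with the standard library alone. This is not what Evernote does (the question leaves that open) and it is not the python-readability algorithm either; it is only a simplified stand-in that assumes the main text sits in the container whose paragraphs are long, comma-rich, and mostly free of links. All names here (`Block`, `MainTextParser`, `extract_main_text`, `SAMPLE`) and the scoring weights are made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as candidate containers for article text.
CONTAINERS = {"article", "div", "main", "section", "td"}
# Assumption: text inside these tags is never part of the main content.
NOISE = {"aside", "footer", "form", "header", "nav", "script", "style"}


class Block:
    """Paragraph text collected for one candidate container."""

    def __init__(self):
        self.paragraphs = []
        self.text_chars = 0
        self.link_chars = 0

    def score(self):
        if self.text_chars == 0:
            return 0.0
        link_density = min(1.0, self.link_chars / self.text_chars)
        commas = sum(p.count(",") for p in self.paragraphs)
        # Arbitrary weights: reward length and commas, penalise link-heavy text.
        return (self.text_chars + 10 * commas) * (1.0 - link_density)


class MainTextParser(HTMLParser):
    def __init__(self):
        super().__init__()
        root = Block()            # catches <p> elements outside any container
        self.blocks = [root]
        self.open_blocks = [root]
        self.noise_depth = 0
        self.link_depth = 0
        self.current_p = None     # list of text chunks while inside a <p>
        self.p_link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in NOISE:
            self.noise_depth += 1
        elif tag in CONTAINERS:
            block = Block()
            self.blocks.append(block)
            self.open_blocks.append(block)
        elif tag == "a":
            self.link_depth += 1
        elif tag == "p":
            self.close_paragraph()  # tolerate an unclosed previous <p>
            self.current_p = []

    def handle_endtag(self, tag):
        if tag in NOISE:
            self.noise_depth = max(0, self.noise_depth - 1)
        elif tag in CONTAINERS:
            self.close_paragraph()
            if len(self.open_blocks) > 1:
                self.open_blocks.pop()
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag == "p":
            self.close_paragraph()

    def handle_data(self, data):
        if self.noise_depth or self.current_p is None:
            return
        self.current_p.append(data)
        if self.link_depth:
            self.p_link_chars += len(" ".join(data.split()))

    def close_paragraph(self):
        """Attach the finished paragraph to the innermost open container."""
        if self.current_p is None:
            return
        text = " ".join("".join(self.current_p).split())
        if text:
            block = self.open_blocks[-1]
            block.paragraphs.append(text)
            block.text_chars += len(text)
            block.link_chars += self.p_link_chars
        self.current_p = None
        self.p_link_chars = 0


def extract_main_text(html):
    """Return the paragraphs of the highest-scoring container."""
    parser = MainTextParser()
    parser.feed(html)
    parser.close()
    parser.close_paragraph()
    best = max(parser.blocks, key=Block.score)
    return "\n\n".join(best.paragraphs)


SAMPLE = """
<html><body>
<div id="nav"><p><a href="/">Home</a> <a href="/about">About</a></p></div>
<div id="content">
<p>The main article text usually sits in long paragraphs, with commas, clauses, and few links.</p>
<p>Menus and footers, by contrast, are short and consist mostly of links.</p>
</div>
<div id="footer"><p>Copyright <a href="/legal">Legal notice</a></p></div>
</body></html>
"""

print(extract_main_text(SAMPLE))
```

A heuristic this crude will misfire on pages that do not wrap their text in `<p>` elements or that nest containers deeply, which is one reason a maintained library is likely the better starting point for real pages.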
