How to detect the main article tag, as the Evernote clipper does

When I tried the Evernote clipper extension, I noticed a very useful feature. When I click on “Article”, it selects exactly the main content of the page. See the result when I used Evernote Clipper on the page https://developer.chrome.com/extensions/api_index

I looked at the extracted main article on several pages, and it is in fact taken from the first article tag. However, the Evernote clipper still works well on pages that do not use this tag.

I wonder how the Evernote clipper does this. Is there a JS library that detects the tag containing the main content of a page? Could you give me some tips on how to do this?

Thank you in advance!

1 answer

As far as I know, there is no universal JS library for this. The Evernote tool uses its own method to extract “interesting” content from a web page. You can read the Evernote clipper code to try to understand the process.

On my Mac, the path to the Chrome extension is:

~/Library/Application Support/Google/Chrome/Default/Extensions/pioclpoplcdbaefihamjohnefbikjilc/6.2_0/

Here is another tool that works pretty much the same: https://www.readability.com/

You can also check this topic: Which algorithm does Readability use to extract text from URLs?

Or do a Google search for terms like 'content extract js lib'. (I found this one: https://github.com/hatena/extract-content-javascript )
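
If you want to experiment yourself, here is a minimal sketch of the general idea. It is not Evernote's or Readability's actual algorithm, only a simple heuristic I am assuming for illustration: use the first article tag if there is one, otherwise pick the container whose direct p children hold the most text that is not link text. The PageNode type, the function names and the sample page are all hypothetical; in a content script you would start with document.querySelector("article") and walk real DOM elements instead.

```typescript
// Hypothetical minimal node shape (an assumption for this sketch);
// in a browser you would read the same data from DOM elements.
interface PageNode {
  tag: string;          // lowercase tag name, e.g. "div"
  id?: string;          // optional id, used here only to label the sample
  text?: string;        // text directly inside this node
  children?: PageNode[];
}

// Total length of the text under a node (each node's own text is trimmed).
function textLength(node: PageNode): number {
  const own = (node.text ?? "").trim().length;
  return (node.children ?? []).reduce((sum, c) => sum + textLength(c), own);
}

// Length of the text that sits inside links under a node.
function linkTextLength(node: PageNode): number {
  if (node.tag === "a") return textLength(node);
  return (node.children ?? []).reduce((sum, c) => sum + linkTextLength(c), 0);
}

// Depth-first search for the first node with the given tag.
function findFirst(node: PageNode, tag: string): PageNode | undefined {
  if (node.tag === tag) return node;
  for (const child of node.children ?? []) {
    const hit = findFirst(child, tag);
    if (hit) return hit;
  }
  return undefined;
}

// Score a container by the non-link text in its direct <p> children.
function paragraphScore(node: PageNode): number {
  return (node.children ?? [])
    .filter((c) => c.tag === "p")
    .reduce((sum, p) => sum + textLength(p) - linkTextLength(p), 0);
}

// Prefer the first <article>; otherwise return the best-scoring container.
function findMainContent(root: PageNode): PageNode {
  const article = findFirst(root, "article");
  if (article) return article;

  let best = root;
  let bestScore = paragraphScore(root);
  const stack: PageNode[] = [...(root.children ?? [])];
  while (stack.length > 0) {
    const node = stack.pop()!;
    const s = paragraphScore(node);
    if (s > bestScore) {
      best = node;
      bestScore = s;
    }
    stack.push(...(node.children ?? []));
  }
  return best;
}

// Made-up sample page without an <article> tag: a link-only nav and a text block.
const samplePage: PageNode = {
  tag: "body",
  children: [
    { tag: "div", id: "nav", children: [{ tag: "a", text: "Home" }, { tag: "a", text: "About" }] },
    {
      tag: "div",
      id: "content",
      children: [
        { tag: "p", text: "A long paragraph of real content." },
        { tag: "p", text: "Another paragraph with a", children: [{ tag: "a", text: "link" }] },
      ],
    },
  ],
};

console.log(findMainContent(samplePage).id); // prints "content"
```

Real pages need more than this (for example ignoring hidden elements, or weighting class names like "comment" or "sidebar"), which is why reading an existing implementation is still worthwhile.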

Hope this helps
