How to automatically summarize user content?

I am launching a website that lets users write blog posts. I would really like to summarize the written content and use it to fill in the <meta name="description".../> tag, for example.

What methods can I use to automatically summarize or describe user-created content?
Are there any (preferably free) tools that have already solved this problem?

(I have seen other sites simply copy the first 100 or so words, but that seems like a suboptimal solution to me.)

+4
10 answers

Think of summarization as the task of selecting the most important sentences from a document.

The method described in "The Automatic Creation of Literature Abstracts" by H. P. Luhn (1958) is a naive approach that works really well. Give it a shot.

If your site is written in Python, coding this algorithm with NLTK (Natural Language Toolkit) is a fun task.
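As a rough illustration of the sentence-selection idea, here is a minimal sketch that uses only the standard library. It is a simplified, Luhn-style scoring (squared count of frequent words divided by sentence length), not the exact 1958 algorithm, which scores clusters of significant words. The stopword list, the naive sentence splitter, and the names `summarize`, `max_sentences`, and `top_words` are all assumptions made for the example; NLTK's tokenizers and stopword corpus could replace the hand-rolled parts.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real list would be longer).
STOPWORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the this to was with".split()
)


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphabetic tokens only.
    return re.findall(r"[a-z']+", sentence.lower())


def summarize(text, max_sentences=2, top_words=10):
    sentences = split_sentences(text)
    words = [w for s in sentences for w in tokenize(s) if w not in STOPWORDS]
    # "Significant" words are simply the most frequent non-stopwords.
    significant = {w for w, _ in Counter(words).most_common(top_words)}

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        hits = sum(1 for t in tokens if t in significant)
        # Luhn-like density: (significant words)^2 / sentence length.
        return hits * hits / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

Because whole sentences are selected and emitted in their original order, the result stays readable even when the scoring is crude.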

+5

Make it predictable.

From the users' point of view, simply using the first paragraph is not bad. Any automation will inevitably fail in some cases. Therefore, I suggest showing the first paragraph (possibly truncated at some point) as the summary and offering an optional field that lets the author override it.
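A minimal sketch of that fallback, assuming paragraphs are separated by blank lines: use the author's override when one is given, otherwise take the first paragraph and truncate it at a word boundary. The function names and the 155-character default are hypothetical choices for the example, not values taken from this thread.

```python
import html


def meta_description(body, override=None, limit=155):
    """Return the override if provided, else the first paragraph, truncated.

    Assumes limit is comfortably larger than 3 (room for the ellipsis).
    """
    if override and override.strip():
        text = override
    else:
        paragraphs = [p for p in body.split("\n\n") if p.strip()]
        text = paragraphs[0] if paragraphs else ""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) > limit:
        # Cut at the last space that fits, leaving room for "...".
        cut = text[: limit - 3].rsplit(" ", 1)[0]
        text = cut.rstrip(" ,;:.") + "..."
    return text


def meta_tag(description):
    # Escape quotes and ampersands so the attribute value stays valid HTML.
    return '<meta name="description" content="{}"/>'.format(
        html.escape(description, quote=True)
    )
```

The output is predictable: authors can see exactly what will be used and replace it when the first paragraph is a poor summary.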

+4

You might try using Mechanical Turk or any number of other crowdsourcing options.

+1

Another item to check out: a SourceForge project, the AutoSummary Semantic Analysis Engine.

+1

Apple patent 6424362, on automatically summarizing document content, contains sample code that may be useful ...

+1

This borders on artificial intelligence, so there will be no “easy” solution, but there are products that focus on this problem.

Look at Copernic Summarizer, for one.

0

Noun phrases tend to be the important elements of a sentence. Choosing the sentence(s) with a high density of noun phrases can give a good summary. You can extract noun phrases using POS tags.

For a good summary, it is best to use a complete, meaningful sentence. Reading a broken sentence is slightly annoying.
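The noun-phrase idea can be sketched as follows, assuming each sentence has already been POS-tagged with Penn Treebank tags, for example via `nltk.pos_tag(nltk.word_tokenize(sentence))`. The chunking rule here (a run of determiners, adjectives, and nouns that ends at its last noun) is a deliberately crude stand-in for a real noun-phrase chunker, and the function names are hypothetical.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = {"DT", "JJ", "JJR", "JJS"}


def noun_phrase_tokens(tagged):
    """Count tokens inside simple noun phrases (modifiers followed by nouns)."""
    count = 0
    run = 0        # length of the current modifier/noun run
    last_noun = 0  # run length up to and including the last noun seen
    for _, tag in tagged:
        if tag in NOUN_TAGS:
            run += 1
            last_noun = run
        elif tag in MODIFIER_TAGS:
            run += 1
        else:
            # The run ended: keep only the part that ends in a noun.
            count += last_noun
            run = last_noun = 0
    return count + last_noun


def densest_sentence(tagged_sentences):
    """Return the index of the sentence with the highest noun-phrase density."""
    def density(tagged):
        return noun_phrase_tokens(tagged) / len(tagged) if tagged else 0.0

    return max(range(len(tagged_sentences)), key=lambda i: density(tagged_sentences[i]))
```

Since the function returns the index of a whole sentence, the chosen summary is never a sentence fragment.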

0

Alternatively, when publishing an article, the author can specify keywords for the description, which can then be placed automatically in the meta description tag.

0
