Shorten a text and keep only the important sentences

The German website nandoo.net lets you shorten a news article. When you change the percentage with the slider, the text changes and some sentences are left out.

You can see it in action here:

http://www.nandoo.net/read/article/299925/

The news article is on the left, and the tags are highlighted. The slider is at the top of the second column. The further you move the slider to the left, the shorter the text becomes.

How could something like this be implemented? Are there any algorithms you can use to achieve this?

My guess is that their algorithm counts the tags and nouns in each sentence and then drops the sentences with the fewest tags/nouns.
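The tag-counting idea can be sketched roughly as follows. This is not nandoo.net's actual algorithm, which the question does not reveal. It is a minimal illustration under stated assumptions: the sentence splitter is naive, tags are assumed to be single words, noun counting (which would need a part-of-speech tagger) is left out, and the hypothetical `keep_ratio` parameter stands in for the slider position.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def shorten_by_tags(text, tags, keep_ratio):
    """Keep about keep_ratio of the sentences, preferring those that mention the most tags.

    Assumes single-word tags; matching is case-insensitive on whole words.
    """
    sentences = split_sentences(text)
    tag_set = {t.lower() for t in tags}
    # Score = number of word tokens in the sentence that are tags.
    scores = [sum(1 for w in re.findall(r"\w+", s.lower()) if w in tag_set)
              for s in sentences]
    n_keep = max(1, round(len(sentences) * keep_ratio))
    # Highest score first; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    kept = sorted(ranked[:n_keep])  # restore the original reading order
    return " ".join(sentences[i] for i in kept)


text = ("Berlin voted today. The weather was mild. "
        "The Berlin senate approved the budget. Nothing else happened.")
print(shorten_by_tags(text, ["Berlin", "senate", "budget"], 0.5))
```

Keeping the selected sentences in their original order matters: the shortened text should still read like the article, not like a ranked list.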

Could this be true? Or do you have another idea?

I hope you can help me. Thanks in advance!

2 answers

Usually you want to keep the sentences that contain more words specific to this article.

That is, the more "general" a sentence is, the less it describes this particular article.

The usual way to do this is a Bayesian analysis, similar to a spam filter. First, determine which words appear more often in the article than you would expect (for example, compared with their frequency in a large collection of other texts), then pick the sentences that contain these words.
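A rough sketch of "words that appear more often than expected" follows. It assumes you have some background text (the hypothetical `background` argument) that represents ordinary language. Each word gets a smoothed log-likelihood ratio, log P(word | article) / P(word | background), which is the same kind of per-word weight a naive Bayes spam filter uses, with the background corpus playing the role of the "normal mail" class. This is a simplified stand-in, not a full Bayesian model; sentences are scored by the average positive weight of their words.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def keyword_weights(article, background):
    """Log-ratio of each article word's relative frequency vs. the background corpus.

    Add-one smoothing avoids division by zero for words unseen in the background.
    """
    art = Counter(tokenize(article))
    bg = Counter(tokenize(background))
    vocab_size = len(set(art) | set(bg))
    art_total = sum(art.values()) + vocab_size
    bg_total = sum(bg.values()) + vocab_size
    return {w: math.log(((art[w] + 1) / art_total) / ((bg[w] + 1) / bg_total))
            for w in art}


def summarize(article, background, keep_ratio):
    sentences = split_sentences(article)
    weights = keyword_weights(article, background)

    def score(sentence):
        tokens = tokenize(sentence)
        # Only over-represented words (positive log-ratio) count; average per token
        # so long sentences are not favored just for their length.
        return sum(max(0.0, weights.get(t, 0.0)) for t in tokens) / max(1, len(tokens))

    n_keep = max(1, round(len(sentences) * keep_ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    return " ".join(sentences[i] for i in sorted(ranked[:n_keep]))


background = ("the a of and was is to in it the the a of and was is "
              "weather was nice the day was mild")
article = ("The senate approved the budget. The weather was mild. "
           "Senate leaders praised the budget deal.")
print(summarize(article, background, 0.67))
```

With a realistic background corpus (many ordinary articles), common words such as "the" or "was" get weights near or below zero, so they stop influencing which sentences survive.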


This is an active research topic in computational linguistics (CL). A shallow approach based on Bayesian filtering is unlikely to produce excellent results, but you probably do not need perfect results.

In CL, the 80/20 rule quickly becomes a 95/5 rule, so if you are happy with what shallow methods can achieve, skip this answer.

If you want to see whether you can improve your results, look for some of the better resources. The task you are describing is called "automatic text summarization" in the research community, and it has its own web page, which is hopelessly out of date. Mani and Maybury (1999) is probably a good overview (I have not read it myself), but it is also quite dated. Martin Hassel's later work on the topic is also quite comprehensive, including language-independent (read: statistical, i.e. shallow) methods.

As always, Google can help too: just search for "text summarization".
