How to detect paragraph boundaries in NLP?

I am working on extracting the names of people from various ads appearing in English newspapers.

However, I need to determine where each declaration ends before retrieving the names in it, since I only want to extract the first name that occurs in each one. I started working with Stanford NLP and managed to extract the names, but I'm stuck on defining a paragraph boundary.

Is there a way to identify paragraph boundaries?

1 answer

This is a difficult problem; we face the same one in one of our projects. Some research papers describe the scope of the problem and possible solutions in detail. I include them below.

We are still doing R&D, so we don't have many answers yet, but we are happy to share what we have and what we find as we go.
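One simple starting point, before trying a learned model, is a rule-based baseline: treat blank lines and indented lines as paragraph starts, then keep only the first name found in each block. This sketch assumes the ad text is available as plain text with its line breaks intact. `split_paragraphs`, `first_name_per_paragraph`, and the `extract_names` callback (a stand-in for whatever NER step you already run, such as Stanford NER) are hypothetical names, not part of any library.

```python
import re
from typing import Callable, List, Optional


def split_paragraphs(text: str) -> List[str]:
    """Heuristic: a blank line ends a paragraph; an indented line starts a new one."""
    paragraphs: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the paragraph collected so far.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        # Assumption: a leading tab or 2+ spaces marks the start of a new paragraph.
        if re.match(r"(\t| {2,})", line) and current:
            paragraphs.append(" ".join(current))
            current = []
        current.append(line.strip())
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def first_name_per_paragraph(
    text: str, extract_names: Callable[[str], List[str]]
) -> List[Optional[str]]:
    """Return the first name found in each paragraph, or None if it has none."""
    result: List[Optional[str]] = []
    for para in split_paragraphs(text):
        names = extract_names(para)
        result.append(names[0] if names else None)
    return result


if __name__ == "__main__":
    # Toy stand-in for a real NER step: two capitalized words in a row.
    def toy_extract(s: str) -> List[str]:
        return re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", s)

    ad = (
        "NOTICE\n\n"
        "I, John Smith, son of Peter Smith,\n"
        "hereby declare the change of name.\n"
        "  I, Mary Jones, wife of Tom Jones, declare the same."
    )
    print(first_name_per_paragraph(ad, toy_extract))  # [None, 'John Smith', 'Mary Jones']
```

Layout rules like these break down when OCR or scraping loses blank lines and indentation, which is where a learned boundary classifier becomes worth the effort.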

Here is one such article:

Automatic Paragraph Identification: A Study across Languages and Domains

Here is the GitHub link for the icsiboost code that they use:

Open-source implementation of BoosTexter (classifier based on AdaBoost)
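A rough sketch of the general idea: frame paragraph detection as binary classification of each gap between consecutive sentences (+1 = a new paragraph starts here, -1 = it does not) and train a boosted classifier on features of that gap. To keep it dependency-free, the sketch hand-rolls discrete AdaBoost over decision stumps in place of BoosTexter/icsiboost. The four gap features (sentence lengths, word overlap, a trailing colon) are illustrative assumptions, not the feature set from the article, and all function names are hypothetical.

```python
import math
import re
from typing import List, Tuple

# A trained stump: (feature index, threshold, polarity, weight alpha).
Stump = Tuple[int, float, int, float]


def sentence_gap_features(prev: str, cur: str) -> List[float]:
    """Illustrative features for the gap between two consecutive sentences."""
    prev_words = re.findall(r"\w+", prev.lower())
    cur_words = re.findall(r"\w+", cur.lower())
    union = set(prev_words) | set(cur_words)
    overlap = len(set(prev_words) & set(cur_words)) / len(union) if union else 0.0
    return [
        float(len(prev_words)),                       # previous sentence length (words)
        float(len(cur_words)),                        # current sentence length (words)
        overlap,                                      # Jaccard word overlap
        1.0 if prev.rstrip().endswith(":") else 0.0,  # previous sentence ends with ':'
    ]


def stump_predict(x: List[float], feature: int, threshold: float, polarity: int) -> int:
    """Return +polarity if the feature exceeds the threshold, else -polarity."""
    return polarity if x[feature] > threshold else -polarity


def train_adaboost(X: List[List[float]], y: List[int], rounds: int = 10) -> List[Stump]:
    """Discrete AdaBoost with decision stumps; labels are +1 (boundary) or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    model: List[Stump] = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best = None
        for f in range(len(X[0])):
            for threshold in sorted({x[f] for x in X}):
                for polarity in (1, -1):
                    err = sum(w for x, label, w in zip(X, y, weights)
                              if stump_predict(x, f, threshold, polarity) != label)
                    if best is None or err < best[0]:
                        best = (err, f, threshold, polarity)
        err, f, threshold, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((f, threshold, polarity, alpha))
        if err <= 1e-10:
            break  # a perfect stump; further rounds would only repeat it
        # Up-weight misclassified gaps, down-weight correctly classified ones.
        weights = [w * math.exp(-alpha * label * stump_predict(x, f, threshold, polarity))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model: List[Stump], x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, f, t, p) for f, t, p, alpha in model)
    return 1 if score > 0 else -1


def segment(sentences: List[str], model: List[Stump]) -> List[List[str]]:
    """Group sentences into paragraphs, starting a new one at each predicted boundary."""
    if not sentences:
        return []
    paragraphs = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if predict(model, sentence_gap_features(prev, cur)) == 1:
            paragraphs.append([cur])
        else:
            paragraphs[-1].append(cur)
    return paragraphs
```

In practice you would label the sentence gaps in a sample of ads as boundary or not, train on their feature vectors with `train_adaboost` (or export them to a real boosting tool), and then call `segment` on the sentence-split text of new ads.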
