How can I get a database of people's names (or at least the common English ones)?

I am developing an application that is supposed to extract the names of people from short texts.

What is the best way to do this? Is there a database of names that I can check each word against? Since the texts are short, processing cost should not be much of a concern.

Any ideas?

Thanks,

There

+4
5 answers

You can use a statistical named entity recognizer (NER), such as Stanford NER or LingPipe. These are machine-learning-based recognizers that do not require huge dictionaries of names as input.

Alternatively, you can get a list of names from the Internet (there are many) and use the Aho-Corasick string search algorithm to efficiently find every name from the list in the text.
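
To make the second suggestion concrete, here is a minimal pure-Python sketch of Aho-Corasick matching against a name list. The function names (`build_automaton`, `find_names`) and the sample names are made up for illustration, and the whole-word check is an extra I added so that "Ann" is not reported inside "Annabel". It lowercases both names and text, which assumes lowercasing does not change string lengths (true for ordinary English names).

```python
from collections import deque

def build_automaton(names):
    """Build a trie with failure links over the lowercased names."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # failure link of each node
    out = [[]]    # names that end at each node
    for name in names:
        node = 0
        for ch in name.lower():
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(name)
    # Breadth-first pass: depth-1 nodes keep fail = 0, deeper nodes
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out

def find_names(text, automaton):
    """Return (start_index, name) for every whole-word match in text."""
    goto, fail, out = automaton
    lowered = text.lower()
    node = 0
    hits = []
    for i, ch in enumerate(lowered):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for name in out[node]:
            start = i - len(name) + 1
            before_ok = start == 0 or not lowered[start - 1].isalnum()
            after_ok = i + 1 == len(lowered) or not lowered[i + 1].isalnum()
            if before_ok and after_ok:
                hits.append((start, name))
    return hits

if __name__ == "__main__":
    automaton = build_automaton(["Ann", "Anna", "Bob", "Mary Ann"])
    print(find_names("Anna met Bob and Mary Ann; Annabel stayed home.", automaton))
```

The text is scanned once regardless of how many names are in the list, which is the reason to prefer this over testing each name separately. Overlapping matches (such as "Ann" inside "Mary Ann") are all reported, so you may want to keep only the longest one.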

+6

If you are on *nix, try looking at /usr/share/dict/propernames. Mac OS X has this, and I think that at least Ubuntu does too.

You can use this with grep:

 grep -owFf /usr/share/dict/propernames short_text.txt
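
If you need the same lookup inside an application rather than on the command line, something along these lines might work. It is only a sketch: `load_names` and `extract_names` are names I made up, the tokenizer is a simple regular expression, and the match is case-sensitive so that a lowercase "will" is not taken for the name "Will".

```python
import re
from pathlib import Path

def load_names(path="/usr/share/dict/propernames"):
    """Read a one-name-per-line file into a set (empty if the file is missing)."""
    p = Path(path)
    if not p.is_file():
        return set()
    lines = p.read_text(encoding="utf-8", errors="replace").splitlines()
    return {line.strip() for line in lines if line.strip()}

def extract_names(text, names):
    """Return the tokens of text that appear in the name set, in order."""
    # A token is a run of letters, optionally joined by apostrophes or hyphens.
    tokens = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    return [token for token in tokens if token in names]

if __name__ == "__main__":
    names = load_names()  # may be empty on systems without the word list
    print(extract_names("Yesterday Alice emailed Bob about the will.", names))
```

This only finds single-word names that are in the list, so it misses unusual names and multi-word names.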
+3

How about the US Census Bureau's genealogy data?

+1

Get the name dataset:
I created a collection of datasets for tasks like this. You can find them here: https://mbejda.imtqy.com . All of them are in CSV format, and the names are classified by race and gender.

Named entity recognizer:
Look into OpenNLP or Stanford NLP for recognizing and extracting names.

+1
