How to get a database of all people's names (or at least the common English ones)?

Question

I am developing an application that is supposed to extract the names of people from short texts.

What is the best way to do this? Is there a name database I can check each word against to see whether it is a name? Since the texts are short, this should not be very demanding in terms of processing.

Any ideas?

Thanks,

Tam

+4

string

Tam Nov 14 '09 at 10:23

5 answers

If you are on *nix, try looking at /usr/share/dict/propernames. Mac OS X has this file, and I think that at least Ubuntu does too.

You can use this with grep:

 grep -f /usr/share/dict/propernames short_text.txt
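
One caveat with the grep command above: every line of the file is treated as a pattern that can match anywhere, so a short name can match inside a longer word, and grep prints the whole matching line rather than the name. A minimal Python sketch of whole-word lookup, assuming the file holds one name per line; `load_names` and `find_names` are hypothetical helper names, not part of any library:

```python
import re

def load_names(path="/usr/share/dict/propernames"):
    """Read one name per line into a set, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def find_names(text, names):
    """Return the words of `text` that appear in `names`, in order of appearance."""
    # A word is a run of ASCII letters, optionally joined by ' or - (O'Brien, Jean-Luc).
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    return [w for w in words if w in names]

# Small in-memory set so the example runs without the dictionary file.
sample_names = {"Alice", "Bob", "Al"}
print(find_names("Also, Alice met Bob at noon.", sample_names))
# prints ['Alice', 'Bob']  ("Al" does not match inside "Also")
```

On a machine that has the file, `find_names(text, load_names())` would do the same lookup against the real list.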

+3

jtbandes Nov 14 '09 at 10:26

I found this link: Extract people names from RSS feeds using WordNet

+3

Pierre Nov 14 '09 at 10:26

How about the US Census Bureau genealogy data?

+1

softveda Nov 14 '09 at 10:36

Getting a name dataset:
I created a collection of datasets for tasks like this. You can use them here: https://mbejda.imtqy.com. All of them are in CSV format, with names classified by race and gender.
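
A CSV file like that can be loaded into a set for fast membership checks. This is only a sketch: the column names `name` and `gender` and the sample rows are assumptions for illustration, not the actual layout of those files, and `names_from_csv` is a hypothetical helper.

```python
import csv
import io

def names_from_csv(fileobj, column="name"):
    """Collect the non-empty values of one CSV column into a set."""
    names = set()
    for row in csv.DictReader(fileobj):
        # DictReader yields None for missing cells, so guard before stripping.
        value = (row.get(column) or "").strip()
        if value:
            names.add(value)
    return names

# In-memory stand-in for a downloaded file (hypothetical columns and rows).
sample = io.StringIO("name,gender\nAlice,F\nBob,M\n,\n")
names = names_from_csv(sample)
print(sorted(names))  # prints ['Alice', 'Bob']
```

With a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` object and set `column` to whatever the header actually calls the name field.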

Named entity recognizer:
Look into OpenNLP or Stanford NLP for name recognition and extraction.

+1

mbejda Dec 04 '15 at 12:59

João Silva · Accepted Answer · 2009-11-14T22:26:05+0000

You can use a statistical named entity recognizer (NER), such as Stanford NER or LingPipe. These are machine-learning-based recognizers that do not require huge dictionaries of names as input.

Alternatively, you can get a list of people's names from the Internet (there are many) and use the Aho-Corasick string-matching algorithm to efficiently find every name from that list in the text.
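
For illustration, here is a compact pure-Python sketch of the Aho-Corasick idea: build a trie of the names, add failure links, then scan the text once. The function names and the sample name list are made up for this example, and the word-boundary filter is an addition on top of the algorithm, since plain Aho-Corasick reports substring matches too.

```python
from collections import deque

def build_automaton(patterns):
    """Build the trie, failure links, and output lists for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns ending at this state
    for pat in patterns:
        if not pat:
            continue  # an empty pattern would match everywhere
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states fall back to the root (state 0).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, automaton):
    """Yield (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield i - len(pat) + 1, pat

def find_names(text, automaton):
    """Keep only matches that are not embedded in a longer word."""
    hits = []
    for start, name in find_all(text, automaton):
        end = start + len(name)
        before_ok = start == 0 or not text[start - 1].isalpha()
        after_ok = end == len(text) or not text[end].isalpha()
        if before_ok and after_ok:
            hits.append((start, name))
    return hits

automaton = build_automaton(["Al", "Alice", "Bob", "Mary Ann"])
print(find_names("Also, Alice met Bob and Mary Ann.", automaton))
# prints [(6, 'Alice'), (16, 'Bob'), (24, 'Mary Ann')]
```

The automaton is built once per name list, and each text is then scanned in a single pass regardless of how many names the list contains, which is the main attraction over testing every name separately.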
