Free, computer-parsable lists of common first names?

I need a list of common first names for people, like Bill, Gordon, Jane, etc. Is there a free list of well-known names somewhere, so that I don’t have to type them in myself? Something that I can easily parse with a program to populate an array, for example?

I don’t care about:

  • Knowing whether the name is male or female (or both)
  • Whether the data set has a whole bunch of false positives
  • Whether names are missing from it; obviously no such data set will ever be complete
  • Whether there are "duplicates", i.e. I don’t care if the data set lists "Bill", "William" and "Billy" as different names. I would rather have more data than less
  • Knowing the name’s popularity

I know Wikipedia has a list of the most popular given names, but it is all on an HTML page and mixed with awful wiki syntax. Is there a better way to get sample data like this without having to scrape Wikipedia?

+7
dataset
3 answers

That should be enough to get you started, I think.

+25

Social Security Administration - Beyond the Top 1000 Names

The above is a comprehensive list of the first names used in the USA. The zip files contain national and state-level data by year of birth in CSV format. The data includes the number of occurrences (minimum 5) and the gender. For example, the national file for 2010 includes 33,838 baby names.
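To populate an array from one of these files, something like the sketch below may be enough. It assumes each row has the form name,sex,count, which matches the description above, but check the actual files. The function name `load_names` and the sample rows (including the counts) are made up for illustration.

```python
import csv
import io

def load_names(lines):
    """Collect the distinct names from rows shaped like 'name,sex,count'.

    Assumption: the name is the first column. The sex and count columns
    are ignored, since the question does not need them.
    """
    names = set()
    for row in csv.reader(lines):
        if len(row) < 3 or not row[0].strip():
            continue  # skip blank or malformed rows
        names.add(row[0].strip())
    return sorted(names)

# Illustrative rows only; the counts here are invented, not real SSA data.
sample = io.StringIO(
    "Jane,F,100\n"
    "Bill,M,50\n"
    "Jordan,F,20\n"
    "Jordan,M,30\n"
    "\n"
)
names = load_names(sample)
print(names)
```

For a real file you would pass an open file object instead of the `io.StringIO` sample, e.g. `load_names(open(path, newline=""))`, where `path` points at one of the extracted CSV files.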

+6

You can easily use the Wikipedia API ( http://en.wikipedia.org/w/api.php ) to get a list of the pages in a specific category; Category:Given names looks like where you would want to start.

http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmnamespace=0&cmlimit=500&cmtitle=Category:Given_names 

Part of the result of this URL is as follows:

  <cm pageid="5797824" ns="0" title="Abdou" />
  <cm pageid="5797863" ns="0" title="Abdu" />
  <cm pageid="859035" ns="0" title="Abdul Aziz" />
  <cm pageid="6504818" ns="0" title="Abdul Qadir" />

Take a look at the API documentation, pick the appropriate output format and query parameters, and check which categories suit you.
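As a rough sketch of how that could be parsed: the code below builds the same query URL with `format=xml` added and pulls the `title` attribute out of every `<cm>` element. The helper names are mine, and the wrapper elements around the `<cm>` entries in the sample are an assumption about the response layout; only the `<cm>` elements themselves are taken from the output above. Fetching the URL is left out so that the snippet runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"

def category_url(category, limit=500):
    """Build the categorymembers query URL for one category (XML output)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmnamespace": 0,
        "cmlimit": limit,
        "cmtitle": category,
        "format": "xml",
    }
    return API + "?" + urlencode(params)

def titles_from_xml(xml_text):
    """Return the title attribute of every <cm> element in the response."""
    root = ET.fromstring(xml_text)
    return [cm.get("title") for cm in root.iter("cm")]

# The <cm> elements are copied from the result shown above; the enclosing
# elements are assumed for the sake of a well-formed sample.
sample = """<api><query><categorymembers>
  <cm pageid="5797824" ns="0" title="Abdou" />
  <cm pageid="5797863" ns="0" title="Abdu" />
  <cm pageid="859035" ns="0" title="Abdul Aziz" />
  <cm pageid="6504818" ns="0" title="Abdul Qadir" />
</categorymembers></query></api>"""

titles = titles_from_xml(sample)
url = category_url("Category:Given_names")
print(url)
print(titles)
```

A category with more than 500 members will not fit in one response, so a real script would have to follow the API's continuation mechanism and repeat the request until everything has been read.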

P.S. By the way, the wikitext of the page you linked to contains the names in a form that can easily be extracted with a regexp: as with the link targets on the rendered HTML page, each name has "(name)" appended to it.
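A minimal regexp sketch for that approach, assuming the links in the wikitext look like `[[Bill (name)|Bill]]`. The sample text is made up, not copied from the actual page, so adjust the pattern to whatever the real wikitext contains.

```python
import re

# Capture the text between "[[" and " (name)" at the start of a wiki link.
NAME_LINK = re.compile(r"\[\[([^\]|(]+?) \(name\)")

def names_from_wikitext(wikitext):
    """Return every name that appears as a '[[X (name)...' link target."""
    return NAME_LINK.findall(wikitext)

# Hypothetical wikitext for illustration only.
sample = (
    "* [[Bill (name)|Bill]]\n"
    "* [[Jane (name)|Jane]]\n"
    "* [[Abdul Aziz (name)]]\n"
    "* [[London]]\n"
)
found = names_from_wikitext(sample)
print(found)
```

Links without the "(name)" suffix, such as the last line of the sample, are skipped.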

+5