Find gender from name

Recently, I came across a strange but interesting question. It goes as follows: write a program that, given a name, outputs the gender. Example:

INPUT → John Michael Britney
OUTPUT → male male female

This is the result I expect. I have tried hard to solve it, but I really could not crack it, so I am grateful to this site for the chance to ask about it here.

In fact, this was set as a flyer problem in a programming contest, so I assumed it could be solved programmatically.

+7
language-agnostic
8 answers

You cannot do it algorithmically: you need a database so you can do it statistically. This SO question points to many such resources. Be aware that you will make many, MANY mistakes: Korean Kims (men) or Northern European Kims (women) may get very annoyed, for example :-)

+9

I have also spent time on this problem. My first approach was to use lists of approved names (we have those in Denmark, where I come from), but I quickly realized that only a few countries have them. I also got feedback that a probabilistic estimate would be much more useful, and that it should be possible to filter by country or language. So I rebuilt it on user datasets from social networks, which actually works quite well.

You can check it out at http://genderize.io

A simple example:

http://api.genderize.io?name=kim
{"name":"kim","gender":"female","probability":"0.91","count":687}

http://api.genderize.io?name=kim&country_id=dk
{"name":"kim","gender":"male","probability":"1.00","count":17,"country_id":"dk"}
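
A minimal client-side sketch in Python, assuming only the endpoint and the response fields shown in the example above (name, gender, probability, count). The helper names build_url and parse_response are made up for illustration, and the sketch parses the sample response from the example instead of making a network call.

```python
import json
from urllib.parse import urlencode

API_BASE = "http://api.genderize.io"  # endpoint as written in the example above


def build_url(name, country_id=None):
    """Build the query URL; country_id is the optional filter from the example."""
    params = {"name": name}
    if country_id:
        params["country_id"] = country_id
    return API_BASE + "?" + urlencode(params)


def parse_response(body):
    """Return (gender, probability) from a JSON response body."""
    data = json.loads(body)
    # The example shows probability as a string; float() accepts strings and numbers.
    return data.get("gender"), float(data.get("probability") or 0.0)


# Sample response copied from the example above.
sample = '{"name":"kim","gender":"male","probability":"1.00","count":17,"country_id":"dk"}'
print(build_url("kim", "dk"))
print(parse_response(sample))
```

To query the live service, the URL from build_url could be passed to urllib.request.urlopen and the decoded body handed to parse_response.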
+6

Do not give up.

I would take a statistical approach ... you need an extensive database of names that actually includes gender information ... then train your program on that data set.

The point is that you need a third variable to correlate against. Something like country of origin, ethnicity, etc. will further improve your chances. You really need this third "clue" ...
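
A rough sketch of that idea in Python. The counts below are invented toy numbers, not real statistics, and the names COUNTS and estimate are hypothetical. It picks the most frequent gender for a name, uses the country as the third clue when it is known, and otherwise pools the counts across all countries.

```python
from collections import Counter

# Hypothetical toy counts: (name, country) -> how often each gender was observed.
COUNTS = {
    ("kim", "dk"): Counter(male=17),
    ("kim", "us"): Counter(female=60, male=6),
}


def estimate(name, country=None):
    """Return (gender, probability), or (None, 0.0) when there is no data."""
    name = name.lower()
    if country is not None and (name, country) in COUNTS:
        tally = COUNTS[(name, country)]
    else:
        # No usable clue: pool the counts for this name across all countries.
        tally = Counter()
        for (known_name, _), counts in COUNTS.items():
            if known_name == name:
                tally.update(counts)
    total = sum(tally.values())
    if total == 0:
        return None, 0.0
    gender, hits = tally.most_common(1)[0]
    return gender, hits / total


print(estimate("Kim", "dk"))  # country known: uses only the Danish counts
print(estimate("Kim"))        # country unknown: pooled counts
```

With these toy numbers the same name resolves differently depending on whether the country is supplied, which is the effect the third clue is meant to capture.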

+3

How about using the way the user interacts with the computer as the third clue?

You could build a click map, for example http://css-tricks.com/tracking-clicks-building-a-clickmap-with-php-and-jquery/

Depending on where the user clicks, you could derive reasonable statistics for men and women. This would serve as a fallback when the name is not found in the database.

Here is what the Wikipedia article "Gender HCI" says on the topic:

"Large displays helped narrow the gender gap when navigating virtual environments. With smaller displays, men's performance was better than women's. With larger displays, women improved and men's performance was not negatively affected."

So, I would show a small box and measure the time it takes to click it ...

+2

The statistical approach works very well. Depending on the country, the accuracy is 95% or even 99%+, with a few exceptions (Chinese names, Korean names).

Check out the GendRE API http://namsor.com/api

It automatically recognizes the culture behind a name in order to use the appropriate dictionary (for example, Andrea Rossini is male, Andrea Parker is female, etc.).

+2

I have done this before. It is easy and works well in about 90% of cases when applied to the right scenario.

You need to get a database of names and their usual gender from somewhere. After that, it is a trivial database lookup.

Some names (e.g. Andy) are commonly used for either gender. So you will need at least three gender values: male / female / unknown.
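
A minimal sketch of such a lookup in Python, assuming a tiny in-memory table in place of a real names database. The entries and the names NAME_GENDER and lookup_gender are illustrative only.

```python
# Hypothetical sample table; a real one would be loaded from a names database.
NAME_GENDER = {
    "john": "male",
    "michael": "male",
    "britney": "female",
    "andy": "unknown",  # used for either gender, as noted above
}


def lookup_gender(name):
    """Return 'male', 'female', or 'unknown' (ambiguous or not in the table)."""
    return NAME_GENDER.get(name.strip().lower(), "unknown")


# The example input from the question.
print(" ".join(lookup_gender(n) for n in "John Michael Britney".split()))
# prints: male male female
```

Names missing from the table fall into the same "unknown" bucket as ambiguous ones, so the caller never has to handle a lookup error.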

+1

Check out WolframAlpha.com. They have a webservice API, but it's a little expensive ...

http://products.wolframalpha.com/api/pricing.html

+1

Names ending in a, e, i, o, or u are usually female names. This rule is less accurate than a statistical API, but it is easy to implement.
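
The rule above could be sketched in Python like this. The function name guess_by_last_letter is made up for illustration, and the sketch assumes that every name not ending in one of those five letters is treated as male.

```python
def guess_by_last_letter(name):
    """Guess 'female' if the name ends in a, e, i, o or u, otherwise 'male'."""
    name = name.strip().lower()
    if not name:
        return "unknown"
    return "female" if name[-1] in "aeiou" else "male"


for n in ["John", "Michael", "Britney", "Andrea"]:
    print(n, guess_by_last_letter(n))
```

Note that "Britney" ends in "y", so this rule labels it male and gets the question's own example wrong. That is the kind of error to expect from it.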

0
