I have also spent time on this problem. My first approach was to use lists of approved names (we have those in Denmark, where I come from), but I quickly realized that only a few countries have such lists. In addition, I got feedback that a probabilistic approach would be much more useful, and that it should be possible to filter by country or language. So I rebuilt it using user data sets from social networks, which actually works quite well.
You can check it out at http://genderize.io
A simple example:
http://api.genderize.io?name=kim
{"name":"kim","gender":"female","probability":"0.91","count":687}

http://api.genderize.io?name=kim&country_id=dk
{"name":"kim","gender":"male","probability":"1.00","count":17,"country_id":"dk"}
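If you want to call it from code, the two requests above could be wrapped roughly like this. This is only a minimal sketch using the Python standard library: the helper names `build_url`, `parse_response` and `guess_gender` are my own for illustration and not part of the service, and it assumes the response has the same fields as the samples shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as written in the example requests above.
API_URL = "http://api.genderize.io"


def build_url(name, country_id=None):
    """Build the query URL; country_id is an optional filter such as "dk"."""
    params = {"name": name}
    if country_id is not None:
        params["country_id"] = country_id
    return API_URL + "?" + urlencode(params)


def parse_response(body):
    """Turn a JSON response body into (gender, probability, count).

    Assumes the fields shown in the sample responses. The samples give
    probability as a string, so it is converted to a float here.
    """
    data = json.loads(body)
    return data["gender"], float(data["probability"]), data["count"]


def guess_gender(name, country_id=None, timeout=10):
    """Hypothetical helper: fetch and parse one prediction (needs network access)."""
    with urlopen(build_url(name, country_id), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

Calling `guess_gender("kim", "dk")` would send the second request from the example. The `count` value is worth checking alongside the probability, since a high probability based on only a handful of records is weaker evidence than the same probability based on hundreds.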
Stromgren