Is there a service / library (free or paid) that takes a piece of text and returns its language?
I need to go through a million blog posts and identify their languages.
I think this one is the best:
https://code.google.com/p/language-detection/
I've heard good things about langid.py.
langid.py
Features, from the README:
- Fast
- Pre-trained on a large number of languages (currently 97)
- Not sensitive to domain-specific features (e.g. HTML/XML markup)
- Single .py file with minimal dependencies
- Deployable as a web service
https://github.com/saffsd/langid.py
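For the million-blog-post case, here is a rough sketch of how a tally could look. It assumes langid is installed (pip install langid) and relies on langid.classify(text), which returns a (language code, score) pair. The helper name count_languages and the sample posts are made up for illustration, and the classifier is passed in as a parameter so another detector can be swapped in.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def count_languages(
    posts: Iterable[str],
    classify: Callable[[str], Tuple[str, float]],
) -> Counter:
    """Tally the detected language code of each post.

    `classify` is any function that maps text to (language_code, score),
    which is the shape langid.classify returns.
    """
    counts = Counter()
    for text in posts:
        lang, _score = classify(text)  # the score is ignored in this sketch
        counts[lang] += 1
    return counts


if __name__ == "__main__":
    try:
        import langid  # third-party package, not in the standard library
    except ImportError:
        langid = None

    if langid is not None:
        # Hypothetical sample; in practice, stream the posts from storage.
        sample_posts = [
            "This is a short blog post written in English.",
            "Ceci est un court billet de blog en français.",
            "Dies ist ein kurzer Blogeintrag auf Deutsch.",
        ]
        print(count_languages(sample_posts, langid.classify))
```

Because posts can be any iterable, a generator that reads one post at a time keeps memory use flat. Very short or mixed-language posts tend to be the hard cases for any detector, so it may be worth spot-checking those.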