Detect whether text is English using Python

I know this question has been asked several times, but I still could not solve it with the "affordable" solutions. I hope you have further ideas on how to detect whether my sentences are English in Python. Affordable solutions I have looked at:

  • Language detector (in Ruby, not in Python :/)
  • Google Translate API v2 (no longer free; you have to pay 20 bucks a month, and I am doing this project for academic purposes. Free quota: 0 characters/day)
  • Language identification for Python (source code not found; see "Automatic-language-identification")
  • Enchant (does it not support Python 2.7? I am new to Python; is there any guide? I am sure this would be the one I need)
  • WordNet from NLTK (I have no idea why "wordnet.synsets" is missing and only "wordnet.Synset" is available. The sample code in the solution does not work for me either T_T; maybe a version problem again?)
  • Keep a list of English words and check whether each word is in it (yes, this is a bad approach when the sentences come from Twitter, and .. you know :P)
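
For reference, the word-list approach from the last bullet can be sketched as follows. This is only a minimal illustration: the tiny `ENGLISH_WORDS` set, the helper names and the 0.5 threshold are all assumptions made up for the example, and a real check would load a full dictionary file.

```python
import re

# Tiny illustrative vocabulary (an assumption for this sketch);
# a real check would load a full English word list from a file.
ENGLISH_WORDS = {"the", "is", "this", "a", "good", "day", "i", "love", "python", "and"}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text, drop URLs, @mentions and #hashtags, then extract words."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
    return TOKEN_RE.findall(text)


def english_ratio(text, vocabulary=ENGLISH_WORDS):
    """Return the fraction of tokens that appear in the vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return hits / len(tokens)


def looks_english(text, threshold=0.5):
    """Treat the text as English when enough of its tokens are known words."""
    return english_ratio(text) >= threshold
```

Scoring a ratio rather than requiring every word to match is meant to tolerate the slang and misspellings that make the plain list comparison fragile on tweets.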

WORKING SOLUTION

Finally, after a series of attempts, these are the working solutions (alternatives to the list above):

  • Wiktionary API (using urllib2 and simplejson to parse the response: if the key is -1, the word does not exist; otherwise it is English. Of course, for Twitter text you need to pre-process each word to strip special characters such as @ # , ? ! first. For how to find the key, see "The value of Simplejson and random key")
  • Answer from Dogukan Tufekci (note this weakness: if a sentence is shorter than 20 characters, PyEnchant must be installed or it returns UNKNOWN. Since PyEnchant does not support Python 2.7, it cannot be installed, so sentences under 20 characters do not work)
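
A rough sketch of the Wiktionary lookup described above might look like this. It assumes Python 3, where `urllib.request` and `json` stand in for urllib2 and simplejson, and it assumes the standard MediaWiki query endpoint on en.wiktionary.org; the function names and the User-Agent string are made up for the example. Note that English Wiktionary also has entries for words from other languages, so a hit is only a weak signal.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumed endpoint: the standard MediaWiki API on English Wiktionary.
API_URL = "https://en.wiktionary.org/w/api.php"


def clean_word(token):
    """Strip Twitter punctuation such as @ # , ? ! from a token and lowercase it."""
    return re.sub(r"[^A-Za-z'-]", "", token).lower()


def page_exists(response):
    """Return True unless the API filed the title under the page id "-1"."""
    pages = response.get("query", {}).get("pages", {})
    return bool(pages) and "-1" not in pages


def lookup(word, timeout=10):
    """Ask Wiktionary whether a page exists for the word (performs a network request)."""
    params = urllib.parse.urlencode(
        {"action": "query", "titles": word, "format": "json"}
    )
    request = urllib.request.Request(
        API_URL + "?" + params,
        headers={"User-Agent": "english-check-sketch/0.1"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return page_exists(json.load(reply))
```

Keeping `page_exists` separate from the network call makes the "-1" key check easy to try against a saved response, for example `lookup(clean_word("#Hello!"))` once the parsing behaves as expected.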

2 answers

You can try the guess_language library, which I found through Miguel Grinberg's The Flask Mega-Tutorial. It looks like it supports both Python 2 and 3, so it should be OK.
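
A minimal usage sketch, assuming the guess_language-spirit package, which exposes a `guess_language()` function that returns a language code or the string "UNKNOWN" (older releases of the library name the function differently, so check the version you install). The `is_english` wrapper is a made-up helper for illustration.

```python
try:
    # Assumes the guess_language-spirit package is installed.
    from guess_language import guess_language
except ImportError:
    guess_language = None


def is_english(text):
    """Return True or False from the detector, or None when there is no verdict."""
    if guess_language is None:
        return None  # library not installed
    code = guess_language(text)
    if code == "UNKNOWN":
        return None  # typically very short input without PyEnchant
    return code == "en"
```

Returning None instead of False for "UNKNOWN" keeps the short-text case from the question distinguishable from a confident non-English result.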


Perhaps you can use hidden Markov models to detect languages, since each language has its own characteristics.
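
As a rough illustration of the idea, here is a sketch that uses a plain character-bigram Markov chain per language rather than a true hidden Markov model; it is a simpler, fully observed relative of what the answer suggests. The function names and the add-one smoothing are choices made for this example, and the training texts are whatever samples you supply.

```python
import math
from collections import Counter


def train(text):
    """Count character bigrams, their leading characters, and the alphabet seen."""
    text = text.lower()
    bigrams = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return bigrams, firsts, set(text)


def log_likelihood(text, model, vocab_size):
    """Score text under one model with add-one smoothed transition probabilities."""
    bigrams, firsts, _ = model
    text = text.lower()
    total = 0.0
    for a, b in zip(text, text[1:]):
        total += math.log((bigrams[(a, b)] + 1) / (firsts[a] + vocab_size))
    return total


def classify(text, models):
    """Return the language whose model gives the text the highest likelihood."""
    alphabet = set(text.lower())
    for model in models.values():
        alphabet |= model[2]
    vocab_size = len(alphabet)
    return max(models, key=lambda lang: log_likelihood(text, models[lang], vocab_size))
```

Usage would be along the lines of `models = {"en": train(english_sample), "de": train(german_sample)}` followed by `classify(sentence, models)`; the quality depends heavily on how much training text each language gets.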
