I have a lot of text in MySQL tables. I want to do some statistical analysis and then some NLP on my text using the NLTK toolkit. I have two options:
- Extract all the text from my database table at once (putting it in a file if necessary) and apply the NLTK functions to it (a sketch follows this list).
- Extract the text and turn it into a corpus that NLTK can read directly.
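For reference, here is a minimal sketch of what I mean by the first option, assuming a hypothetical `documents` table with a `body` text column and the `mysql-connector-python` driver; the connection details are placeholders:

```python
import mysql.connector  # assumed driver; any DB-API module would work the same way
import nltk

# Pull all text out of the (hypothetical) documents table in one go
conn = mysql.connector.connect(
    host="localhost", user="user", password="secret", database="mydb"
)
cursor = conn.cursor()
cursor.execute("SELECT body FROM documents")
raw_text = "\n".join(row[0] for row in cursor.fetchall())
conn.close()

# From here on it is plain NLTK: tokenize once, then use the usual tools
tokens = nltk.word_tokenize(raw_text)
text = nltk.Text(tokens)

text.collocations()              # prints common collocations
fdist = nltk.FreqDist(tokens)
print(fdist.most_common(20))     # 20 most frequent tokens
```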
The latter seems rather complicated, and I have not found articles that actually describe how to do it. The only thing I found is this: "Creating a corpus reader backed by MongoDB", which uses MongoDB as its data store; the code is quite complex and also requires knowledge of MongoDB. On the other hand, the first option seems very simple, but incurs the overhead of extracting all the text from the database.
Now the question is: what are the advantages of having a proper corpus in NLTK? In other words, if I take the trouble to override the NLTK corpus-reader methods so they can read from the MySQL database, would it be worth the hassle? Does turning my text into a corpus let me do anything that I cannot (or can only with great difficulty) do with the ordinary NLTK functions?
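To illustrate what the second option might look like, here is a hedged sketch of a small reader class that mimics the raw()/words()/sents() interface of NLTK's corpus readers while pulling its data straight from MySQL; the table name, column name, and connection parameters are all assumptions on my part, not taken from any existing article:

```python
import mysql.connector
import nltk


class MySQLCorpusReader:
    """Lazily read documents from a MySQL table and expose an
    NLTK-corpus-like interface (raw, words, sents)."""

    def __init__(self, conn_params, query="SELECT body FROM documents"):
        self._conn_params = conn_params
        self._query = query

    def _rows(self):
        # Open a fresh connection per pass so iteration stays simple
        conn = mysql.connector.connect(**self._conn_params)
        try:
            cursor = conn.cursor()
            cursor.execute(self._query)
            for (body,) in cursor:
                yield body
        finally:
            conn.close()

    def raw(self):
        return "\n".join(self._rows())

    def words(self):
        return [w for doc in self._rows() for w in nltk.word_tokenize(doc)]

    def sents(self):
        return [nltk.word_tokenize(s)
                for doc in self._rows()
                for s in nltk.sent_tokenize(doc)]


# Usage: most NLTK helpers only need a token list, so this already suffices
reader = MySQLCorpusReader({"host": "localhost", "user": "user",
                            "password": "secret", "database": "mydb"})
fdist = nltk.FreqDist(reader.words())
```

Since most of the everyday NLTK functionality (frequency distributions, collocations, concordances, tagging) only needs a list of tokens or an nltk.Text, I am not sure whether a full corpus reader buys me anything beyond this kind of thin wrapper.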
Also, if you know anything about connecting MySQL to NLTK, please let me know. Thanks.
python database mysql nltk
Hossein