Benefits of creating your own corpus in NLTK

I have a lot of text in MySQL tables. I want to do some statistical analysis and then some NLP on that text using the NLTK toolkit. I have two options:

  • Extract all the text at once from my database table (putting it in a file if necessary) and use the NLTK functions on it (see the sketch after this list).
  • Extract the text and turn it into a corpus that can be used with NLTK.
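As a point of reference, the first option might look something like the sketch below. It assumes the mysql-connector-python package and a hypothetical articles table with a body column; the connection parameters and names are placeholders to adapt to your own schema.

    # A minimal sketch of option 1: dump the text from MySQL into a
    # plain-text file that NLTK can work with. Names are hypothetical.
    import mysql.connector

    conn = mysql.connector.connect(
        host="localhost", user="user", password="secret", database="mydb"
    )
    cursor = conn.cursor()
    cursor.execute("SELECT body FROM articles")

    with open("corpus.txt", "w", encoding="utf-8") as f:
        for (body,) in cursor:  # each row comes back as a 1-tuple
            f.write(body + "\n")

    cursor.close()
    conn.close()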

The latter seems rather complicated, and I have not found any articles that actually describe how to do it. I only found this: Creating a custom corpus reader from MongoDB, which uses MongoDB as its database; the code is quite complex and also requires knowledge of MongoDB. On the other hand, the first option seems very simple, but incurs the overhead of extracting the text from the database.

Now the question is: what are the advantages of a corpus in NLTK? In other words, if I take the trouble to override the NLTK methods so that they can read from the MySQL database, would it be worth the hassle? Does turning my text into a corpus let me do anything that I cannot do (or can only do with great difficulty) with the usual NLTK functions?
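For comparison, overriding things along the lines of the MongoDB article would mean writing something like the sketch below: a class that streams rows out of MySQL and tokenizes them itself. This is only an outline of the idea, not a drop-in NLTK corpus reader, and the documents table and content column are hypothetical names.

    # A rough sketch of a MySQL-backed "corpus" in the spirit of the
    # MongoDB corpus reader; table/column names are hypothetical.
    import mysql.connector
    from nltk import Text, word_tokenize

    class MySQLCorpusReader:
        def __init__(self, **conn_kwargs):
            self._conn_kwargs = conn_kwargs  # host, user, password, database

        def words(self):
            """Yield every token in the table, streaming row by row."""
            conn = mysql.connector.connect(**self._conn_kwargs)
            cursor = conn.cursor()
            cursor.execute("SELECT content FROM documents")
            for (content,) in cursor:
                for token in word_tokenize(content):
                    yield token
            cursor.close()
            conn.close()

        def text(self):
            """Wrap the tokens in nltk.Text to get its analysis methods."""
            return Text(list(self.words()))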

Also, if you know anything about connecting MySQL to NLTK, please let me know. Thanks.

python database mysql nltk
1 answer

After a good deal of reading, I found the answer. NLTK offers several very useful functions, such as collocations, concordance (search), and common_contexts, which can be used on texts stored as an NLTK corpus, and implementing them yourself would take quite a lot of time. If I select my text from the database, put it into a file, and use the nltk.Text function, then I can use all the functions I mentioned earlier without having to write many lines of code, let alone rewrite methods to connect to MySQL. Here is a link with more information: nltk.Text
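As an illustration, the whole file-based approach boils down to a few lines. This is a minimal sketch assuming the corpus.txt file from the question's first option; the words passed to concordance() and common_contexts() are placeholders and must actually occur in your text.

    # Load the dumped text and use nltk.Text's built-in analysis methods.
    import nltk

    with open("corpus.txt", encoding="utf-8") as f:
        tokens = nltk.word_tokenize(f.read())

    text = nltk.Text(tokens)
    text.collocations()                       # frequent word pairings
    text.concordance("database")              # keyword-in-context search
    text.common_contexts(["text", "corpus"])  # contexts shared by two words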
