Track disease progression with Python, NLTK and SQL

I have many gigabytes of Facebook, Twitter, and RSS data.

I use it to track, in broad aggregate terms, how hyperparathyroidism progresses across a general population: from someone being diagnosed, to the drugs they took, the treatment methods used, and the end results.

I am new to NLTK, but I have excellent Python and SQL experience.

All my data is about the parathyroid; however, as you can see in the Twitter sample below, it is linguistically terrible:

 omg i think my parathyroid is screwed up!!! Have been stuck at parathyroid hormone. STOP GETTING ON TWITTER JASMINE. Cryopreservation of Parathyroid Tissue after Parathyroid Surgery for Renal Hyperparathyroidism The Parathyroid as a Target for Radiation Damage it for the parathyroid hormone la 
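A common first step with posts like this is to normalize them before any deeper NLTK processing: lowercase the text, strip URLs and punctuation, and drop filler words. Below is a minimal sketch using only Python's standard library; the `STOPWORDS` set is a hypothetical placeholder (NLTK's `stopwords` corpus and the `TweetTokenizer` in `nltk.tokenize` are fuller alternatives).

```python
import re

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z]+")

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"i", "my", "is", "at", "the", "a", "it", "for", "of", "as", "to",
             "have", "been", "up", "after", "think", "omg"}


def normalize(text: str) -> list[str]:
    """Lowercase a raw post, remove URLs and punctuation, and drop stopwords."""
    text = URL_RE.sub(" ", text.lower())   # replace links with whitespace
    tokens = TOKEN_RE.findall(text)        # keep runs of ASCII letters only
    return [t for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    sample = ("omg i think my parathyroid is screwed up!!! "
              "Have been stuck at parathyroid hormone.")
    print(normalize(sample))
    # → ['parathyroid', 'screwed', 'stuck', 'parathyroid', 'hormone']
```

The regex tokenizer throws away digits, hashtags, and emoticons; that is acceptable for a first pass but worth revisiting if those carry signal in your data.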

All this data is stored in a database, along with fields such as poster, zip code, message text, etc.
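Since the posts already live in a SQL database, simple aggregates can be pushed into the query itself. A minimal sketch with Python's built-in `sqlite3` module, assuming a hypothetical `posts` table whose `poster`, `zip_code`, and `message_text` columns stand in for the real schema; the inserted rows are made-up toy data:

```python
import sqlite3

# Hypothetical schema mirroring the fields described above; adapt to the real table.
SCHEMA = """
CREATE TABLE posts (
    poster       TEXT,
    zip_code     TEXT,
    message_text TEXT
)
"""


def mentions_by_zip(conn: sqlite3.Connection, term: str) -> dict[str, int]:
    """Count posts per zip code whose text contains `term`, case-insensitively."""
    rows = conn.execute(
        "SELECT zip_code, COUNT(*) FROM posts "
        "WHERE LOWER(message_text) LIKE '%' || LOWER(?) || '%' "
        "GROUP BY zip_code ORDER BY zip_code",
        (term,),
    )
    return {zip_code: count for zip_code, count in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [  # illustrative toy rows, not real data
            ("user1", "10001", "My parathyroid surgery went well"),
            ("user2", "10001", "Started a new medication for hyperparathyroidism"),
            ("user3", "94103", "Stuck at home all day"),
        ],
    )
    print(mentions_by_zip(conn, "parathyroid"))  # → {'10001': 2}
```

Substring matching like this is crude (it also counts "hyperparathyroidism"), but it is a cheap way to get per-region baselines before any NLP runs.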

I was wondering if anyone could point me in the right direction on the following:

  • Are there effective algorithms that would help me do this?
  • Linguistically, how can I find correlations in the data? I am trying to track patterns.
  • Is there some kind of "mesh" format I need to put the data into to help with the analysis?
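One simple baseline for finding correlations is keyword co-occurrence: tag each post with the progression stages it mentions (diagnosis, drugs, treatment, outcome) and count which stages appear together. The sketch below assumes posts are already tokenized and uses a hypothetical, hand-built `STAGE_KEYWORDS` lexicon; it is a starting point, not a substitute for proper entity recognition.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lexicon for the progression stages; replace with a curated one.
STAGE_KEYWORDS = {
    "diagnosis": {"diagnosed", "diagnosis", "bloodwork", "calcium"},
    "drug": {"medication", "pills", "prescribed"},
    "treatment": {"surgery", "parathyroidectomy", "removed"},
    "outcome": {"recovered", "better", "worse", "relapse"},
}


def stages_in(tokens: list[str]) -> set[str]:
    """Return the stages whose keywords appear anywhere in a tokenized post."""
    token_set = set(tokens)
    return {stage for stage, words in STAGE_KEYWORDS.items() if token_set & words}


def stage_cooccurrence(posts: list[list[str]]) -> Counter:
    """Count how often each unordered pair of stages is mentioned in the same post."""
    pairs = Counter()
    for tokens in posts:
        # Sorting makes ("a", "b") and ("b", "a") the same key.
        for a, b in combinations(sorted(stages_in(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    posts = [
        ["diagnosed", "high", "calcium", "surgery", "scheduled"],
        ["surgery", "done", "feeling", "better"],
        ["diagnosed", "today", "surgery", "next", "month"],
    ]
    print(stage_cooccurrence(posts).most_common())
    # → [(('diagnosis', 'treatment'), 2), (('outcome', 'treatment'), 1)]
```

On a real corpus, raw pair counts mostly reflect how common each stage is, so they would need to be compared against what independent mentions predict (for example with pointwise mutual information) before reading them as patterns.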