To do this, you need a labeled training set. You then train a classification model on that training set and use the model to predict the location of new text fragments. You can see how these pieces fit together in this example built on scikit-learn: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
Labeled training set:
You train the classifier on a training set in which each sample is a pair (paragraph, region_id). The region_id can identify a country, a region, or a city.
Training the classification model:
You build a bag-of-words representation (for example, unigrams) of each sample and train a classifier (for example, logistic regression with L1 regularization) on the labeled training set. You can use any tool, but I recommend scikit-learn in Python, which is simple and efficient to use.
Prediction:
Once trained, the model takes a paragraph or text fragment and predicts a region_id for it, based on the words used in that text.
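The training and prediction steps can be sketched in a few lines of scikit-learn. This is only a minimal illustration, not the setup from the paper or from Pigeo: the toy paragraphs, the region_id labels, the `saga` solver, and the values of `C` and `max_iter` are all assumptions chosen to keep the example small.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled training set of (paragraph, region_id) pairs.
# The texts and labels are invented for illustration only.
train = [
    ("The Eiffel Tower and the Louvre draw crowds along the Seine", "paris"),
    ("Croissants near Montmartre and a walk by the Seine", "paris"),
    ("Big Ben and the Thames seen from a red double-decker bus", "london"),
    ("We took the Tube to Camden and walked along the Thames", "london"),
    ("Sushi in Shibuya before a visit to a Shinto shrine", "tokyo"),
    ("Cherry blossoms in Ueno park and ramen in Shinjuku", "tokyo"),
]
texts = [text for text, _ in train]
region_ids = [region for _, region in train]

# Unigram bag of words followed by L1-regularized logistic regression.
# The saga solver is one of the scikit-learn solvers that supports an L1 penalty.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),
    LogisticRegression(penalty="l1", solver="saga", C=10.0,
                       max_iter=5000, random_state=0),
)
model.fit(texts, region_ids)

# Predict a region_id for a new text fragment.
predicted = model.predict(["A rainy afternoon on the Tube near the Thames"])
print(predicted[0])
```

With a real data set you would have thousands of samples per region, and the vocabulary learned by `CountVectorizer` would be far larger than in this toy example.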
Remember to tune the regularization parameter on a development set to get good results (that is, to avoid overfitting the training set).
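One way to tune the regularization strength is to fit one model per candidate value and keep the one that scores best on the development set. The sketch below assumes scikit-learn's `C` parameter (the inverse of the regularization strength), a made-up candidate grid, and invented toy data; `build_model` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data: a training set and a separate development set.
train_texts = [
    "The Louvre and the Seine at sunset",
    "Croissants in Montmartre by the Seine",
    "Big Ben and the Thames in the rain",
    "The Tube to Camden along the Thames",
]
train_regions = ["paris", "paris", "london", "london"]
dev_texts = ["A boat trip on the Seine", "A walk along the Thames"]
dev_regions = ["paris", "london"]


def build_model(C):
    """Hypothetical helper: unigram bag of words + L1 logistic regression."""
    return make_pipeline(
        CountVectorizer(ngram_range=(1, 1)),
        LogisticRegression(penalty="l1", solver="saga", C=C,
                           max_iter=5000, random_state=0),
    )


# Smaller C means stronger regularization in scikit-learn.
candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
best_C, best_score = None, -1.0
for C in candidates:
    model = build_model(C).fit(train_texts, train_regions)
    score = model.score(dev_texts, dev_regions)  # accuracy on the dev set
    if score > best_score:
        best_C, best_score = C, score

print(best_C, best_score)
```

The development set is kept separate from the training data so that the chosen value reflects how well the model generalizes rather than how well it memorizes the training samples.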
See my paper on geolocation using text: http://www.aclweb.org/anthology/N15-1153
and the related poster: http://www.slideshare.net/AfshinRahimi2/geolocation-twittertextnetwork-48968497
I also wrote a tool called Pigeo, which does exactly this and comes with a pretrained model. Besides these works, there are many other research papers on text-based geolocation.
Ash