Tools available for semantic text analysis

I am looking for code, a product, or a service that performs semantic analysis of text (sentences and paragraphs) in order to classify it by general topic, for example:

  • Finance
  • Entertainment
  • Technology
  • Business
  • Art
  • etc...
+4
4 answers

If you have a set of examples that have already been classified, you can use them to train a classifier. This is a straightforward document classification problem, and any machine learning toolkit will come with algorithms and tutorials for it. For example, check out Weka: http://www.cs.waikato.ac.nz/ml/weka/

or RapidMiner: http://rapid-i.com/content/blogcategory/38/69/

If your needs are limited and you just want a simple API, you can't go wrong with this Naive Bayes library: https://ci-bayes.dev.java.net/
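To make the "train a classifier on already-classified examples" idea concrete, here is a minimal sketch of a multinomial Naive Bayes topic classifier in plain Python. It is not the API of Weka, RapidMiner, or the library linked above; the class name, the tokenizer, and the tiny training set are all made up for illustration, and a real classifier would need far more training text per topic.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTopicClassifier:
    """Hypothetical helper: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # topic -> word -> count
        self.doc_counts = Counter()              # topic -> number of training docs
        self.vocabulary = set()

    def train(self, labeled_docs):
        """labeled_docs is an iterable of (text, topic) pairs."""
        for text, topic in labeled_docs:
            words = tokenize(text)
            self.doc_counts[topic] += 1
            self.word_counts[topic].update(words)
            self.vocabulary.update(words)

    def classify(self, text):
        """Return the topic with the highest log posterior score."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_topic, best_score = None, float("-inf")
        for topic, n_docs in self.doc_counts.items():
            counts = self.word_counts[topic]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Made-up training examples, two per topic.
training_data = [
    ("Stocks fell as the central bank raised interest rates", "Finance"),
    ("The bank reported higher quarterly profits and dividends", "Finance"),
    ("The new smartphone ships with a faster processor", "Technology"),
    ("Developers released a software update for the operating system", "Technology"),
    ("The film won three awards at the festival", "Entertainment"),
    ("The band announced a world tour and a new album", "Entertainment"),
]

classifier = NaiveBayesTopicClassifier()
classifier.train(training_data)
print(classifier.classify("Interest rates and bank profits are rising"))  # Finance
```

The scores are summed in log space so that long texts do not underflow, and the add-one smoothing keeps a word that was never seen for a topic from zeroing out that topic entirely.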

Good luck

+6

If you want to evaluate commercial service APIs, check out the VIKI engine APIs: http://www.softwareevolution.it/en/products/viki-core-api.html

It is an easy-to-use JSON API service that offers some semantic analysis functions.

+1

Would this help?

http://en.wikipedia.org/wiki/Document_classification

This is not a finished product, service, or code, but it describes various algorithms that can be used for semantic analysis. After googling a little further, my impression is that this has not really made it out of the lab yet. People mainly experiment with KNN algorithms, which produces some cool results, but not quite what you need:

http://www.ebi.ac.uk/webservices/whatizit/info.jsf
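For reference, the KNN (k-nearest neighbours) idea mentioned above can be sketched in a few lines of plain Python: represent each text as a bag of words, find the k labeled documents most similar to the query, and let them vote on the topic. This is only an illustration and has nothing to do with how the linked service works; the function names and the sample documents are invented.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, labeled_docs, k=3):
    """Return the majority topic among the k most similar labeled documents."""
    query = vectorize(text)
    ranked = sorted(
        labeled_docs,
        key=lambda doc: cosine_similarity(query, vectorize(doc[0])),
        reverse=True,
    )
    votes = Counter(topic for _, topic in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up labeled documents.
labeled_docs = [
    ("The gallery opened an exhibition of modern paintings", "Art"),
    ("The sculptor unveiled a bronze sculpture at the museum", "Art"),
    ("The company announced a merger with its main competitor", "Business"),
    ("The retailer plans to open new stores and hire staff", "Business"),
]

print(knn_classify("The gallery exhibition features a new sculpture", labeled_docs))  # Art
```

Raw word counts let common words such as "the" influence the similarity, so a more serious version would likely weight terms (for example with TF-IDF) and remove stop words.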

But if there is any software that will do what you ask for, it will be on this list:

http://www.kdnuggets.com/software/text.html

For example, the LPU program (Learning from Positive and Unlabeled examples) seems to be able to learn a classifier, provided you supply the training documents.

http://www.cs.uic.edu/~liub/LPU/LPU-download.html

0

If you work in Python or other interpreted languages, check out the excellent NLTK library at nltk.org. It is well documented on the site and in a recently published O'Reilly book.

If you use Java and/or need a more mature, but harder to understand, framework, try GATE instead.

0
