Natural Language Processing in Ruby

I want to do an analysis of sentences (mainly for Twitter applications) and derive some general characteristics. Are there any good natural language processing libraries for this kind of thing in Ruby?

Similar to the question "Is there a good natural language processing library?", but for Ruby. I would prefer something very general, but any pointers are appreciated!

+63
ruby artificial-intelligence nlp
Jun 16 '09 at 2:53
11 answers

There is some functionality in Ruby Linguistics, plus links to other projects from there, although none of it comes close to what NLTK does for Python.

+23
Jun 16 '09 at 3:02

Three excellent and mature NLP packages: Stanford Core NLP, OpenNLP and LingPipe. There are Ruby bindings to the Stanford Core NLP tools (GPL license), as well as to the OpenNLP tools (Apache license).

On the more experimental side, I maintain the GPL-licensed Treat (Text REtrieval, Extraction and Annotation Toolkit), which provides a common API for almost every NLP-related gem that exists for Ruby. The following list of Treat's dependencies can also serve as a good reference for stable natural language processing gems compatible with Ruby 1.9.

  • Text segmenters and tokenizers (punkt-segmenter, tactful_tokenizer, srx-english, scalpel).
  • Natural language parsers for English, French and German, and named entity extraction for English (stanford-core-nlp).
  • Word inflection and conjugation (linguistics), stemming (ruby-stemmer, uea-stemmer, lingua, etc.).
  • WordNet interface (rwordnet), POS taggers (rbtagger, engtagger, etc.).
  • Language detection (whatlanguage), date/time parsing (chronic, kronic, nickel), keyword extraction (lda-ruby).
  • Text retrieval with indexing and full-text search (ferret).
  • Coreference resolution (stanford-core-nlp).
  • Basic machine learning with decision trees (decisiontree), MLP (ruby-fann), SVM (rb-libsvm) and linear classification (tomz-liblinear-ruby-swig).
  • Text similarity metrics (levenshtein-ffi, fuzzy-string-match, tf-idf-similarity).
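To give a feel for what the text-similarity gems above compute, here is a minimal pure-Ruby sketch of Levenshtein (edit) distance. It is not the levenshtein-ffi API; it is the textbook dynamic-programming algorithm written without any gem, and the method name `levenshtein` is just an illustrative choice.

```ruby
# Textbook dynamic-programming Levenshtein distance: insertions, deletions
# and substitutions each cost 1. Only two rows of the DP table are kept.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # previous[j] = distance between the first i-1 chars of a and first j chars of b
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    current = [i] # distance from the first i chars of a to the empty string
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match)
      ].min
    end
    previous = current
  end
  previous.last
end

puts levenshtein("kitten", "sitting") # → 3
```

A native extension such as levenshtein-ffi exists mainly for speed; the quadratic loop above is fine for tweet-length strings.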

Not used by Treat, but NLP-related: hotwater (string distance algorithms), yomu (bindings to Apache Tika for reading .doc, .docx, .pages, .odt, .rtf, .pdf), graph-rank (an implementation of GraphRank).

+60
Apr 7 2018-12-12T00:

You can always use JRuby and call Java libraries.

EDIT: Being able to run Ruby natively on the JVM and easily use Java libraries is a big plus for Rubyists. It is a good option to consider in a situation like this.

See also: What NLP toolkit to use in Java?

+11
Jun 16 '09 at 3:49

I found a great article that covers some Ruby NLP algorithms in detail here. It includes stemmers, time parsers and grammar parsers.
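To illustrate what a stemmer does, here is a deliberately naive suffix-stripping sketch. It is not the Porter algorithm or any gem's implementation; the suffix list and the `naive_stem` name are hypothetical choices for illustration only.

```ruby
# Hypothetical suffix list, checked in order (longer suffixes first so that
# "edly" wins over "ed" and "es" wins over "s"). NOT the Porter algorithm.
SUFFIXES = %w[ational ization fulness ousness iveness ing edly ed ly es s].freeze

# Strips the first matching suffix, but only if at least 3 characters of
# stem would remain; otherwise returns the lowercased word unchanged.
def naive_stem(word)
  w = word.downcase
  SUFFIXES.each do |suffix|
    return w[0...-suffix.length] if w.end_with?(suffix) && w.length - suffix.length >= 3
  end
  w
end

puts naive_stem("jumped")  # → jump
puts naive_stem("running") # → runn
```

The "runn" result shows why real stemmers add rules (such as undoubling consonants) on top of plain suffix stripping.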

+9
Jun 18 '09 at 13:44

Treat (Text REtrieval, Extraction and Annotation Toolkit) is the most comprehensive toolkit I know of for Ruby: https://github.com/louismullie/treat/wiki/

+6
Mar 19

Also consider SaaS APIs such as MonkeyLearn. You can easily train text classifiers with machine learning and integrate them through the API; a Ruby SDK is available.

In addition to creating your own classifiers, you can choose prebuilt modules for sentiment analysis, topic classification, language detection, etc. We also have extractors, such as keyword and entity extraction, and we will be adding more public modules.

Other nice features:

  • You have a graphical interface for creating / testing algorithms.
  • Algorithms work very fast on our cloud computing platform.
  • You can integrate with Ruby or any other programming language.
+5
Feb 18 '15 at 15:35

Try this one

https://github.com/louismullie/stanford-core-nlp

About the stanford-core-nlp gem

This gem provides high-level Ruby bindings to the Stanford Core NLP package, a set of natural language processing tools for tokenization, sentence segmentation, part-of-speech tagging, lemmatization and parsing of English, French and German. The package also provides named entity recognition and coreference resolution for English.

Project page: http://nlp.stanford.edu/software/corenlp.shtml
Demo page: http://nlp.stanford.edu:8080/corenlp/

+4
Jan 07 '13 at 19:47

I maintain a list of Ruby Natural Language Processing Resources (libraries, APIs and presentations) on GitHub that covers the libraries listed in the other answers here, as well as some additional ones.

+4
Mar 28 '15 at 0:40

You need to be more specific about these “general characteristics.”

In NLP, “general characteristics” of a sentence can mean a million different things: sentiment analysis (that is, the speaker’s attitude), part-of-speech tagging, use of personal pronouns, whether the sentence contains active or passive verbs, the tense and voice of the verbs...
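To make one of these concrete, here is a minimal plain-Ruby sketch that measures personal-pronoun use plus a couple of surface features. The pronoun list and the `sentence_features` name are hypothetical, incomplete choices; contractions such as "I'm" are not counted, and real sentiment or POS analysis would need a tagger such as engtagger.

```ruby
# Hypothetical, incomplete list of English personal/possessive pronouns.
PERSONAL_PRONOUNS = %w[
  i me my mine we us our ours you your yours
  he him his she her hers it its they them their theirs
].freeze

# Returns a hash of simple surface features for one sentence.
def sentence_features(sentence)
  words = sentence.downcase.scan(/[a-z']+/) # crude word split, keeps apostrophes
  text = sentence.strip
  {
    word_count: words.length,
    personal_pronouns: words.count { |w| PERSONAL_PRONOUNS.include?(w) },
    question: text.end_with?("?"),
    exclamation: text.end_with?("!")
  }
end

features = sentence_features("I think we should go now!")
puts features[:word_count]        # → 6
puts features[:personal_pronouns] # → 2
```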

I don't mind if you describe it vaguely, but if we don't know what you are asking for, we are unlikely to be able to help you specifically.

My general suggestion, especially for NLP, is to use the tool that works best for the job rather than limiting yourself to a specific language. Limiting yourself to one language is fine for tasks where common tools are implemented everywhere, but NLP is not one of them.

Another problem with Twitter is that many sentences will be half-formed or compressed in strange and wonderful ways, which most NLP tools are not trained for. To help there, the NUS SMS Corpus consists of "about 10,000 SMS messages collected by students." Because SMS messages have similar constraints and usage, analyzing that corpus may be useful for your Twitter research.
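As an illustration of why tweets need special handling, here is a naive regex-based tweet tokenizer sketch that keeps URLs, @mentions and #hashtags as single tokens instead of splitting them on punctuation. The pattern and the `tokenize_tweet` name are hypothetical and not taken from any gem.

```ruby
# Naive tweet tokenizer pattern; alternatives are tried left to right,
# so URLs, mentions and hashtags win over plain words.
TOKEN_PATTERN = %r{
    https?://\S+     # URLs, up to the next whitespace
  | @\w+             # @mentions
  | \#\w+            # hashtags, escaped because the x flag treats it as a comment
  | \w+(?:'\w+)?     # words, with an optional contraction such as don't
  | [^\s\w]          # any other single non-space symbol, such as punctuation
}x

def tokenize_tweet(text)
  text.scan(TOKEN_PATTERN)
end

p tokenize_tweet("omg @bob check http://t.co/abc #nlp don't!!")
# → ["omg", "@bob", "check", "http://t.co/abc", "#nlp", "don't", "!", "!"]
```

Note that a URL followed directly by punctuation swallows that punctuation; a production tokenizer would need more careful rules.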

If you are more specific, I will try to list some tools that will help.

+2
Jun 16 '09 at 3:35

I would check out Mark Watson's free book Practical Semantic Web and Linked Data Applications, Java, Scala, Clojure, and JRuby Edition. He has chapters on NLP using Java, Clojure, Ruby and Scala, and he also provides links to the resources you need.

+1
03 Feb 2018-12-12T00:

For people looking for something lighter and easier to set up, this option worked well for me:

https://github.com/yohasebe/engtagger

+1
Feb 21 '15 at 19:50


