Using the Stanford NLP libraries from R with the rJava package

Does anyone have experience using Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.shtml) via rJava in R? I've been struggling to get it working for two days now, and I think I've exhausted Google and the previous questions on StackOverflow.

Essentially, I'm trying to use the Stanford NLP libraries from R. I have zero Java experience, but I have experience with other languages, so I understand the basics of classes, objects, etc.

From what I can see, the .java demo file that ships with the libraries shows that, to use the classes from Java, you import the packages and then create a new object like this:

```java
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

public class demo {
    // etc. etc.
    StanfordCoreNLP pipeline = new StanfordCoreNLP();
    // etc.
}
```

From R, I've tried calling some standard Java functions and that works fine, which makes me think it's the way I'm trying to access the Stanford libraries that is causing the problem.

I extracted the Stanford ZIP to h:\stanfordcore, so the .jar files are in the root of this directory. Among the other files in the ZIP, it contains these main .jar files:

  • joda-time.jar
  • stanford-corenlp-1.3.4.jar
  • stanford-corenlp-1.3.4-javadoc.jar
  • stanford-corenlp-1.3.4-models.jar
  • joda-time-2.1-sources.jar
  • jollyday-0.4.7-sources.jar
  • stanford-corenlp-1.3.4-sources.jar
  • xom.jar
  • jollyday.jar

If I try to access NLP tools from the command line, it works fine.

Inside R, I initialized the JVM and set the classpath variable:

```r
.jinit(classpath = "h:/stanfordcore", parameters = getOption("java.parameters"),
       silent = FALSE, force.init = TRUE)
```

After that, I run the command:

```r
.jclassPath()
```

This shows that the directory containing the necessary .jar files has been added, giving this result in R:

```
[1] "H:\RProject-2.15.1\library\rJava\java" "h:\stanfordcore"
```

However, when I try to create a new object (not sure if this is the correct Java terminology), I get an error.

I've tried to create an object in dozens of different ways (mostly shots in the dark), but the most promising one (only because it actually finds the class) is:

```r
pipeline <- .jnew(class = "edu/stanford/nlp/pipeline/StanfordCoreNLP", check = TRUE, silent = FALSE)
```

I know that this finds the class, because if I change the class parameter to something that is not in the API, I get a "cannot find class" error instead.

However, I get this error:

```
Error in .jnew(class = "edu/stanford/nlp/pipeline/StanfordCoreNLP", check = TRUE, :
  java.lang.NoClassDefFoundError: Could not initialize class edu.stanford.nlp.pipeline.StanfordCoreNLP
```

My Googling suggests this can happen when a required .jar file cannot be found, but I'm completely stuck. Am I missing something?
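The exact wording of that error is informative. In Java, "Could not initialize class X" means the class was found and loaded, but its static initializer already failed on an earlier attempt; the first failure is reported with the underlying cause (an ExceptionInInitializerError, or the original Error itself, such as a NoClassDefFoundError for a missing dependency). The self-contained sketch below reproduces that sequence with a hypothetical nested class `Broken` whose static initializer throws. It only illustrates JVM behaviour and does not model anything inside CoreNLP.

```java
public class InitFailureDemo {
    // Hypothetical stand-in for a class whose static setup needs something that
    // is not available (for example, a dependency missing from the class path).
    static class Broken {
        static final String VALUE = load();

        static String load() {
            throw new IllegalStateException("simulated missing dependency");
        }
    }

    /** Loads Broken without running its static initializer: finding a class is not the same as initializing it. */
    public static boolean loadOnly() {
        try {
            Class.forName(Broken.class.getName(), false, InitFailureDemo.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Touches Broken.VALUE (which forces initialization) and describes the error thrown. */
    public static String touch() {
        try {
            return Broken.VALUE;
        } catch (ExceptionInInitializerError e) {
            // First attempt: the static initializer runs and fails.
            return "ExceptionInInitializerError: " + e.getCause().getMessage();
        } catch (NoClassDefFoundError e) {
            // Later attempts: the JVM has marked the class as unusable.
            return "NoClassDefFoundError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("loaded without init: " + loadOnly());
        System.out.println(touch());
        System.out.println(touch());
    }
}
```

Running `main` prints the ExceptionInInitializerError first and the "Could not initialize class" NoClassDefFoundError on the second access, which is why the earliest error in a session is usually the one that names the real cause.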

If anyone can point me even a little in the right direction, I'd be incredibly grateful.

Thanks in advance!

Peter

2 answers

Your class path is wrong: you are pointing it at a directory, but your classes are in JAR files. You must either unpack all the JAR files into the directory you specify (unusual) or add every JAR file to the class path (more common). [You will also have to correct your typos, obviously, but I assume they crept in because you did not copy/paste.]
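Adding every JAR means one class path entry per file rather than the directory itself. The sketch below (Java, with a hypothetical class name `JarClassPath`) collects the .jar files directly inside a directory and joins them with the platform's path separator; it skips the -sources and -javadoc JARs, which hold .java files and HTML rather than compiled classes. The default directory is the one from the question.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class JarClassPath {
    /** Absolute paths of the class-bearing *.jar files directly inside dir, sorted by name. */
    public static List<String> jarsIn(File dir) {
        List<String> jars = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return jars; // dir does not exist, is not a directory, or cannot be read
        }
        Arrays.sort(files);
        for (File f : files) {
            String name = f.getName().toLowerCase(Locale.ROOT);
            // -sources and -javadoc JARs contain no compiled classes, so skip them.
            boolean classJar = name.endsWith(".jar")
                    && !name.endsWith("-sources.jar")
                    && !name.endsWith("-javadoc.jar");
            if (f.isFile() && classJar) {
                jars.add(f.getAbsolutePath());
            }
        }
        return jars;
    }

    /** Joins entries with the platform separator (';' on Windows, ':' elsewhere). */
    public static String toClassPath(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Directory from the question; pass a different one as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "h:/stanfordcore");
        System.out.println(toClassPath(jarsIn(dir)));
    }
}
```

From R, the equivalent list would typically come from `list.files("h:/stanfordcore", pattern = "\\.jar$", full.names = TRUE)` and be passed as the `classpath` argument of `.jinit()` or to `.jaddClassPath()`; treat that as an assumption to check against your rJava version.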

PS: please use the stats-rosuda-devel mailing list if you want more timely answers.


Success!

After hours of messing around, I managed to find a workaround. If anyone is interested, this is what I did:

  • Using Eclipse, I started a new project.

  • Then I created a directory called lib under the project root and copied all the Stanford .jar files into it.

  • After that, I opened the project properties in Eclipse, went to "Java Build Path", and clicked the "Libraries" tab.

  • Then I imported the Java system libraries.

  • I also clicked "Add External JARs" and selected all the Stanford JARs from the lib directory.

  • Then I created intermediary Java classes to call the Stanford classes (instead of trying to call them directly from R).

Example:

```java
import java.lang.Object;
import java.util.Properties;
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

public class NLP {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize");
        StanfordCoreNLP coreNLP = new StanfordCoreNLP(props);
    }
}
```

This does not return anything, but shows how you can create a Stanford object.

  • Build the project using Eclipse.

  • Inside R, set the working directory to the Java project's bin directory (this isn't strictly necessary, since you can add that class directory to the class path instead, but it makes things easier).

Then the object can be created in R with:

```r
.jinit(classpath = ".")  # initializes the JVM with the current (bin) directory on the class path
obj <- .jnew("NLP")
```

After that, any methods you create in the intermediary Java classes can be called using:

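```r
# obj is the object from .jnew() (or a class name, for a static method);
# returnSig is the JNI return type, e.g. "V" (void), "I" (int), "S" (String)
result <- .jcall(obj, returnSig, "methodName", param1, param2)
```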
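To make the intermediary-class idea concrete, here is a hypothetical bridge class, `NLPBridge`, whose public methods take and return only `String`, `String[]` and `int`, types that `.jcall` maps directly to R vectors. The regular-expression tokenizer inside is a stand-in for the StanfordCoreNLP pipeline so the sketch compiles without the Stanford JARs; it is not CoreNLP's tokenizer, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical intermediary class, meant to be created from R with .jnew("NLPBridge").
 * Only simple types appear in its public signatures so rJava can convert them.
 */
public class NLPBridge {
    // Stand-in tokenizer: runs of letters/digits, or single punctuation characters.
    // A real bridge would call a StanfordCoreNLP pipeline here instead.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    public NLPBridge() {
        // A real bridge might build its (expensive) pipeline once, here.
    }

    /** Splits text into tokens; returned to R as a character vector. */
    public String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    /** Number of tokens in text; returned to R as an integer. */
    public int countTokens(String text) {
        return tokenize(text).length;
    }

    public static void main(String[] args) {
        NLPBridge bridge = new NLPBridge();
        System.out.println(String.join(" | ", bridge.tokenize("Hello, world. It works!")));
    }
}
```

With the compiled class on the class path, the R side would look something like `obj <- .jnew("NLPBridge")` followed by `.jcall(obj, "[Ljava/lang/String;", "tokenize", "Hello, world.")` for the array method and `.jcall(obj, "I", "countTokens", "Hello, world.")` for the integer one. Keeping the bridge's signatures this simple avoids having to handle CoreNLP's own object types from R.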

I still don't understand why I can't call the Stanford classes directly from R, but this method works. I suspect that @ChristopherManning is right and my problem boils down to calling an external JAR from R. Because I built the project from scratch, the Stanford JARs are linked at build time, so I assume that is what fixes it.

