Italian stemming library in Java

I am looking for a Java library, or some other approach, for stemming Italian words.

The goal is to compare Italian words by their root. At the moment, words like "attacco", "attacchi", "attaccare" etc. are treated as different; I want them to compare as the same word.

I have found things like Lucene and snowball.tartarus.org. Is there anything else useful, and how can I use them from Java?

Thanks for any answers.

1 answer

Download the Java version of Snowball from the Snowball site (snowball.tartarus.org).

It includes a class called org.tartarus.snowball.ext.italianStemmer, which extends SnowballStemmer.

To see how SnowballStemmer is used, take a look at the following test code, which stems the present-tense forms of the verb attaccare:

    import org.junit.Assert;
    import org.junit.Test;
    import org.tartarus.snowball.SnowballStemmer;
    import org.tartarus.snowball.ext.italianStemmer;

    public class SnowballItalianStemmerTest {

        @Test
        public void testSnowballItalianStemmerAttaccare() {
            SnowballStemmer stemmer = new italianStemmer();
            // Present-tense forms of "attaccare"
            String[] tokens = "attacco attacchi attacca attacchiamo attaccate attaccano".split(" ");
            for (String token : tokens) {
                stemmer.setCurrent(token);   // word to stem
                stemmer.stem();              // run the Italian stemming rules
                String stemmed = stemmer.getCurrent();
                // Every form should reduce to the same stem
                Assert.assertEquals("attacc", stemmed);
                System.out.println(stemmed);
            }
        }
    }

Output:

    attacc
    attacc
    attacc
    attacc
    attacc
    attacc

For another usage example, see TestApp.java, which is included in the same tgz file.
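Since the question is about comparing words rather than just stemming them, here is a minimal sketch of a comparison helper built on top of any stemmer. The class name StemComparator and the toyStem method are hypothetical and not part of Snowball. toyStem is only a crude stand-in that strips a few endings so the example runs without the Snowball jar; with Snowball on the classpath you would pass a function that calls setCurrent, stem and getCurrent on an italianStemmer instead, as in the test above.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical helper: compares two words by the stem a given stemmer produces.
public class StemComparator {
    private final UnaryOperator<String> stemmer;

    public StemComparator(UnaryOperator<String> stemmer) {
        this.stemmer = stemmer;
    }

    /** Returns true when both words reduce to the same stem. */
    public boolean sameStem(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    // Trim and lower-case before stemming so "Attacco" and "attacco" match.
    private String normalize(String word) {
        return stemmer.apply(word.trim().toLowerCase(Locale.ITALIAN));
    }

    // ASSUMPTION: toy stand-in for the real Italian stemmer, NOT the Snowball
    // algorithm. It only strips a handful of endings for demonstration.
    public static String toyStem(String word) {
        String[] suffixes = {"hiamo", "are", "ate", "ano", "hi", "o", "a"};
        for (String s : suffixes) {
            if (word.endsWith(s) && word.length() > s.length() + 2) {
                return word.substring(0, word.length() - s.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // With Snowball available, replace toyStem with something like:
        //   w -> { stemmer.setCurrent(w); stemmer.stem(); return stemmer.getCurrent(); }
        StemComparator cmp = new StemComparator(StemComparator::toyStem);
        System.out.println(cmp.sameStem("attacco", "attacchi"));    // true
        System.out.println(cmp.sameStem("Attaccare", "attaccano")); // true
        System.out.println(cmp.sameStem("attacco", "difesa"));      // false
    }
}
```

Passing the stemmer in as a function keeps the comparison logic independent of the stemming library, so the same helper would work with Snowball directly or with a Lucene-based pipeline.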

Lucene, which is also written in Java, builds on Snowball; for example, its SnowballFilter applies a Snowball stemmer to the tokens it processes.
