I use JAWS for ordinary WordNet tasks because it is easy to use. For similarity measures, however, I use the JWS library located here. For it to work, you will also need to download this folder containing pre-processed WordNet and corpus data. If you put that folder inside another folder called "lib" in the project folder, you can use the library like this:
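    import java.util.Map.Entry;
    import java.util.TreeMap;
    // JWS and Resnik come from the JWS library jar.

    // "./lib" is the folder holding the WordNet and corpus data; "3.0" is the WordNet version.
    JWS ws = new JWS("./lib", "3.0");
    Resnik res = ws.getResnik();

    // One score per combination of senses of word1 and word2.
    TreeMap<String, Double> scores1 = res.res(word1, word2, partOfSpeech);
    for (Entry<String, Double> e : scores1.entrySet())
        System.out.println(e.getKey() + "\t" + e.getValue());

    // Highest score over all sense combinations.
    System.out.println("\nhighest score\t=\t" + res.max(word1, word2, partOfSpeech) + "\n\n\n");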
This will print something like the following: a similarity score for each possible pairing of synsets of the two words being compared:
hobby#n
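If you want the best-scoring sense pair itself rather than only the maximum value, you can scan the returned map yourself. Below is a minimal sketch that does not depend on JWS: the class name BestSensePair, the helper best, and the placeholder keys and scores are all hypothetical, and the only assumption is that the scores come as a TreeMap<String, Double> like the one returned by res.res(...) above.

```java
import java.util.Map;
import java.util.TreeMap;

public class BestSensePair {

    // Returns the entry with the highest score, or null if the map is empty.
    // On ties, the entry that comes first in key order is kept.
    static Map.Entry<String, Double> best(Map<String, Double> scores) {
        Map.Entry<String, Double> best = null;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder data standing in for the map returned by the library;
        // these keys and values are made up for illustration only.
        TreeMap<String, Double> scores = new TreeMap<>();
        scores.put("pairA", 0.5);
        scores.put("pairB", 2.25);
        scores.put("pairC", 1.0);

        Map.Entry<String, Double> top = best(scores);
        System.out.println(top.getKey() + "\t" + top.getValue());
    }
}
```

In real use you would pass scores1 from the snippet above in place of the placeholder map.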
There are also methods that let you specify the sense of either or both words: res(String word1, int senseNum1, String word2, partOfSpeech), etc. Unfortunately, the documentation in the source is not JavaDoc, so you will need to read through the source manually. The source can be downloaded here.
Available Algorithms:
    JWSRandom random = new JWSRandom(ws.getDictionary(), true, 16.0); // random number for baseline
    Resnik res = ws.getResnik();
    LeacockAndChodorow lch = ws.getLeacockAndChodorow();
    AdaptedLesk adLesk = ws.getAdaptedLesk();
    AdaptedLeskTanimoto alt = ws.getAdaptedLeskTanimoto();
    AdaptedLeskTanimotoNoHyponyms altnh = ws.getAdaptedLeskTanimotoNoHyponyms();
    HirstAndStOnge hso = ws.getHirstAndStOnge();
    JiangAndConrath jcn = ws.getJiangAndConrath();
    Lin lin = ws.getLin();
    WuAndPalmer wup = ws.getWuAndPalmer();
In addition, this requires the jar file for MIT's JWI to be on the classpath.
Nate Glenn