I tried to run the following example on DL4J (loading a file of pre-trained vectors):
    File gModel = new File("./GoogleNews-vectors-negative300.bin.gz");
    Word2Vec vec = WordVectorSerializer.loadGoogleModel(gModel, true);

    InputStreamReader r = new InputStreamReader(System.in);
    BufferedReader br = new BufferedReader(r);
    for (;;) {
        System.out.print("Word: ");
        String word = br.readLine();
        if ("EXIT".equals(word)) break;
        Collection<String> lst = vec.wordsNearest(word, 20);
        System.out.println(word + " -> " + lst);
    }
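For reference, here is the same snippet as a self-contained class. The imports and the main() wrapper are just a minimal harness I am assuming around the code above; the DL4J classes come from the deeplearning4j-nlp module used by the examples project.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.Collection;

    import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
    import org.deeplearning4j.models.word2vec.Word2Vec;

    public class NearestWordsDemo {
        public static void main(String[] args) throws IOException {
            // Load the pre-trained GoogleNews vectors (binary Google format).
            File gModel = new File("./GoogleNews-vectors-negative300.bin.gz");
            Word2Vec vec = WordVectorSerializer.loadGoogleModel(gModel, true);

            // Read query words from stdin until the user types EXIT.
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
            while (true) {
                System.out.print("Word: ");
                String word = br.readLine();
                if (word == null || "EXIT".equals(word)) break;

                // Ask for the 20 nearest neighbours of the query word.
                Collection<String> lst = vec.wordsNearest(word, 20);
                System.out.println(word + " -> " + lst);
            }
        }
    }

This runs and the neighbours it prints look correct; only the wordsNearest() call itself takes minutes.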
But it is very slow: it takes ~10 minutes to compute the nearest words, although the results are correct.
There is enough memory (-Xms20g -Xmx20g).
When I run the same query with the original word2vec tool from https://code.google.com/p/word2vec/, it returns the nearest words almost instantly.
DL4J uses ND4J, which claims to be twice as fast as Numpy: http://nd4j.org/benchmarking
Is there something wrong with my code?
UPDATE: my code is based on https://github.com/deeplearning4j/dl4j-0.4-examples.git (I did not touch any dependencies, just tried to read the pre-trained Google vectors file). Word2VecRawTextExample works just fine (but its data set is relatively small).