I tried to run the following example on DL4J (loading a file of pre-trained vectors):
    File gModel = new File("./GoogleNews-vectors-negative300.bin.gz");
    Word2Vec vec = WordVectorSerializer.loadGoogleModel(gModel, true);

    InputStreamReader r = new InputStreamReader(System.in);
    BufferedReader br = new BufferedReader(r);
    for (;;) {
        System.out.print("Word: ");
        String word = br.readLine();
        if ("EXIT".equals(word)) break;
        Collection<String> lst = vec.wordsNearest(word, 20);
        System.out.println(word + " -> " + lst);
    }
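For reference, here is the same snippet as a self-contained class. The imports and the main() wrapper are just a minimal harness I am assuming around the code above; the DL4J classes come from the deeplearning4j-nlp module used by the examples project.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.Collection;

    import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
    import org.deeplearning4j.models.word2vec.Word2Vec;

    public class NearestWordsDemo {
        public static void main(String[] args) throws IOException {
            // Load the pre-trained GoogleNews vectors (binary Google format).
            File gModel = new File("./GoogleNews-vectors-negative300.bin.gz");
            Word2Vec vec = WordVectorSerializer.loadGoogleModel(gModel, true);

            // Read query words from stdin until the user types EXIT.
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
            while (true) {
                System.out.print("Word: ");
                String word = br.readLine();
                if (word == null || "EXIT".equals(word)) break;

                // Ask for the 20 nearest neighbours of the query word.
                Collection<String> lst = vec.wordsNearest(word, 20);
                System.out.println(word + " -> " + lst);
            }
        }
    }

This runs and the neighbours it prints look correct; only the wordsNearest() call itself takes minutes.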
But it is very slow: it takes ~10 minutes to compute the nearest words, although the results are correct.
There is enough memory (-Xms20g -Xmx20g).
When I run the same query with the original word2vec tool from https://code.google.com/p/word2vec/, it returns the nearest words almost instantly.
DL4J uses ND4J, which claims to be twice as fast as Numpy: http://nd4j.org/benchmarking
Is there something wrong with my code?
UPDATE: my code is based on https://github.com/deeplearning4j/dl4j-0.4-examples.git (I did not touch any dependencies, just tried to read the pre-trained Google vectors file). Word2VecRawTextExample works just fine (but its data set is relatively small).