Julia is much slower than Java

I'm new to Julia, and I wrote a simple function that calculates the RMSE (root-mean-square error). ratings is a rating matrix in which each row is [user, film, rating]. There are 15 million ratings. The rmse() function takes 12.0 s, but the Java implementation is about 188 times faster: 0.064 s. Why is the Julia implementation so slow? In Java I work with an array of Rating objects; with a multi-dimensional int array it would probably be even faster.

    ratings = readdlm("ratings.dat", Int32)

    function predict(user, film)
        return 3.462
    end

    function rmse()
        total = 0.0
        for i in 1:size(ratings, 1)
            r = ratings[i,:]
            diff = predict(r[1], r[2]) - r[3]
            total += diff * diff
        end
        return sqrt(total / size(ratings)[1])
    end

EDIT: After eliminating the global variable, it runs in 1.99 s (31 times slower than Java). After also removing r = ratings[i,:], it takes 0.856 s (13x slower).

performance julia-lang
3 answers

A few suggestions:

  • Do not use global variables. A non-constant global can change type at any time, so the compiler cannot specialize code that reads it, and that makes it slow. Instead, pass ratings as an argument.
  • The line r = ratings[i,:] makes a copy of the row on every iteration, which is slow. Instead, use predict(ratings[i,1], ratings[i,2]) - ratings[i,3].
  • square(x) may be faster than x*x; try it.
  • If you are running a version of Julia built from source, check out the new NumericExtensions.jl package, which has insanely optimized functions for many common numerical operations (see the julia-dev list).
  • Julia must compile the code on the first run. The right way to benchmark in Julia is to time the code several times and ignore the first run.
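
The first, second, and last suggestions can be sketched together as follows. This is only an illustration under stated assumptions: the random matrix is a hypothetical stand-in for ratings.dat (and smaller than the real 15 million rows), and the names rmse_global and rmse_arg are made up for the comparison. Actual timings will depend on the machine and the Julia version.

```julia
# Hypothetical stand-in for ratings.dat: rows of [user, film, rating].
ratings = rand(Int32(1):Int32(5), 1_000_000, 3)

predict(user, film) = 3.462

# Slow variant: reads the non-constant global `ratings` and copies each row.
function rmse_global()
    total = 0.0
    for i in 1:size(ratings, 1)
        r = ratings[i, :]                  # allocates a new 3-element vector
        diff = predict(r[1], r[2]) - r[3]
        total += diff * diff
    end
    return sqrt(total / size(ratings, 1))
end

# Faster variant: the matrix is an argument and elements are indexed directly.
function rmse_arg(r)
    total = 0.0
    for i in 1:size(r, 1)
        diff = predict(r[i, 1], r[i, 2]) - r[i, 3]
        total += diff * diff
    end
    return sqrt(total / size(r, 1))
end

# The first call of each function includes compilation, so run both once
# before timing and then keep the best of a few timed runs.
rmse_global(); rmse_arg(ratings)

t_global = minimum(@elapsed(rmse_global()) for _ in 1:3)
t_arg    = minimum(@elapsed(rmse_arg(ratings)) for _ in 1:3)
println("global + row copy:          ", t_global, " s")
println("argument + direct indexing: ", t_arg, " s")
```

Both functions compute the same value; only the way the data is reached differs, so any gap between the two timings comes from the global lookup and the per-row copy.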

For me, the following code runs in 0.024 seconds (and I doubt that my laptop is much faster than your computer). I initialized ratings with the commented-out line, since I did not have the file that you referenced.

    function predict(user, film)
        return 3.462
    end

    function rmse(r)
        total = 0.0
        for i = 1:size(r,1)
            diff = predict(r[i,1],r[i,2]) - r[i,3]
            total += diff * diff
        end
        return sqrt(total / size(r,1))
    end

    # ratings = rand(1:20, 5000000, 3)

On my system, the problem is that the call to your constant predict function is not optimized away. Replacing the calls to predict with the constant itself makes the code run in 0.01 seconds.

    function time()
        ratings = ones(15_000_000, 3)
        predict(user, film) = 3.462
        function rmse(ratings)
            total = 0.0
            for i in 1:size(ratings, 1)
                diff = predict(ratings[i, 1], ratings[i, 2]) - ratings[i, 3]
                total += diff * diff
            end
            return sqrt(total / size(ratings, 1))
        end
        rmse(ratings)             # warm-up call so compilation is not timed
        @elapsed rmse(ratings)
    end
    time()

    function time2()
        ratings = ones(15_000_000, 3)
        predict(user, film) = 3.462   # defined for parity with time(); not called below
        function rmse(ratings)
            total = 0.0
            for i in 1:size(ratings, 1)
                diff = 3.462 - ratings[i, 3]   # constant used in place of predict(...)
                total += diff * diff
            end
            return sqrt(total / size(ratings, 1))
        end
        rmse(ratings)             # warm-up call so compilation is not timed
        @elapsed rmse(ratings)
    end
    time2()