Efficient memory sorting by matrix column

Question

Efficient memory sorting by matrix column

I have a matrix of large numbers, and I would like to apply sortperm to each column of this matrix. The naive thing to do is

 order = sortperm(X[:,j])

which makes a copy. This seems like a shame, so I decided to try SubArray :

 order = sortperm(sub(X,1:n,j))

but it was even slower. For a laugh I tried

 order = sortperm(1:n,by=i->X[i,j])

but of course it was terrible. What is the fastest way to do this?

Here are some benchmarks:

 getperm1(X,n,j) = sortperm(X[:,j]) getperm2(X,n,j) = sortperm(sub(X,1:n,j)) getperm3(X,n) = mapslices(sortperm, X, 1) n = 1000000 X = rand(n, 10) for f in [getperm1, getperm2] println(f) for it in 1:5 gc() @time f(X,n,5) end end for f in [getperm3] println(f) for it in 1:5 gc() @time getperm3(X,n) end end

results:

 getperm1 elapsed time: 0.258576164 seconds (23247944 bytes allocated) elapsed time: 0.141448346 seconds (16000208 bytes allocated) elapsed time: 0.137306078 seconds (16000208 bytes allocated) elapsed time: 0.137385171 seconds (16000208 bytes allocated) elapsed time: 0.139137529 seconds (16000208 bytes allocated) getperm2 elapsed time: 0.433251141 seconds (11832620 bytes allocated) elapsed time: 0.33970986 seconds (8000624 bytes allocated) elapsed time: 0.339840795 seconds (8000624 bytes allocated) elapsed time: 0.342436716 seconds (8000624 bytes allocated) elapsed time: 0.342867431 seconds (8000624 bytes allocated) getperm3 elapsed time: 1.766020534 seconds (257397404 bytes allocated, 1.55% gc time) elapsed time: 1.43763525 seconds (240007488 bytes allocated, 1.85% gc time) elapsed time: 1.41373546 seconds (240007488 bytes allocated, 1.82% gc time) elapsed time: 1.42215519 seconds (240007488 bytes allocated, 1.83% gc time) elapsed time: 1.419174037 seconds (240007488 bytes allocated, 1.83% gc time)

Where the mapslices version is the 10x version of getperm1 , as you would expect.

It is worth noting that on my machine, at least the copy + sortperm option is not much slower than just sortperm for a vector of the same length, but there is no need for memory allocation, so it would be nice to avoid this.

+5

julia-lang

Iaindunning Mar 14 '15 at 3:05

source share

1 answer

Matt B. · Answer 1 · 2015-03-16T23:33:15+0000

You can beat the performance of SubArray in several very specific cases (for example, with continuous viewing of Array ) with pointer magic:

 function colview(X::Matrix,j::Int) n = size(X,1) offset = 1+n*(j-1) # The linear start position checkbounds(X, offset+n-1) pointer_to_array(pointer(X, offset), (n,)) end getperm4(X,n,j) = sortperm(colview(X,j))

The colview function will return a full-featured Array that shares its data with the original X Note that this is a terrible idea , because the returned array refers to data that Julia only tracks through X This means that if X goes out of scope before the data access in the browse column fails with segfault.

With the results:

 getperm1 elapsed time: 0.317923176 seconds (15 MB allocated) elapsed time: 0.252215996 seconds (15 MB allocated) elapsed time: 0.215124686 seconds (15 MB allocated) elapsed time: 0.210062109 seconds (15 MB allocated) elapsed time: 0.213339974 seconds (15 MB allocated) getperm2 elapsed time: 0.509172302 seconds (7 MB allocated) elapsed time: 0.509961218 seconds (7 MB allocated) elapsed time: 0.506399583 seconds (7 MB allocated) elapsed time: 0.512562736 seconds (7 MB allocated) elapsed time: 0.506199265 seconds (7 MB allocated) getperm4 elapsed time: 0.225968056 seconds (7 MB allocated) elapsed time: 0.220587707 seconds (7 MB allocated) elapsed time: 0.219854355 seconds (7 MB allocated) elapsed time: 0.226289377 seconds (7 MB allocated) elapsed time: 0.220391515 seconds (7 MB allocated)

I have not considered why performance is worse than SubArray, but it could just be because of the additional dereferencing of a pointer to each memory access. It is very noteworthy how little distribution really costs you in terms of time - timeperm1 timers are more variable, but it still sometimes exceeds getperm4! I think this is due to some additional pointer math in Array internal implementation with shared data. There are also some crazy caching behaviors ... getperm1 gets significantly faster on repeated starts.

Efficient memory sorting by matrix column

More articles: