How to get a multithreaded dot() function?

When I execute this in the Julia v0.3.7 REPL on my 64-bit Windows 8.1 computer with 8 logical processor cores:

blas_set_num_threads(CPU_CORES)
const v=ones(Float64,100000)
@time for k=1:1000000;s=dot(v,v);end

I observe on the CPU meter in Task Manager or Process Explorer that only 12.5% of the CPU is used (1 logical processor core). I see the same with Julia v0.3.5, on both Windows 7 and Windows 8.1, and also when starting Julia with "julia -p 8" on the command line. Going back to running the Julia REPL without the "-p 8" command line option, I tried this test:

blas_set_num_threads(CPU_CORES)
@time peakflops(10000)

In this case, the CPU meter shows 100% CPU usage.

Because dot() and peakflops() both use BLAS (OpenBLAS in my case), I expected both to use the number of threads set by blas_set_num_threads(). However, only the latter function actually does. Is the behavior of dot() possibly due to a bug in OpenBLAS?

I tried to work around this shortcoming in Julia by using a matrix multiplication function. However, I am performing dot() operations on sub-vectors of GB-sized 2D arrays, where the sub-vectors occupy contiguous memory. Matrix multiplication forces me to transpose each vector, which creates a copy. That is expensive inside the inner loop. So my choice seems to be either to find out how to use Julia's parallel processing commands/macros or to go back to Python (where Intel MKL BLAS works as expected for ddot()). Because the dot() - , 99% , , , OpenBLAS Julia, . , , ...
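For illustration, here is a minimal sketch of the alternative to matrix multiplication mentioned above: split one long dot product into chunks, compute the partial sums on separate workers, and add them up, without copying the sub-vectors. It is written in Python with the standard library only. The names partial_dot, chunked_dot and n_chunks are hypothetical and are not part of Julia, OpenBLAS or NumPy. Note that in CPython the pure-Python kernel below holds the GIL, so this version shows only the decomposition and will not use 8 cores. To get a real speedup, the per-chunk kernel would have to be replaced by a call that releases the GIL, or the chunks handed to separate processes.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


def partial_dot(x, y):
    """Dot product of two equal-length sequences of floats."""
    return sum(a * b for a, b in zip(x, y))


def chunked_dot(x, y, n_chunks=8):
    """Split x and y into up to n_chunks slices and sum the per-slice dot products."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # memoryview slices are views onto the original buffer, not copies.
    xv, yv = memoryview(x), memoryview(y)
    n = len(xv)
    step = max(1, -(-n // n_chunks))  # ceiling division
    bounds = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = pool.map(
            lambda b: partial_dot(xv[b[0]:b[1]], yv[b[0]:b[1]]), bounds
        )
        return sum(parts)


# Same vector as in the question: 100000 ones stored as 64-bit floats.
v = array("d", [1.0]) * 100000
s = chunked_dot(v, v)
print(s)
```

The design point is that each worker reads a contiguous slice of the existing buffer, so no transposed or copied vector is created inside the inner loop.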

dot(). . SharedArray, ? , SharedArray ? , . 100 000, , dot(), . Julia , BLAS?

: BLAS v. Julia SharedArray ( ) dot() .

1: Julia "-p 8" dot() innersimd() : http://docs.julialang.org/en/release-0.3/manual/performance-tips/ 1 . innersimd(), ::Array{Float64, 1}, ::SharedArray{Float64, 1}, 1 .: (

2: Julia ( BLAS 'gemm!()):

blas_set_num_threads(CPU_CORES)
const A=ones(Float64,(4,100000))
const B=ones(Float64,(100000,4))
@time for k=1:100000;s=A*B;end

"-p", .

3: Python:

import numpy as np
from scipy.linalg.blas import ddot
from timeit import default_timer as timer

v = np.ones(100000)
start = timer()
for k in range(1000000):
    s = ddot(v, v)  # BLAS level-1 dot product
exec_time = timer() - start
print()  # blank line before the result
print("Execution took", round(exec_time, 3), "seconds")

On 64-bit Anaconda3 v2.1.0 and WinPython: 7.5 seconds. For comparison, Julia 0.3.7 with OpenBLAS: 28 seconds. So Python is about 4 times faster than Julia with OpenBLAS ddot().

4: Python (4xN) * (Nx2), (N = 100000), , . , , 8 , . - Julia Python, Julia : 100000 4 ddot() OpenBLAS ddot() ( )? 4. OpenBLAS , Julia "-p 8" .

5: Julia v0.3.7 "-p 8", , OpenBLAS gemm!() ( ):

blas_set_num_threads(CPU_CORES)
const a = rand(10000, 10000)
@time a * a

, OpenBLAS 'ddot . , BLAS-3, xgemm, , , .

, , . , . Python? BLAS, , Python.

Julia -p 8 Julia BLAS, , , .

1: OpenBLAS MKL , , , - , . , OpenBLAS . , , . Python ? , Python, , .

2: ddot MKL. , 1 dgemm. Julia, , Julia , OpenBLAS 'ddot.

  • , , OpenBLAS 'dgemm , . , , n=16. , , , MKL. OpenBLAS 'dgemm MKL.

  • OpenBLAS ddot . , , BLAS .

  • MKL Revolution Julia, MKL, .

3: Ubuntu Nahalem 80 ( 16). Julia: OpenBLAS MKL. Julia OpenBLAS Haswell, , ddot .
