How to choose the most powerful OpenCL device?

My computer has both an Intel GPU and an NVIDIA GPU. The latter is much more powerful and is my preferred device when performing heavy tasks. I need a way to programmatically determine which device to use.

I know that it’s hard to understand which device is best for a specific task. I need to (programmatically) make a qualified guess using the variables listed below.

Here is how the two devices compare: Intel HD Graphics 4400 (left) vs. GeForce GT 750M (right).

GlobalMemoryCacheLineSize               64 vs 128
GlobalMemoryCacheSize              2097152 vs 32768
GlobalMemorySize                1837105152 vs 4294967296
HostUnifiedMemory                     true vs false
Image2DMaxHeight                     16384 vs 32768
Image2DMaxWidth                      16384 vs 32768
Image3DMaxDepth                       2048 vs 4096
Image3DMaxHeight                      2048 vs 4096
Image3DMaxWidth                       2048 vs 4096
LocalMemorySize                      65536 vs 49152
MaxClockFrequency                      400 vs 1085
MaxComputeUnits                         20 vs 2
MaxConstantArguments                     8 vs 9
MaxMemoryAllocationSize          459276288 vs 1073741824
MaxParameterSize                      1024 vs 4352
MaxReadImageArguments                  128 vs 256
MaxSamplers                             16 vs 32
MaxWorkGroupSize                       512 vs 1024
MaxWorkItemSizes           [512, 512, 512] vs [1024, 1024, 64]
MaxWriteImageArguments                   8 vs 16
MemoryBaseAddressAlignment            1024 vs 4096
OpenCLCVersion                         1.2 vs 1.1
ProfilingTimerResolution                80 vs 1000
VendorId                             32902 vs 4318

Obviously, there are hundreds of other devices to consider. I need a general formula!

+4

.

, , , 2 MaxComputeUnits, 80, ( ).

, , ? - ( ) ( ). , , . , :

  • : ?
  • : CPU? ?
  • : - , , ?
  • : - , , , , ?

: , , (, parallelism , , , , ).

, MaxComputeUnits * MaxClockFrequency ( ), , , , , (a + b / 2)^2, , , .
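As a rough illustration, the MaxComputeUnits * MaxClockFrequency product mentioned above can be computed for the two devices from the question. This is a minimal sketch, not a recommended metric: the `Device` class and the `naive_score` function are hypothetical names introduced here, and the numbers are copied from the question's table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Subset of the device properties listed in the question."""
    name: str
    max_compute_units: int
    max_clock_frequency: int  # MHz, as reported by the OpenCL runtime


def naive_score(device: Device) -> int:
    """Compute units times clock frequency; ignores what a compute unit contains."""
    return device.max_compute_units * device.max_clock_frequency


# Values taken from the table in the question.
intel = Device("Intel HD Graphics 4400", 20, 400)
nvidia = Device("GeForce GT 750M", 2, 1085)

devices = [intel, nvidia]
best = max(devices, key=naive_score)

for d in devices:
    print(f"{d.name}: {naive_score(d)}")
print("picked:", best.name)
```

With these numbers the product is 8000 for the Intel device and 2170 for the NVIDIA device, so the naive score picks the device the asker considers weaker. A compute unit is not a comparable quantity across vendors, which is one reason a single formula over these fields is unreliable.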

( , , SO) , . , , . . , , .

, , , , , , . :

  • , , , , (, MaxGroupSize, , ). , (, p-).
  • ( , ), (, [0..5] , [5..10] , [10.. * ) ). ( ). , .

, , 1000 .

+2

@Adriano, , ... . ( , ), ( ):

OCL

, OCL ( ). - OCL 1.2... .

( ) : . , ( ) , Host Unified Memory. , , , .

, , , . . , ( ) , , ( ). , , , PCIe.

GPU

- GPU. , . NVIDIA Excel, . , ( , , ), , .
, . , . . , , (, , GPU , ). , : , OCL clGetKernelWorkGroupInfo(), , ​​ , .
, , :

__local address, , 0.

, , , . , , JIT. -D clBuildProgram(), . - :

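// SIZE is normally supplied at build time, e.g. "-D SIZE=256" in the clBuildProgram() options.
#ifndef SIZE
#define SIZE 64   // fallback value, assumed for illustration
#endif

    __kernel void mykernel(args){
       __local float myLocalMem[SIZE];   // element type assumed
       ....
    }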
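One way to supply SIZE from the host is to format it into the build options string. The sketch below only builds that string. `local_size_option` is a hypothetical helper, and the 4-byte element size and the 50% share of local memory are assumptions. The resulting string would be passed as the options argument of clBuildProgram().

```python
def local_size_option(local_memory_size: int, element_bytes: int = 4,
                      fraction: float = 0.5) -> str:
    """Return a '-D SIZE=<n>' build option sized to a fraction of local memory.

    local_memory_size: device local memory in bytes (LocalMemorySize).
    element_bytes:     assumed size of one array element (4 for a float).
    fraction:          assumed share of local memory the array may use.
    """
    if local_memory_size <= 0 or element_bytes <= 0:
        raise ValueError("sizes must be positive")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    elements = int(local_memory_size * fraction) // element_bytes
    return f"-D SIZE={elements}"


# LocalMemorySize values from the question's table.
print(local_size_option(65536))  # Intel HD Graphics 4400
print(local_size_option(49152))  # GeForce GT 750M
```

Because the value is fixed when the program is built, the program has to be rebuilt for each device whose local memory size differs.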

, :

. , , , , . ( , , , ), , , ?

:

-, ( ) . -, ...

+2

? : , "" , . , : .

+1

. , ( ).

, , GPU, ( ).

This alternative is reasonable because most systems have only one GPU.

0