I'm trying to use OpenMP to offload computation to an AMD GPU. I read in the OpenMP 4.5 specification that a target device is a device to which code and data can be offloaded, but I can't tell whether the offload succeeded or whether the code really ran on my AMD GPU.
To check whether offloading really works, I measured the wall-clock time with and without the pragmas to compare the two, but the reported time is 0 in both cases.
This is the simple test code, which I plan to adapt for my project:
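#include <cstdlib>
#include <iostream>
#include <omp.h>

int main() {
    int n = 10240; float a = 2.0f; float b = 3.0f;
    float *x = (float*) malloc(n * sizeof(float));
    float *y = (float*) malloc(n * sizeof(float));
    // Initialize the inputs (arbitrary values) so the kernels do not read indeterminate data.
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 0.0f; }
    double start = omp_get_wtime();
    // Map array sections: mapping the bare pointer maps only the pointer variable, not the array.
    #pragma omp target data map(to: x[0:n])
    {
        #pragma omp target map(tofrom: y[0:n])
        #pragma omp teams
        #pragma omp distribute parallel for
        for (int i = 0; i < n; ++i) {
            y[i] = a*x[i] + y[i];
        }
        #pragma omp target map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i) {
            y[i] = b*x[i] + y[i];
        }
    }
    std::cout << "Time: " << (omp_get_wtime() - start) * 1000.0 << " ms" << std::endl;
    free(x); free(y); return 0;
}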
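A more direct check than timing, as a minimal sketch: omp_get_num_devices() reports how many offload devices the runtime can see, and omp_is_initial_device(), called inside a target region, reports whether that region fell back to the host. Both are OpenMP runtime routines. The helper names visible_devices, ran_on_device and report_offload are hypothetical, and the sketch assumes the compiler was built with offloading support for the GPU in question.

```cpp
#include <iostream>
#include <ostream>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of offload devices the OpenMP runtime can see.
// Returns 0 when the file is compiled without OpenMP support.
int visible_devices() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// True only if a target region actually executed on a non-host device.
// (Hypothetical helper, not part of the OpenMP API.)
bool ran_on_device() {
    int on_host = 1;  // stays 1 if the region runs on the host
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    {
        // Nonzero on the host, zero on an offload device.
        on_host = omp_is_initial_device();
    }
#endif
    return on_host == 0;
}

// Print both findings to the given stream.
void report_offload(std::ostream& out) {
    out << "Devices visible: " << visible_devices() << '\n'
        << "Target region ran on: "
        << (ran_on_device() ? "device" : "host") << '\n';
}
```

Calling report_offload(std::cout) at the top of main prints both values. If the device count is 0, every target region runs on the host, so timing the two variants would not show offloading either way. Built without -fopenmp, the sketch still compiles and reports host execution.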
NB: I am using GCC 5.1.0 on Windows.
Any help would be greatly appreciated.