I am trying to implement the Haar wavelet transform in CUDA for a 1D array.
ALGORITHM
My input array has 8 elements.
With this guard condition, if(x_index>=o_width/2 || y_index>=o_height/2)
I expect to have 4 active threads, which should map to input indices 0,2,4,6, and I plan to address the input with each of them.
Each thread computes avg. E.g., if my thread id is 0, then avg = (input[0] + input[1]) / 2, and at the same time I get diff, which is input[0] - avg, and so on for the other threads.
Now the important part is placing the output. I created a separate thread id for the output (o_thread_id), since using indices 0,2,4,6 made it difficult to place the output at the correct index.
My averages should be placed in the first 4 indices of the output, i.e. 0,1,2,3, so o_thread_id should be 0,1,2,3. Similarly, to place the differences at 4,5,6,7, I offset 0,1,2,3 by 4, as shown in the code.
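To make the intended result concrete, here is a minimal CPU sketch of the step described above. The function name haar_step_reference is hypothetical and is not part of my CUDA code. It assumes an even-length input, and it assumes the average is meant to be fractional, so it divides by 2.0f rather than the integer 2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not part of the CUDA code in this post):
// one level of the Haar step as described above.
// Averages go into the first half of the output, differences into the second.
std::vector<float> haar_step_reference(const std::vector<int>& input) {
    assert(input.size() % 2 == 0);  // assumption: even number of samples
    const std::size_t half = input.size() / 2;
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < half; ++i) {
        // Divide by 2.0f so the average is not truncated by integer division.
        const float avg = (input[2 * i] + input[2 * i + 1]) / 2.0f;
        const float diff = input[2 * i] - avg;
        output[i] = avg;          // positions 0 .. half-1
        output[half + i] = diff;  // positions half .. size-1
    }
    return output;
}
```

For the input {1,2,3,4,5,6,7,8}, this layout puts the averages 1.5, 3.5, 5.5, 7.5 in positions 0 to 3 and a difference of -0.5 in each of positions 4 to 7, which is what I would expect the kernel to produce.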
PROBLEM
My output is all zeros! No matter what I change, I still get zeros.
CODE
__global__ void cal_haar(int input[], float output[], int i_widthstep, int o_widthstep, int o_width, int o_height)
{
    int x_index = blockIdx.x * blockDim.x + threadIdx.x;
    int y_index = blockIdx.y * blockDim.y + threadIdx.y;

    if (x_index >= o_width / 2 || y_index >= o_height / 2) return;

    int i_thread_id = y_index * i_widthstep + (2 * x_index);
    int o_thread_id = y_index * o_widthstep + x_index;

    float avg = (input[i_thread_id] + input[i_thread_id + 1]) / 2;
    float diff = input[i_thread_id] - avg;

    output[o_thread_id] = avg;
    output[o_thread_id + 4] = diff;
}

void haar(int input[], float output[], int i_widthstep, int o_widthstep, int o_width, int o_height)
{
    int * d_input;
    float * d_output;

    cudaMalloc(&d_input, i_widthstep * o_height);
    cudaMalloc(&d_output, o_widthstep * o_height);

    cudaMemcpy(d_input, input, i_widthstep * o_height, cudaMemcpyHostToDevice);

    dim3 blocksize(16, 16);
    dim3 gridsize;
    gridsize.x = (o_width + blocksize.x - 1) / blocksize.x;
    gridsize.y = (o_height + blocksize.y - 1) / blocksize.y;

    cal_haar<<<gridsize, blocksize>>>(d_input, d_output, i_widthstep, o_widthstep, o_width, o_height);

    cudaMemcpy(output, d_output, o_widthstep * o_height, cudaMemcpyDeviceToHost);

    cudaFree(d_input);
    cudaFree(d_output);
}
The following is my main function:
void main()
{
    int in_arr[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float out_arr[8];
    int i_widthstep = 8 * sizeof(int);
    int o_widthstep = 8 * sizeof(float);

    haar(in_arr, out_arr, i_widthstep, o_widthstep, 8, 1);

    for (int c = 0; c <= 7; c++)
    {
        cout << out_arr[c] << endl;
    }
    cvWaitKey();
}
Can you tell me what I am doing wrong that makes the output all zeros? Thanks.