Haar Wavelet Transform in CUDA

I tried to implement the Haar wavelet transform in CUDA for a 1D array.

ALGORITHM

The input array has 8 elements.

With the condition if(x_index>=o_width/2 || y_index>=o_height/2) I expect 4 active threads. They should map to input indices 0, 2, 4, 6, and each thread handles the input pair that starts at its index.

Each thread computes an average. E.g., if my thread id is 0, then avg = (input[0] + input[1]) / 2, and at the same time I get diff = input[0] - avg, and so on for the other threads.

Now the placement of the output matters. I created a separate thread id for the output, since using indices 0, 2, 4, 6 made it difficult to place the output at the correct index.

My averages should be placed in the first 4 indices of the output, i.e. 0, 1, 2, 3, so o_thread_id should be 0, 1, 2, 3. Similarly, to place the differences at 4, 5, 6, 7, I offset 0, 1, 2, 3 by 4, as shown in the code.
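The steps above can be checked against a plain host-side loop before moving to the GPU. The following is a minimal sketch, not the kernel from the question: haar_step_reference is a hypothetical helper name, the loop variable t stands in for the thread id, and the sum is divided by 2.0f because dividing an int sum by the int 2 would truncate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for one level of the 1D Haar step described above.
// haar_step_reference is a hypothetical name used only for this sketch.
std::vector<float> haar_step_reference(const std::vector<int>& input) {
    assert(input.size() % 2 == 0);  // assumption: even length, elements are paired
    const std::size_t half = input.size() / 2;
    std::vector<float> output(input.size());
    for (std::size_t t = 0; t < half; ++t) {  // t plays the role of the thread id
        const std::size_t i = 2 * t;          // input pair starts at 0, 2, 4, 6, ...
        const float avg = (input[i] + input[i + 1]) / 2.0f;  // float division
        const float diff = input[i] - avg;
        output[t] = avg;           // averages fill the first half
        output[t + half] = diff;   // differences fill the second half
    }
    return output;
}
```

For the input 1, 2, ..., 8 this yields the averages 1.5, 3.5, 5.5, 7.5 followed by four differences of -0.5, which is a convenient target to compare a GPU result against.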

PROBLEM

My output is all zeros! No matter what I change, I still get zeros.

CODE

    __global__ void cal_haar(int input[], float output[], int i_widthstep, int o_widthstep, int o_width, int o_height)
    {
        int x_index = blockIdx.x * blockDim.x + threadIdx.x;
        int y_index = blockIdx.y * blockDim.y + threadIdx.y;

        if (x_index >= o_width / 2 || y_index >= o_height / 2) return;

        int i_thread_id = y_index * i_widthstep + (2 * x_index);
        int o_thread_id = y_index * o_widthstep + x_index;

        float avg = (input[i_thread_id] + input[i_thread_id + 1]) / 2;
        float diff = input[i_thread_id] - avg;

        output[o_thread_id] = avg;
        output[o_thread_id + 4] = diff;
    }

    void haar(int input[], float output[], int i_widthstep, int o_widthstep, int o_width, int o_height)
    {
        int * d_input;
        float * d_output;

        cudaMalloc(&d_input, i_widthstep * o_height);
        cudaMalloc(&d_output, o_widthstep * o_height);

        cudaMemcpy(d_input, input, i_widthstep * o_height, cudaMemcpyHostToDevice);

        dim3 blocksize(16, 16);
        dim3 gridsize;
        gridsize.x = (o_width + blocksize.x - 1) / blocksize.x;
        gridsize.y = (o_height + blocksize.y - 1) / blocksize.y;

        cal_haar<<<gridsize, blocksize>>>(d_input, d_output, i_widthstep, o_widthstep, o_width, o_height);

        cudaMemcpy(output, d_output, o_widthstep * o_height, cudaMemcpyDeviceToHost);

        cudaFree(d_input);
        cudaFree(d_output);
    }

The following is my main function:

    void main()
    {
        int in_arr[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float out_arr[8];
        int i_widthstep = 8 * sizeof(int);
        int o_widthstep = 8 * sizeof(float);

        haar(in_arr, out_arr, i_widthstep, o_widthstep, 8, 1);

        for (int c = 0; c <= 7; c++) { cout << out_arr[c] << endl; }
        cvWaitKey();
    }

Can you tell me what I am doing wrong that makes the output all zeros? Thanks.

1 answer

The problem with your code is the following condition:

 if(x_index>=o_width/2 || y_index>=o_height/2) return; 

Given o_height = 1, we have o_height/2 = 0 (o_height is an int, so this is integer division, which truncates toward zero). Every thread therefore satisfies y_index >= 0, returns immediately, and nothing is written to the output. To get what you want, you can either do the arithmetic in floating point or use (o_height+1)/2 and (o_width+1)/2, which rounds the division up. The condition then becomes ( x_index >= (8+1)/2 /*= 4*/ || y_index >= (1+1)/2 /*= 1*/ ).
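The effect of the guard can be reproduced on the host by looping over one 16x16 block and counting the threads that get past the early return. This is only an illustrative sketch: count_active_threads and its round_up flag are hypothetical names, and the loops merely emulate thread indices rather than launching a kernel.

```cpp
#include <cassert>

// Counts how many threads of one block get past the bounds guard.
// round_up == false uses n/2 (the guard from the question);
// round_up == true uses (n+1)/2 (the suggested variant).
int count_active_threads(int o_width, int o_height,
                         int block_x, int block_y, bool round_up) {
    assert(block_x > 0 && block_y > 0);
    const int x_limit = round_up ? (o_width + 1) / 2 : o_width / 2;
    const int y_limit = round_up ? (o_height + 1) / 2 : o_height / 2;
    int active = 0;
    for (int y = 0; y < block_y; ++y) {
        for (int x = 0; x < block_x; ++x) {
            if (x >= x_limit || y >= y_limit) continue;  // mirrors the early return
            ++active;
        }
    }
    return active;
}
```

With o_width = 8 and o_height = 1, the truncating guard leaves 0 active threads, while the rounded-up guard leaves the expected 4.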

In addition, there is an addressing problem as soon as you have more than one row of threads in the Y dimension: the i_thread_id and o_thread_id calculations are then incorrect, because the _widthstep values are sizes in bytes, but you use them as element offsets into the arrays.
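One way to avoid mixing bytes and elements is to convert the byte stride into an element count before indexing. The sketch below assumes each row occupies a whole number of elements; element_index is a hypothetical helper, not something from the original code, and the same arithmetic would apply inside a kernel.

```cpp
#include <cassert>
#include <cstddef>

// Flat element index of (x, y) when the row stride is given in BYTES.
// T is the element type (int for the input, float for the output in the question).
template <typename T>
std::size_t element_index(std::size_t x, std::size_t y, std::size_t widthstep_bytes) {
    // Assumption: the stride is a multiple of the element size.
    assert(widthstep_bytes % sizeof(T) == 0);
    const std::size_t elems_per_row = widthstep_bytes / sizeof(T);
    return y * elems_per_row + x;
}
```

For rows of 8 ints, element_index<int>(2, 1, 8 * sizeof(int)) gives 10. Multiplying y by the byte stride directly would give 34 on a platform where int is 4 bytes, which is already outside a two-row array of 16 elements.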


Source: https://habr.com/ru/post/1414005/

