How to improve this implementation of radix sorting?

I am implementing a 2-byte radix sort. The idea is to use counting sort on the lower 16 bits of each integer and then on the upper 16 bits, so the whole sort takes only 2 passes. The first thing I had to figure out was how to handle negatives. Because the sign bit is set for negative numbers, their raw bit patterns compare as larger than those of positive numbers. To combat this, I flip the sign bit when the number is positive, so that [0, 2 billion) maps to the byte patterns [128 0 0 0, 255 255 255 255]. When the number is negative, I flip all the bits, so that it lands in the range [0 0 0 0, 127 255 255 255]. This site helped me with this information. Finally, I extract either the upper or the lower 16 bits of the integer, depending on the pass. Below is the code that does this.

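static uint32_t position(int number, int pass) {
    /* Both branches of the earlier if/else produced the mask 0x80000000,
       so the key is simply the value with its sign bit flipped. For
       two's-complement ints that alone gives the correct unsigned order. */
    uint32_t out = (uint32_t)number ^ 0x80000000u;
    /* pass 0 selects the low 16 bits, pass 1 the high 16 bits */
    return pass == 0 ? out & 0xffff : (out >> 16) & 0xffff;
}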
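As a quick sanity check on the key mapping, here is a minimal sketch, assuming 32-bit two's-complement integers. The name `order_key` is hypothetical and not part of the code above. It shows that flipping only the sign bit maps the smallest int to key 0 and the largest to 0xFFFFFFFF; inverting all bits of negative values is the variant normally used for IEEE-754 floats, not for integers.

```c
#include <stdint.h>

/* Hypothetical helper: maps a signed 32-bit value to an unsigned key
   that sorts in the same order as the signed value. */
static uint32_t order_key(int32_t v) {
    return (uint32_t)v ^ 0x80000000u;   /* flip the sign bit only */
}

/* Returns 1 when comparing keys agrees with comparing the signed values. */
static int key_order_matches(int32_t a, int32_t b) {
    return (a < b) == (order_key(a) < order_key(b));
}
```

Calling `key_order_matches` on pairs such as (-2, -1) or (-1, 0) is a cheap way to test a key function before wiring it into the sort.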

To run the actual radix sort, I need a histogram with 65,536 buckets. The problem I ran into is that the number of input elements is very large, so building the histogram takes a while. I therefore implemented it in parallel, using processes and shared memory. I split the array into 8 subsections of size / 8 elements each. Then, using a shared-memory array of 65536 * 8 counters, each process builds its own histogram. Afterwards, I sum them all into a single histogram. Below is the code for this:

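for (i=0;i<8;i++) {
    pid_t pid = fork();
    if (pid < 0) _exit(1);      /* fork failed: exit with an error status */
    if (pid == 0) {
        /* Child i counts its eighth of the array into its own 65536-entry
           slice of hist. The products are widened to 64 bits so that
           i * size cannot overflow int for very large inputs. */
        const int start = (int)(((long long)i * size) >> 3);
        const int stop  = i == 7 ? size : (int)(((long long)(i + 1) * size) >> 3);
        const int curr  = i << 16;
        for (j=start;j<stop;++j)
            hist[curr + position(array[j], pass)]++;
        _exit(0);
    }
}
for (i=0;i<8;i++) wait(NULL);   /* reap all 8 children before reading hist */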

for (i=1;i<8;i++) {
    const int pos = i << 16;
    for (j=0;j<65536;j++)
        hist[j] += hist[pos + j];
}
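The snippet above relies on `hist` living in memory that the forked children share with the parent, but the allocation is not shown. A minimal sketch of one way to set that up, assuming a POSIX system where `mmap` supports `MAP_SHARED | MAP_ANONYMOUS`, might look like the following. The names `shared_histogram`, `digit16`, `WORKERS` and `BUCKETS` are hypothetical, and this is not necessarily how the code above allocates its array.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS 8
#define BUCKETS 65536

/* Hypothetical helper: 16-bit digit of the sign-flipped key. */
static uint32_t digit16(int v, int pass) {
    uint32_t k = (uint32_t)v ^ 0x80000000u;
    return pass == 0 ? (k & 0xffffu) : (k >> 16);
}

/* Counts the digits of array[0..size) into out[0..BUCKETS).
   Returns 0 on success, -1 on failure. */
static int shared_histogram(const int *array, size_t size, int pass,
                            uint32_t *out) {
    size_t bytes = (size_t)WORKERS * BUCKETS * sizeof(uint32_t);
    /* Anonymous shared mapping: zero-filled, and writes made by forked
       children are visible to the parent. */
    uint32_t *hist = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (hist == MAP_FAILED) return -1;

    int started = 0;
    for (int i = 0; i < WORKERS; i++) {
        pid_t pid = fork();
        if (pid < 0) break;                       /* stop forking on failure */
        if (pid == 0) {
            size_t start = size * i / WORKERS;
            size_t stop  = size * (i + 1) / WORKERS;
            uint32_t *mine = hist + (size_t)i * BUCKETS;  /* private slice */
            for (size_t j = start; j < stop; j++)
                mine[digit16(array[j], pass)]++;
            _exit(0);
        }
        started++;
    }

    int ok = (started == WORKERS);
    for (int i = 0; i < started; i++) {           /* reap what was started */
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }
    if (ok) {
        for (size_t b = 0; b < BUCKETS; b++) {    /* merge the per-child slices */
            uint32_t sum = 0;
            for (int i = 0; i < WORKERS; i++)
                sum += hist[(size_t)i * BUCKETS + b];
            out[b] = sum;
        }
    }
    munmap(hist, bytes);
    return ok ? 0 : -1;
}
```

Because every child writes only to its own slice, no locking is needed. The cost is the extra merge loop and a fresh mapping per call; a real implementation would probably map once and reuse the buffer across passes.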

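An 8-bit or 11-bit radix sort keeps its histogram in L1 cache, while the 16-bit histogram fits in L2. With 16 bits, however, only 2 passes are needed. [...] CUDA. [...] 250 [...] 1.5 [...] 16-bit [...]. Next, the histogram is turned into a running total: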

for (i=1;i<65536;i++)
    hist[i] += hist[i-1];
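To make the effect of this loop concrete, here is a small sketch with a hypothetical helper `inclusive_prefix_sum`. For the counts {2, 0, 3, 1} it produces {2, 2, 5, 6}: after the scan, `hist[d]` is one past the last output slot reserved for digit `d`, which is why a scatter that walks the input backwards can use `--hist[d]` as the destination index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turns per-bucket counts into inclusive running
   totals, so hist[d] becomes the number of elements with digit <= d. */
static void inclusive_prefix_sum(uint32_t *hist, size_t buckets) {
    for (size_t i = 1; i < buckets; i++)
        hist[i] += hist[i - 1];
}
```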

The sorted output of each pass is written to a temporary array, temp. [...] temp [...]. The complete sort looks like this:

histogram(array, size, 0, hist);
for (i=size-1;i>=0;i--)
    temp[--hist[position(array[i], 0)]] = array[i];

memset(hist, 0, arrSize);
histogram(temp, size, 1, hist);
for (i=size-1;i>=0;i--)
    array[--hist[position(temp[i], 1)]] = temp[i];
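For checking the parallel version against a known-good baseline, a single-process sketch of the same two-pass scheme can be handy. The code below is an assumption-laden illustration, not the code from this post: `radix_sort_16` and `digit16` are hypothetical names, and it allocates its own scratch buffers with `malloc`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 65536

/* Hypothetical helper: 16-bit digit of the sign-flipped key. */
static uint32_t digit16(int32_t v, int pass) {
    uint32_t k = (uint32_t)v ^ 0x80000000u;
    return pass == 0 ? (k & 0xffffu) : (k >> 16);
}

/* Sorts array[0..size) in ascending order with two 16-bit counting-sort
   passes. Returns 0 on success, -1 if memory allocation fails. */
static int radix_sort_16(int32_t *array, size_t size) {
    int32_t *temp = malloc(size * sizeof *temp);
    size_t *hist = malloc(BUCKETS * sizeof *hist);
    if ((size && !temp) || !hist) { free(temp); free(hist); return -1; }

    int32_t *src = array, *dst = temp;
    for (int pass = 0; pass < 2; pass++) {
        memset(hist, 0, BUCKETS * sizeof *hist);
        for (size_t i = 0; i < size; i++)          /* count digits */
            hist[digit16(src[i], pass)]++;
        for (size_t b = 1; b < BUCKETS; b++)       /* inclusive prefix sum */
            hist[b] += hist[b - 1];
        /* Walk backwards so equal digits keep their relative order;
           the second pass depends on this stability. */
        for (size_t i = size; i-- > 0; )
            dst[--hist[digit16(src[i], pass)]] = src[i];
        int32_t *t = src; src = dst; dst = t;      /* swap roles */
    }
    /* After two swaps src points at array again, so the sorted data
       already sits in the caller's buffer. */
    free(temp);
    free(hist);
    return 0;
}
```

Running both versions on the same random input and comparing the results element by element is a simple way to catch mistakes in the histogram merging or in the key function.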

[...] quicksort [...] 5 [...] 10 [...] 5 [...] 8-[...]. How can this implementation be improved?

[...] unsigned [...].

[...]? [...]? [...] pthread [...].
