What is the fastest way to extract the indices of non-zero elements from a byte array in C++?

I have a byte array:

    unsigned char* array = new unsigned char[4000000]; ...

And I would like to get the indices of all nonzero elements of the array.

Of course I can do the following:

    for (int i = 0; i < size; i++) {
        if (array[i] != 0)
            somevector.push_back(i);
    }

Is there a faster algorithm?

Update 1: I see that most of the answers say no. I was hoping there were some magic bit operations that I didn't know about. Some people suggested sorting, but that isn't possible in this case. Thank you very much for all your answers.

Update 2: 4 years and 4 months after the question was asked, @wim suggested this answer, which looks promising.

+6
5 answers

If the byte array is mostly zeros, that is, a sparse array, you can take advantage of the processor's 32-bit word size and compare 4 bytes at a time. Whenever a 4-byte word is non-zero, however, you still have to determine which of its bytes are non-zero, and that takes extra effort. If the array is indeed sparse, the time saved on comparisons can outweigh the extra work of locating the non-zero bytes.

The easiest way is to make the size of the unsigned char array a multiple of 4 bytes, so you don't have to worry about handling the last few bytes after the loop ends.

I would suggest timing this, because it is purely hypothetical, and there will be a point at which the array is no longer sparse enough and this approach takes longer than the simple loop.

One question I have is what you are doing with the vector of offsets of the non-zero elements, and whether you could do without this vector. Another question: if you do need the vector, could you build it at the time you put the elements into the array?

    unsigned char* array = new unsigned char[4000000];
    ......
    // Assumes the length is a multiple of 4; needs <cstdint> and <cstring>.
    for (std::size_t i = 0; i < 4000000; i += 4) {
        std::uint32_t word;
        std::memcpy(&word, array + i, sizeof word);  // load 4 bytes without alignment or aliasing problems
        if (word) {
            // at least one byte is non-zero
            if (array[i])     somevector.push_back(i);
            if (array[i + 1]) somevector.push_back(i + 1);
            if (array[i + 2]) somevector.push_back(i + 2);
            if (array[i + 3]) somevector.push_back(i + 3);
        }
    }
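The word-at-a-time idea above can also be written so that it works for any array length. The following is only a minimal sketch: the function name `nonzero_indices_wordwise` is hypothetical, the 8-byte word size is an assumption (the answer uses 4 bytes), and whether it beats the simple loop depends on how sparse the data is, so it should be measured.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the indices of all non-zero bytes in data[0..size).
std::vector<std::size_t> nonzero_indices_wordwise(const unsigned char* data,
                                                  std::size_t size) {
    std::vector<std::size_t> out;
    std::size_t i = 0;

    // Main loop: test 8 bytes with a single comparison (assumed word size).
    for (; i + sizeof(std::uint64_t) <= size; i += sizeof(std::uint64_t)) {
        std::uint64_t word;
        std::memcpy(&word, data + i, sizeof word);  // alignment-safe load
        if (word == 0) continue;                    // common case for sparse data
        // At least one byte is non-zero: fall back to a per-byte check.
        for (std::size_t k = 0; k < sizeof word; ++k) {
            if (data[i + k] != 0) out.push_back(i + k);
        }
    }

    // Tail: the remaining (size % 8) bytes are checked one at a time.
    for (; i < size; ++i) {
        if (data[i] != 0) out.push_back(i);
    }
    return out;
}
```

Calling `nonzero_indices_wordwise(array, 4000000)` would return the same indices as the simple loop from the question, in increasing order.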
+1

If your vector is not ordered, this is the most efficient algorithm for what you want to do in a single-threaded program. You can try to optimize the data structure in which you store the result, but in terms of running time this is the best you can do.

+4

The only thing you can do to improve speed is to use concurrency.
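One way this could look is sketched below, assuming OpenMP is available (for example via `-fopenmp`). The function name `nonzero_indices_parallel` and the `chunks` parameter are hypothetical. Each chunk writes to its own vector, and the pieces are joined in order afterwards. Since the scan mostly reads memory, the actual speedup may be limited by memory bandwidth and should be measured.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: scans data[0..size) in `chunks` independent pieces.
// With OpenMP enabled the pieces can run on separate threads; without it
// the pragma is ignored and the loop simply runs serially.
std::vector<std::size_t> nonzero_indices_parallel(const unsigned char* data,
                                                  std::size_t size,
                                                  int chunks = 8) {
    if (chunks < 1) chunks = 1;
    std::vector<std::vector<std::size_t>> partial(chunks);
    const std::size_t step = (size + chunks - 1) / chunks;  // ceil(size / chunks)

    #pragma omp parallel for
    for (int c = 0; c < chunks; ++c) {
        const std::size_t begin = std::min(size, c * step);
        const std::size_t end = std::min(size, begin + step);
        for (std::size_t i = begin; i < end; ++i) {
            if (data[i] != 0) partial[c].push_back(i);  // each chunk owns its vector
        }
    }

    // Concatenate in chunk order so the result stays sorted by index.
    std::vector<std::size_t> out;
    for (const auto& p : partial) out.insert(out.end(), p.begin(), p.end());
    return out;
}
```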

+1

If non-zero values are relatively rare, one trick you can use is a sentinel value:

    unsigned char old_value = array[size-1];
    array[size-1] = 1;  // make sure we find a non-zero eventually
    int i = 0;
    for (;;) {
        while (array[i] == 0) ++i;  // tighter loop
        if (i == size-1) break;
        somevector.push_back(i);
        ++i;
    }
    array[size-1] = old_value;
    if (old_value != 0) {
        somevector.push_back(size-1);
    }

This way the inner loop tests only the value, instead of checking both the index and the value on every iteration.

+1

This is not really an answer to your question, but I tried to imagine what problem you are trying to solve.

Sometimes when performing operations on matrices (in the mathematical sense), operations can be improved if you know that the vast majority of matrix elements will be zeros (sparse matrix). You do this optimization by not using a large array at all, but simply by storing the {index, value} pairs that indicate a nonzero element.
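As a rough illustration of that idea, here is a minimal sketch of a container that stores only {index, value} pairs. The names `SparseEntry` and `SparseBytes` are hypothetical, and the sketch assumes that indices are added in increasing order and that each index is set at most once.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sparse representation: only non-zero entries are stored.
struct SparseEntry {
    std::size_t index;
    unsigned char value;
};

class SparseBytes {
public:
    // Record a value; zeros are simply not stored. Assumes increasing,
    // non-repeating indices (an assumption of this sketch).
    void set(std::size_t index, unsigned char value) {
        if (value != 0) entries_.push_back({index, value});
    }

    // The non-zero indices are available without scanning a dense array.
    std::vector<std::size_t> nonzero_indices() const {
        std::vector<std::size_t> out;
        out.reserve(entries_.size());
        for (const SparseEntry& e : entries_) out.push_back(e.index);
        return out;
    }

private:
    std::vector<SparseEntry> entries_;
};
```

With this layout, the cost of listing the non-zero positions depends on the number of non-zero elements rather than on the 4000000-byte array size.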

0

Source: https://habr.com/ru/post/926004/

