cv::Mat CV_8U product error and slow CV_32F product

I am trying to compute the product of a 2772x128 matrix and a 4000x128 matrix. Both are SIFT descriptor matrices, created with the following code:

    Mat a = Mat(nframes, descrSize, CV_8U, DATAdescr);
    Mat b = Mat(vocabulary_size, descrSize, CV_8U, vocabulary);
    Mat ab = a * b.t();

The problem is that computing the product fails with the following error:

 err_msg = 0x00cdd5e0 "..\..\..\src\opencv\modules\core\src\matmul.cpp:711: error: (-215) type == B.type() && (type == CV_32FC1 || type == CV_64FC1 || type == CV_32FC2 || type == CV_64FC2)" 

The workaround was to convert the data to CV_32FC1:

    Mat a = Mat(nframes, descrSize, CV_8U, DATAdescr);
    Mat b = Mat(vocabulary_size, descrSize, CV_8U, vocabulary);
    a.convertTo(a, CV_32FC1);
    b.convertTo(b, CV_32FC1);
    Mat ab = a * b.t();

It works correctly, but it takes too long, about 1.2 s. I would like to try the same product with integer types to see whether I can speed it up. Am I doing something wrong? I see no reason why I should not be able to compute a matrix product between CV_8U matrices.

EDIT: The answers so far suggest other libraries or other approaches. I may open a new question about how to speed this up, but can anyone answer my original question? Can I really not multiply CV_8U or CV_32S matrices?

4 answers

In your other post, you said the following code would take 0.9 seconds.

    MatrixXd A = MatrixXd::Random(1000, 1000);
    MatrixXd B = MatrixXd::Random(1000, 500);
    MatrixXd X;
    X = A * B;

I ran a little test on my machine, a Core i7 running Linux. The complete test code is as follows:

    #include <Eigen/Dense>
    using namespace Eigen;

    int main(int argc, char *argv[])
    {
        MatrixXd A = MatrixXd::Random(2772, 128);
        MatrixXd B = MatrixXd::Random(4000, 128);
        MatrixXd X = A * B.transpose();
    }

I simply time it with the Linux time command, so the measurement includes starting and stopping the executable.

1 / Compilation without optimization (gcc compiler):

    g++ -I/usr/include/eigen3 matcal.cpp -O0 -o matcal
    time ./matcal

    real    0m13.177s   <- this is the time you should be looking at
    user    0m13.133s
    sys     0m0.022s

13 seconds, which is very slow. For reference, without the matrix multiplication the program takes 0.048 s, and these matrices are larger than the ones in your 0.9 s example. Why is it so slow?

Using compiler optimizations with Eigen is very important. 2 / Compilation with some optimization:

    g++ -I/usr/include/eigen3 matcal.cpp -O2 -o matcal
    time ./matcal

    real    0m0.324s
    user    0m0.298s
    sys     0m0.024s

Now 0.324 s. Much better!

3 / Turning on all the optimization flags (at least all the ones I know of; I am not an expert in this field):

    g++ -I/usr/include/eigen3 matcal.cpp -O3 -march=corei7 -mtune=corei7 -o matcal
    time ./matcal

    real    0m0.317s
    user    0m0.291s
    sys     0m0.024s

0.317 s, close to the previous result, but consistently a few ms faster across several runs. So in my opinion you have a problem in how you use Eigen: either you are not enabling compiler optimizations, or your compiler is not applying them on its own.

I am not an expert at Eigen. I only used it a few times, but I think the documentation is not bad, and you should probably read it to make the most of it.

Regarding the performance comparison with MATLAB: the last time I read about it, Eigen was not multithreaded, while MATLAB probably uses multithreaded libraries. For the matrix multiplication you can split your matrix into several blocks and parallelize the multiplication of each block using TBB, for example as in the sketch below.
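Here is a minimal sketch of that idea (my own illustration, not code from the question): each TBB task multiplies a disjoint block of rows of A by B.transpose(), so the tasks can write into the result without synchronization. It assumes Eigen 3, TBB, and a C++11 compiler.

    #include <Eigen/Dense>
    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>

    using Eigen::MatrixXd;

    MatrixXd parallelProduct(const MatrixXd& A, const MatrixXd& B)
    {
        const MatrixXd Bt = B.transpose();
        MatrixXd X(A.rows(), Bt.cols());
        tbb::parallel_for(tbb::blocked_range<int>(0, (int)A.rows()),
            [&](const tbb::blocked_range<int>& r) {
                const int n = r.end() - r.begin();
                // Each task fills its own rows of X, so no locking is needed.
                X.middleRows(r.begin(), n) = A.middleRows(r.begin(), n) * Bt;
            });
        return X;
    }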


As suggested by remi, I implemented the same matrix multiplication with Eigen. Here it is:

    const int descrSize = 128;
    MatrixXi a(nframes, descrSize);
    MatrixXi b(vocabulary_size, descrSize);
    MatrixXi ab(nframes, vocabulary_size);

    // Copy the unsigned char data into integer matrices
    unsigned char* dataPtr = DATAdescr;
    for (int i = 0; i < nframes; ++i)
        for (int j = 0; j < descrSize; ++j)
            a(i, j) = (int)*dataPtr++;

    unsigned char* vocPtr = vocabulary;
    for (int i = 0; i < vocabulary_size; ++i)
        for (int j = 0; j < descrSize; ++j)
            b(i, j) = (int)*vocPtr++;

    ab = a * b.transpose();
    MatrixXi aa = a.cwiseProduct(a).rowwise().sum();
    MatrixXi bb = b.cwiseProduct(b).rowwise().sum();
    MatrixXi d = (aa.replicate(1, vocabulary_size)
                  + bb.transpose().replicate(nframes, 1)
                  - 2 * ab).cwiseAbs2();

The key line is

 ab = a*b.transpose(); 

DATAdescr and vocabulary are unsigned char arrays; DATAdescr is 2772x128 and vocabulary is 4000x128. I saw in the documentation that I could use Map, but at first I was not able to get it working (a possible Map version is sketched below). The initial copy loops take about 0.001 s, so they are not the bottleneck. The whole process takes about 1.23 s.
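For reference, a minimal sketch of the Map approach I had in mind (this assumes the raw arrays are row-major nframes x 128 and vocabulary_size x 128; it is not the code I actually ran):

    #include <Eigen/Dense>
    using namespace Eigen;

    typedef Matrix<unsigned char, Dynamic, Dynamic, RowMajor> MatrixXuc;

    // Wrap the raw buffers without copying, then cast to int for the product
    Map<MatrixXuc> aMap(DATAdescr, nframes, descrSize);
    Map<MatrixXuc> bMap(vocabulary, vocabulary_size, descrSize);
    MatrixXi ab = aMap.cast<int>() * bMap.cast<int>().transpose();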

The same implementation in MATLAB (0.05 s):

    aa = sum(a.*a, 2);
    bb = sum(b.*b, 2);
    ab = a*b';
    d = sqrt(abs(repmat(aa, [1 size(bb,1)]) + repmat(bb', [size(aa,1) 1]) - 2*ab));

Thanks for the help.


When you multiply matrices, you multiply element values and sum them up; if the elements only have the range 0-255, the result will almost certainly be larger than 255. With 128-element SIFT rows the worst case for one output element is 128 * 255 * 255, roughly 8.3 million, which does not even fit in 16 bits. So a matrix product of CV_8U matrices is not very useful.

If you know that your result will fit in a byte, you can do the multiplication yourself with a loop over the elements; more generally you can accumulate into a wider type, as in the sketch below.
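A minimal sketch of that manual product, accumulating into CV_32S so the 8-bit inputs cannot overflow (the function name and signature are my own illustration):

    #include <opencv2/core/core.hpp>

    // Computes a * b.t() for two CV_8U matrices, i.e. ab(i,j) is the dot
    // product of row i of a and row j of b, accumulated in 32-bit ints.
    cv::Mat mul8u(const cv::Mat& a, const cv::Mat& b)
    {
        CV_Assert(a.type() == CV_8U && b.type() == CV_8U && a.cols == b.cols);
        cv::Mat ab(a.rows, b.rows, CV_32S);
        for (int i = 0; i < a.rows; ++i)
            for (int j = 0; j < b.rows; ++j) {
                int sum = 0;
                for (int k = 0; k < a.cols; ++k)
                    sum += (int)a.at<uchar>(i, k) * (int)b.at<uchar>(j, k);
                ab.at<int>(i, j) = sum;
            }
        return ab;
    }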

edit: I'm a little surprised that the floating point version is so much slower; in general OpenCV is pretty good at performance, with multi-core and bulk SSE2 instructions. Did you build it from source? Do you have TBB (i.e. multithreading) enabled and an SSE2 CPU?


Try compiling OpenCV with Eigen as the backend; there is an option for this in the CMakeLists. I read in your comments that you use OpenCV only to speed up the matrix multiplication, so you could even try Eigen directly.
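If I remember correctly, the relevant CMake flag is WITH_EIGEN, so the build would look something like this (paths are illustrative):

    cd opencv/build
    cmake -D WITH_EIGEN=ON ..
    make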

As a last resort, use the OpenCV GPU module.
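For example, the 2.x gpu module has a gemm function; a rough sketch, assuming a CUDA-capable GPU and an OpenCV build with CUBLAS (it works on CV_32F, so it addresses the speed rather than the CV_8U question):

    #include <opencv2/core/core.hpp>
    #include <opencv2/gpu/gpu.hpp>

    // ab = a * b.t() on the GPU, for CV_32F inputs
    cv::Mat gpuProduct(const cv::Mat& a, const cv::Mat& b)
    {
        cv::gpu::GpuMat ga(a), gb(b), gab;                    // upload
        cv::gpu::gemm(ga, gb, 1.0, cv::gpu::GpuMat(), 0.0,
                      gab, cv::GEMM_2_T);                     // ga * gb^T
        cv::Mat ab;
        gab.download(ab);                                     // download
        return ab;
    }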


Source: https://habr.com/ru/post/926292/

