How to use cv::parallel_for_ to reduce runtime

I have created an image processing algorithm using OpenCV, and I'm currently trying to improve the time efficiency of my own simple function. It is similar to a LUT but interpolates between the values (double calibRI::corr(double)). I optimized the pixel loop following the OpenCV docs.

The non-parallel version (calib(cv::Mat), where calib is an object of the functor class calibRI) takes about 0.15 s. I decided to use cv::parallel_for_ to make it faster. At first, I implemented it by splitting the image into tiles, following this approach. The time dropped to 0.12 s (4 threads).

    virtual void operator()(const cv::Range& range) const
    {
        for(int i = range.start; i < range.end; i++)
        {
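            // divide the image into 'thr' horizontal bands; the last band also takes any leftover rows
            int bandRows = img.rows / thr;
            int y = bandRows * i;
            int h = (i == thr - 1) ? img.rows - y : bandRows;
            cv::Rect roi(0, y, img.cols, h);
            cv::Mat in = img(roi);
            cv::Mat out = retVal(roi);
            // copy into retVal's ROI; 'out = calib(in)' would only rebind the local header and leave retVal unchanged
            calib(in).copyTo(out); // calib loops over all pixels: out[u,v] = calibRI::corr(in[u,v])
        }
    }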

I thought that running my function in parallel on subimages/tiles/ROIs was not yet optimal, so I implemented it as shown below:

template <typename T>
class ParallelPixelLoop : public cv::ParallelLoopBody
{
    typedef boost::function<T(T)> pixelProcessingFuntionPtr;
private:
    cv::Mat& image; //source and result image (to be overwritten)
    bool cont; //if the image is continuous
    size_t rows;
    size_t cols;
    size_t threads;
    std::vector<cv::Range> ranges;
    pixelProcessingFuntionPtr pixelProcessingFunction; //pixel modif. function
public:
    ParallelPixelLoop(cv::Mat& img, pixelProcessingFuntionPtr fun, size_t thr = 4)
        : image(img), cont(img.isContinuous()), rows(img.rows), cols(img.cols), threads(thr), pixelProcessingFunction(fun)
    {
        size_t groupSize = 1;
        if (cont) {
            // a continuous image is processed as one long row of rows*cols pixels
            cols *= rows;
            rows = 1;
            groupSize = cols / threads; // integer division rounds down
        }
        else {
            groupSize = rows / threads;
        }

        size_t t = 0;
        for(t=0; t<threads-1; ++t) {
            ranges.push_back( cv::Range( int(t*groupSize), int((t+1)*groupSize) ) );
        }
        ranges.push_back( cv::Range( int(t*groupSize), int(cont ? cols : rows) ) ); //last range runs to the end of the image and absorbs the remainder
    }

    virtual void operator()(const cv::Range& range) const
    {
        for(int r = range.start; r < range.end; r++)
        {
            T* Ip = nullptr;
            cv::Range ran = ranges.at(r);
            if(cont) {
                Ip = image.ptr<T>(0);
                for (int j = ran.start; j < ran.end; ++j)
                {
                    Ip[j] = pixelProcessingFunction(Ip[j]);
                }
            }
            else {
                for(int i = ran.start; i < ran.end; ++i)
                {
                    Ip = image.ptr<T>(i);
                    for (int j = 0; j < cols; ++j)
                    {
                        Ip[j] = pixelProcessingFunction(Ip[j]);
                    }
                }
            }
        }
    }
};

Then I run it on a 1280x1024 CV_64FC1 image on an i5 processor under Windows 8 and get times of about 0.4 s, using the following code:

double t = cv::getTickCount();
ParallelPixelLoop<double> loop(V,boost::bind(&calibRI::corr,this,_1),4);
cv::parallel_for_(cv::Range(0,4),loop);
std::cout << "Exec time: " << (cv::getTickCount()-t)/cv::getTickFrequency() << "s\n";

I have no idea why my implementation is so much slower than iterating over all pixels in the subimages. Is there an error in my code, or are OpenCV ROIs optimized in some special way? I don't think this is a time-measurement error as described here; I use OpenCV's timing functions.

Is there any other way to reduce the time of this function?

Thanks in advance!

1 answer

[Answer paragraph garbled in the source; the surviving fragments refer to cv::parallel_for, a task that takes x ms, and ideal times of x/2 (or x/3).]

A few suggestions:

  • Measure 100 or 1000 runs instead of a single one (a single run takes only 0.12-0.4 s), for example:

    double t = cv::getTickCount();
    for (unsigned int i = 0; i < 1000; ++i) {
        ParallelPixelLoop<double> loop(V, boost::bind(&calibRI::corr, this, _1), 4);
        cv::parallel_for_(cv::Range(0, 4), loop);
    }
    // total time for all 1000 runs
    std::cout << "Exec time: " << (cv::getTickCount() - t)/cv::getTickFrequency() << "s\n";

  • [Item garbled in the source; the only surviving fragments are a quoted phrase and two mentions of 4.]

  • Use a profiler (for example Very Sleepy) to see where the time is spent.

