Ffmpeg (-mt) and TBB

I just started using the latest ffmpeg build, into which ffmpeg-mt has been merged.

However, since my application uses TBB (Intel Threading Building Blocks), the ffmpeg-mt implementation, which creates its own threads and does its own synchronization, is not a good fit for me: it can potentially block the TBB tasks that run my decoding functions. It would also thrash the caches unnecessarily.

I looked through pthread.c, which seems to implement the interface that ffmpeg uses to enable multithreading.

My question is: is it possible to create a tbb.c that implements the same functions, but using TBB tasks instead of explicit threads?

I have no experience with C, but I assume it would not be easy to compile TBB (which is C++) into ffmpeg. So maybe ffmpeg's function pointers could somehow be overridden at runtime. Is there a way to do that?

I would appreciate any suggestions or comments regarding the implementation of TBB in the ffmpeg threading api.

2 answers

So, I figured out how to do this by reading the ffmpeg code.

Basically, all you have to do is include the code below and use tbb_avcodec_open/tbb_avcodec_close instead of ffmpeg's avcodec_open/avcodec_close.

This will use TBB tasks to execute decoding in parallel.

    // Author Robert Nagy

    // These must be defined before any header pulls in <stdint.h>.
    #define __STDC_CONSTANT_MACROS
    #define __STDC_LIMIT_MACROS

    #include "tbb_avcodec.h"

    #include <functional>

    #include <tbb/task.h>
    #include <tbb/atomic.h>
    #include <tbb/parallel_for.h>

    extern "C"
    {
    #include <libavformat/avformat.h>
    }

    int task_execute(AVCodecContext* s, std::function<int(void* arg, int arg_size, int jobnr, int threadnr)>&& func, void* arg, int* ret, int count, int size)
    {
        tbb::atomic<int> counter;
        counter = 0;

        // Execute s->thread_count number of tasks in parallel.
        // Each task keeps claiming the next job number until all jobs are taken.
        tbb::parallel_for(0, s->thread_count, 1, [&](int threadnr)
        {
            while(true)
            {
                int jobnr = counter++;
                if(jobnr >= count)
                    break;

                int r = func(arg, size, jobnr, threadnr);
                if (ret)
                    ret[jobnr] = r;
            }
        });

        return 0;
    }

    int thread_execute(AVCodecContext* s, int (*func)(AVCodecContext *c2, void *arg2), void* arg, int* ret, int count, int size)
    {
        return task_execute(s, [&](void* arg, int arg_size, int jobnr, int threadnr) -> int
        {
            return func(s, reinterpret_cast<uint8_t*>(arg) + jobnr*size);
        }, arg, ret, count, size);
    }

    int thread_execute2(AVCodecContext* s, int (*func)(AVCodecContext* c2, void* arg2, int, int), void* arg, int* ret, int count)
    {
        return task_execute(s, [&](void* arg, int arg_size, int jobnr, int threadnr) -> int
        {
            return func(s, arg, jobnr, threadnr);
        }, arg, ret, count, 0);
    }

    void thread_init(AVCodecContext* s)
    {
        static const size_t MAX_THREADS = 16; // See mpegvideo.h
        static int dummy_opaque;

        s->active_thread_type = FF_THREAD_SLICE;
        s->thread_opaque      = &dummy_opaque;
        s->execute            = thread_execute;
        s->execute2           = thread_execute2;
        s->thread_count       = MAX_THREADS; // We are using a task-scheduler, so use as many "threads/tasks" as possible.
    }

    void thread_free(AVCodecContext* s)
    {
        s->thread_opaque = nullptr;
    }

    int tbb_avcodec_open(AVCodecContext* avctx, AVCodec* codec)
    {
        avctx->thread_count = 1;
        if((codec->capabilities & CODEC_CAP_SLICE_THREADS) && (avctx->thread_type & FF_THREAD_SLICE))
            thread_init(avctx);
        // ff_thread_init will not be executed since thread_opaque != nullptr || thread_count == 1.
        return avcodec_open(avctx, codec);
    }

    int tbb_avcodec_close(AVCodecContext* avctx)
    {
        thread_free(avctx);
        // ff_thread_free will not be executed since thread_opaque == nullptr.
        return avcodec_close(avctx);
    }
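For readers who want to try the job-claiming loop from task_execute without ffmpeg or TBB at hand, here is a minimal sketch of the same idea in plain C++. It is not the code from the answer: std::thread stands in for TBB tasks, std::atomic for tbb::atomic, and the name claim_jobs and its simplified callback signature are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for task_execute: 'workers' plain threads claim job
// numbers from one shared atomic counter until all 'count' jobs are taken.
int claim_jobs(int workers, int count,
               const std::function<int(int jobnr, int workernr)>& func,
               int* ret)
{
    assert(workers > 0 && count >= 0);

    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    pool.reserve(workers);

    for (int w = 0; w < workers; ++w)
    {
        pool.emplace_back([&, w]
        {
            while (true)
            {
                // fetch_add hands every job number to exactly one worker.
                int jobnr = counter.fetch_add(1);
                if (jobnr >= count)
                    break;

                int r = func(jobnr, w);
                if (ret)
                    ret[jobnr] = r; // each slot is written by one worker only
            }
        });
    }

    for (auto& t : pool)
        t.join();

    return 0;
}
```

Calling claim_jobs(4, 100, f, results) runs f once for each job number from 0 to 99, spread over four workers in whatever order they happen to claim jobs. Unlike the TBB version, this sketch creates real threads on every call, which is exactly what the question wants to avoid, so it only illustrates the counter logic.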

I am reposting here my answer to your question on the TBB forum, for the sake of anyone on SO who might be interested.

Your code in the answer above looks good to me; it is a smart way to use TBB in a context that was designed with its own threads in mind. I wonder whether it could be made even more TBB-ish, so to speak. I have some ideas that you can try if you have the time and desire.

The following two items may be of interest if you want or need to control the number of threads.

  • in thread_init, create a heap-allocated tbb::task_scheduler_init (TSI) object and initialize it with as many threads as desired (not necessarily MAX_THREADS). Store the address of this object in s->thread_opaque, if possible / allowed; if not, a possible solution is a global map from AVCodecContext* to the address of the corresponding task_scheduler_init.
  • correspondingly, in thread_free, retrieve and delete the TSI object.
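The global-map fallback could be sketched roughly as follows. This is only an illustration under stated assumptions: SchedulerHandle is a hypothetical placeholder for the heap-allocated tbb::task_scheduler_init object, the key is a plain const void* rather than AVCodecContext*, and SchedulerRegistry with its attach/find/detach members is an invented name, not part of TBB or ffmpeg.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical placeholder for the per-context scheduler object
// (in the real code this would be a tbb::task_scheduler_init).
struct SchedulerHandle
{
    explicit SchedulerHandle(int n) : threads(n) {}
    int threads;
};

// Maps a codec context address to the scheduler object created for it.
class SchedulerRegistry
{
public:
    // Would be called from thread_init: create and remember the object.
    void attach(const void* ctx, int threads)
    {
        assert(ctx != nullptr && threads > 0);
        std::lock_guard<std::mutex> lock(mutex_);
        map_[ctx] = std::make_unique<SchedulerHandle>(threads);
    }

    // Returns the object for ctx, or nullptr if none is registered.
    // The pointer is only valid until detach(ctx) is called.
    const SchedulerHandle* find(const void* ctx) const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(ctx);
        return it == map_.end() ? nullptr : it->second.get();
    }

    // Would be called from thread_free: destroy the object.
    // Returns false if nothing was registered for ctx.
    bool detach(const void* ctx)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.erase(ctx) == 1;
    }

private:
    mutable std::mutex mutex_;
    std::map<const void*, std::unique_ptr<SchedulerHandle>> map_;
};
```

The mutex is there because several codec contexts may be opened and closed from different threads; if s->thread_opaque can hold the pointer directly, none of this bookkeeping is needed.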

Regardless of the above, another potential change is in how tbb::parallel_for is called. Instead of using it only to create a sufficient number of threads, could it be used for its direct purpose, as in the example below?

    int task_execute(AVCodecContext* s, std::function<int(void*, int, int, int)>&& func, void* arg, int* ret, int count, int size)
    {
        tbb::atomic<int> counter;
        counter = 0;

        // Execute 'count' number of tasks in parallel.
        tbb::parallel_for(tbb::blocked_range<int>(0, count, 2), [&](const tbb::blocked_range<int>& range)
        {
            // The counter tracks how many chunks are in flight and supplies threadnr.
            int threadnr = counter++;
            for(int jobnr = range.begin(); jobnr != range.end(); ++jobnr)
            {
                int r = func(arg, size, jobnr, threadnr);
                if (ret)
                    ret[jobnr] = r;
            }
            --counter;
        });

        return 0;
    }

This can perform better if count is significantly larger than thread_count, since a) more parallel slack means that TBB works more efficiently (which you apparently know), and b) the overhead of the centralized atomic counter is amortized over more iterations. Notice that I chose a grain size of 2 for blocked_range; this is because the counter is both incremented and decremented in the body of the loop, so at least two iterations per task (and accordingly count>=2*thread_count) are required to "match" your variant.
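To get a feel for how a grain size of 2 might translate into chunks, the sketch below mimics midpoint splitting of a range in plain C++. It assumes the range keeps being halved for as long as it is longer than the grain size, which is how blocked_range behaves under a simple partitioner; the default partitioner may stop splitting earlier and hand out larger chunks. The function name split_range is hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Recursively halve [begin, end) while it is longer than 'grain' and
// collect the resulting chunks in order. Assumption: this mirrors
// simple-partitioner splitting, not what TBB does in every configuration.
void split_range(int begin, int end, int grain,
                 std::vector<std::pair<int, int>>& out)
{
    assert(grain > 0 && begin <= end);

    if (end - begin > grain)
    {
        int middle = begin + (end - begin) / 2;
        split_range(begin, middle, grain, out);
        split_range(middle, end, grain, out);
    }
    else
    {
        out.emplace_back(begin, end);
    }
}
```

With count = 16 and grain size 2 this yields eight chunks of two iterations each, so the counter is touched 16 times in total. With count = 10 it yields six chunks, two of which hold a single iteration, so under this assumption the grain size is an upper bound on chunk length rather than a guaranteed minimum.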
