C++ - Intel TBB Parallelization Overhead


Why does Intel Threading Building Blocks (TBB) parallel_for have such a large overhead? According to section 3.2.2, "Automatic Chunking", in Tutorial.pdf, it is around half a millisecond. This is an excerpt from the tutorial:

Caution: Typically a loop needs to take at least a million clock cycles for parallel_for to improve its performance. For example, a loop that takes at least 500 microseconds on a 2 GHz processor might benefit from parallel_for.
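The two figures in that caution are the same threshold in different units: a million cycles at 2 GHz is 500 microseconds. A minimal sketch of the conversion, where the function name `cycles_to_microseconds` is made up for illustration and does not come from TBB:

```cpp
#include <cassert>

// Convert a cycle count to microseconds for a given clock rate in GHz.
// 1 GHz means 1,000 cycles per microsecond.
double cycles_to_microseconds(double cycles, double clock_ghz) {
    assert(clock_ghz > 0.0);
    return cycles / (clock_ghz * 1000.0);
}
```

Calling `cycles_to_microseconds(1000000.0, 2.0)` gives 500.0, which matches the tutorial's example.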

From what I have read so far, TBB uses the thread pool (pool of worker threads) pattern internally, and it prevents such bad overheads by spawning the worker threads only once, up front (which costs hundreds of microseconds).

So what is taking the time? Data synchronization using mutexes isn't that slow, right? Besides, doesn't TBB make use of lock-free data structures for synchronization?

"From what I have read so far, TBB uses the thread pool (pool of worker threads) pattern internally, and it prevents such bad overheads by spawning the worker threads only once, up front (which costs hundreds of microseconds)."

Yes, TBB pre-allocates threads. It doesn't physically create and join worker threads whenever it sees a parallel_for. OpenMP and other parallel libraries all do this kind of pre-allocation.

But there is still overhead to wake up threads from the pool and to dispatch logical tasks to those threads. Yes, TBB exploits lock-free data structures to minimize the overhead, but it still requires some amount of parallel overhead (i.e., a serial part). That's why the TBB manual advises avoiding very short loops.
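To get a feel for the wake-up and dispatch cost, here is a rough sketch that uses only the C++ standard library. It is not how TBB is implemented: `OneWorker` and `average_dispatch_microseconds` are hypothetical names, and the hand-off uses a mutex and a condition variable rather than TBB's lock-free queues. It only shows that handing even an empty task to an already-running thread and waiting for it is not free.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-worker "pool": the thread is created once and then
// sleeps until a task arrives, roughly like a pre-allocated worker.
// Assumes a single caller thread.
class OneWorker {
public:
    OneWorker() : worker_([this] { run(); }) {}

    ~OneWorker() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    // Hand one task to the worker and block until it has finished.
    void run_and_wait(std::function<void()> task) {
        std::unique_lock<std::mutex> lock(m_);
        task_ = std::move(task);
        has_task_ = true;
        cv_.notify_all();                               // wake the worker
        cv_.wait(lock, [this] { return !has_task_; });  // wait for completion
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return has_task_ || stop_; });
            if (stop_) return;
            lock.unlock();
            task_();            // run the task outside the lock
            lock.lock();
            has_task_ = false;
            cv_.notify_all();   // tell the caller we are done
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::function<void()> task_;
    bool has_task_ = false;
    bool stop_ = false;
    std::thread worker_;  // declared last so the other members exist first
};

// Average round-trip time, in microseconds, for dispatching an empty task.
double average_dispatch_microseconds(int rounds) {
    assert(rounds > 0);
    OneWorker worker;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < rounds; ++i) {
        worker.run_and_wait([] {});
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration<double, std::micro>(elapsed).count() / rounds;
}
```

The number returned by `average_dispatch_microseconds(1000)` depends heavily on the machine and the OS scheduler, so treat it as an order-of-magnitude hint only. A loop whose whole body runs in about that time has little to gain from being split up.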

In general, you must have a sufficient amount of work to gain any parallel speedup. I think even 1 millisecond (= 1,000 microseconds) is too small. In my experience, in order to see a meaningful speedup, I needed to increase the execution time to around 100 milliseconds.
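One way to see why short loops lose is a deliberately simple model, which is an assumption of this sketch and not something taken from the TBB manual: the work divides perfectly across the threads, and a fixed overhead is paid serially on top. The function name `estimated_speedup` and any overhead value passed in are made up for illustration.

```cpp
#include <cassert>

// Simplified model: speedup = T / (T / p + overhead),
// where T is the serial loop time and p is the number of threads.
double estimated_speedup(double serial_us, int threads, double overhead_us) {
    assert(threads > 0 && serial_us > 0.0 && overhead_us >= 0.0);
    return serial_us / (serial_us / threads + overhead_us);
}
```

With 4 threads and zero overhead, the model gives the ideal speedup of 4. With an overhead equal to a quarter of the loop time (for example `estimated_speedup(1000.0, 4, 250.0)`), it drops to 2. The same absolute overhead matters less and less as the loop gets longer.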

If the parallel overhead of TBB parallel_for is really a concern for you, it might be worth trying simple static scheduling. I don't have good knowledge of TBB's static scheduling implementation, but you can easily try OpenMP's: #pragma omp parallel for schedule(static). I believe its overhead is close to the minimal cost of a parallel for. However, since it uses static scheduling, the benefit of dynamic scheduling (especially when the workloads are not homogeneous) will be lost.
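A minimal sketch of the OpenMP variant, assuming a compiler with OpenMP support enabled (for example `-fopenmp` on GCC or Clang). The function name `scale_in_place` and the loop body are placeholders for your own work.

```cpp
#include <cstddef>
#include <vector>

// Multiply every element by `factor`.
// With OpenMP enabled, schedule(static) splits the iteration range into
// roughly equal contiguous chunks, one per thread, decided up front.
// Without OpenMP the pragma is ignored and the loop simply runs serially.
void scale_in_place(std::vector<double>& data, double factor) {
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        data[static_cast<std::size_t>(i)] *= factor;
    }
}
```

Each iteration here costs the same, which is the case where static scheduling fits. If some iterations were much more expensive than others, one thread could end up with most of the work while the rest sit idle.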

