algorithm - A Cache Efficient Matrix Transpose Program? -


so obvious way transpose matrix use :

  for( int = 0; < n; i++ )      for( int j = 0; j < n; j++ )        destination[j+i*n] = source[i+j*n]; 

but want take advantage of locality , cache blocking. looking , can't find code this, i'm told should simple modification original. ideas?

edit: have 2000x2000 matrix, , want know how can change code using 2 for loops, splitting matrix blocks transpose individually, 2x2 blocks, or 40x40 blocks, , see block size efficient.

edit2: matrices stored in column major order, matrix

a1 a2     a3 a4 

is stored a1 a3 a2 a4.

you're going want 4 loops - 2 iterate on blocks, , 2 perform transpose-copy of single block. assuming simplicity block size divides size of matrix, think, although i'd want draw pictures on backs of envelopes sure:

for (int = 0; < n; += blocksize) {     (int j = 0; j < n; j += blocksize) {         // transpose block beginning @ [i,j]         (int k = i; k < + blocksize; ++k) {             (int l = j; l < j + blocksize; ++l) {                 dst[k + l*n] = src[l + k*n];             }         }     } } 

an important further insight there's cache-oblivious algorithm (see http://en.wikipedia.org/wiki/cache-oblivious_algorithm, uses exact problem example). informal definition of "cache-oblivious" don't need experiment tweaking parameters (in case blocksize) in order hit good/optimal cache performance. solution in case transpose recursively dividing matrix in half, , transposing halves correct position in destination.

whatever cache size is, recursion takes advantage of it. expect there's bit of management overhead compared strategy, use performance experiments to, in effect, jump straight point in recursion @ cache kicks in, , go no further. on other hand, performance experiments might give answer works on machine not on customers' machines.


Comments

Popular posts from this blog

c# - How to set Z index when using WPF DrawingContext? -

razor - Is this a bug in WebMatrix PageData? -

android - layout with fragment and framelayout replaced by another fragment and framelayout -