Altera_Forum
11 years ago

Loop induction variable decrementing instead of incrementing?

I've been working with an implementation of the Needleman-Wunsch algorithm for global sequence alignment. It is very similar to Smith-Waterman in that it has a very natural representation as a systolic array. I'm trying to build a version of the kernel that the compiler recognizes as a systolic array, but I'm running into a strange issue where the inner loop appears to decrement instead of increment. I've tested the code on a few GPUs and CPUs, and the results are correct on all of them.

Here is the kernel where I've hard-coded the problem/block size to be 16x16 for debugging purposes:

#define BLOCK_SIZE 16
__kernel 
__attribute__((task))
void nw_kernel1(
           __global int * restrict input_itemsets_d, 
           __global int * restrict output_itemsets_d, 
           __global int * restrict seq_1,
           __global int * restrict seq_2,
           int penalty
           )
{  
    
    __private int Sd_private;
    __private int Sh_private;
    __private int Sv_private;    
    //prime first row with values
    
    Sh_private = 0;
    //INIT ROW VERTICAL INPUTS
    #pragma unroll
    for(int i = 0; i < BLOCK_SIZE; i++){
        Sv_private = SCORE_GLOBAL(0, (i + 1));
    }
    
    Sd_private = 0;
    //for each row
    for( int row = 0 ; row < BLOCK_SIZE; row++){    
        printf("ROW = %d\n", row);
        //INITIALIZE INPUT
        // prime Sh location for row
        Sh_private = SCORE_GLOBAL((row + 1), 0);
        int score_y = seq_2;
        //for each column in row
        #pragma unroll BLOCK_SIZE
        for(int col = 0; col < BLOCK_SIZE ; col++){
            printf("COL = %d\n", col);
            int score_x = seq_1;
            int ref = reference_l;
            int Sd = Sd_private;
            int Sh = Sh_private;
            int Sv = Sv_private;
            //COMPUTE
            // Assign score based on other values
    
            int tmp = maximum((Sd + ref),
                               Sh - penalty,
                               Sv - penalty);
            //store to global memory
            SCORE_GLOBAL_O((row + 1), (col + 1)) = tmp;
            if(col != 15) {
                //SHIFT
                // Store to private arrays for next iteration
                // 1) store Sd values for next row
                Sd_private = tmp;
                // 2) shift register to pass Sv values to Sd of next column
                Sd_private = Sv_private;
                // pass Sh values to next column
                Sh_private = tmp;
                //pass Sv values to current column, next row
                Sv_private = tmp;
            }
            
        }
        //SHIFT
        // for starting column, last Sh value can be new Sd value
        Sd_private = Sh_private;
        //save all tmp Sd values for next loop iteration
        #pragma unroll
        for(int i = 1; i < BLOCK_SIZE; i++){
            Sd_private = Sd_private;
        }
    }    
    return;   
}
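
For checking the kernel's output against a known-good result, the recurrence it computes can be written as a plain C reference. This is only a sketch under assumptions: `nw_reference`, `max3`, and the `similar` table are hypothetical names standing in for the kernel's `maximum()` helper and its substitution scores, whose definitions are not shown here, and the row-major (BLOCK_SIZE + 1) x (BLOCK_SIZE + 1) layout is assumed rather than taken from the real host code.

```c
#define BLOCK_SIZE 16
#define DIM (BLOCK_SIZE + 1)  /* one extra row and column for the boundary */

/* Largest of three ints; stands in for the kernel's maximum() helper. */
static int max3(int a, int b, int c)
{
    int m = (a > b) ? a : b;
    return (m > c) ? m : c;
}

/*
 * Fill a DIM x DIM score matrix stored in row-major order.
 * score:   in/out; row 0 and column 0 must already hold boundary values.
 * similar: hypothetical DIM x DIM table; similar[r * DIM + c] is the
 *          match/mismatch score for cell (r, c).
 * penalty: gap penalty subtracted for a horizontal or vertical step.
 */
static void nw_reference(int *score, const int *similar, int penalty)
{
    for (int row = 1; row < DIM; row++) {
        for (int col = 1; col < DIM; col++) {
            int sd = score[(row - 1) * DIM + (col - 1)]; /* diagonal */
            int sh = score[row * DIM + (col - 1)];       /* left     */
            int sv = score[(row - 1) * DIM + col];       /* up       */
            score[row * DIM + col] =
                max3(sd + similar[row * DIM + col],
                     sh - penalty,
                     sv - penalty);
        }
    }
}
```

Running this on the host with the same boundary row and column as the device buffer gives a matrix to diff against the kernel output, which does not depend on the order in which any debug prints appear.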

What gets printed out (which also explains the incorrect output) is the following:

ROW = 0
COL = 0
COL = 15
COL = 14
etc...

Is there anything obvious I'm doing wrong? Or is this most likely a compiler bug?

PM me if you would like access to the source code.

-Jack

4 Replies

  • Altera_Forum

    Are you sure that console output goes with that kernel? For example, you have this, which shouldn't have compiled: printf("****COL = %d\n");

  • Altera_Forum

    Yeah, sorry, that was just a sloppy copy-paste error. I've adjusted the kernel code above to reflect the actual code that prints the column order as 0, 15, 14, etc.

  • Altera_Forum

    There are two issues here: 1) The functional error in the output was caused by a bug in the compiler; it will be fixed in the next release. 2) The printf calls print out of order. Currently, this is expected: there are no ordering guarantees between different printf calls. Because the loop is unrolled, the loop body contains multiple printfs (one for each iteration), and these can print in an order that differs from iteration order.
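
One way to check the induction variable without relying on printf order is to write each value into a buffer slot chosen by the loop indices and inspect the buffer afterwards. The following is a plain C stand-in for the kernel loops, not the actual kernel: `trace_visits`, `first_bad_slot`, and the `trace` buffer are hypothetical names, and in a real kernel `trace` would be an assumed extra `__global int *` argument read back by the host.

```c
#include <assert.h>

#define BLOCK_SIZE 16
#define TRACE_LEN (BLOCK_SIZE * BLOCK_SIZE)

/*
 * Record the column index seen by every (row, col) iteration.
 * The slot is derived from the indices, so the stored result does not
 * depend on the order in which the iterations execute or report.
 */
void trace_visits(int *trace)
{
    for (int row = 0; row < BLOCK_SIZE; row++) {
        for (int col = 0; col < BLOCK_SIZE; col++) {
            int slot = row * BLOCK_SIZE + col;
            assert(slot < TRACE_LEN);
            trace[slot] = col;  /* the value the kernel would have printed */
        }
    }
}

/*
 * Host-side check: return the first slot whose stored column does not
 * match its position, or -1 if every slot is consistent.
 */
int first_bad_slot(const int *trace)
{
    for (int slot = 0; slot < TRACE_LEN; slot++) {
        if (trace[slot] != slot % BLOCK_SIZE) {
            return slot;
        }
    }
    return -1;
}
```

If `first_bad_slot` returns -1 after the buffer is read back, every iteration used the expected column index, whatever order the printf lines came out in.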

  • Altera_Forum

    Outku,

    Thanks for the reply. This all makes sense based on the output I'm seeing.

    - Jack