Forum Discussion

Altera_Forum (Honored Contributor)
8 years ago

Data dependency caused by conditional global memory read

Hi,

When I compile my code, the loop analysis section of the generated report shows one loop with an initiation interval (II) of 4, which means it is not pipelined well.


    for (uint loop_cnt = 0, w = 0, cp = 0; loop_cnt < COLS_PER_PE * TILE_SIZE; loop_cnt++) {
        // Load data into the local buffer
        uint x = (n * BLOCK_SIZE + cp * TILE_SIZE + w) % conv_dim1 + i;
        uint y = (n * BLOCK_SIZE + cp * TILE_SIZE + w) / conv_dim1 + j;
        if ((cp * COLS_PER_PE + w > BLOCK_SIZE - 1 - col_pad_size &&
             n == (conv_dim1 * conv_dim2 + col_pad_size) / BLOCK_SIZE - 1) ||
            (x < pad_size || x > data_dim1 - pad_size - 1 ||
             y < pad_size || y > data_dim2 - pad_size - 1)) {
            #pragma unroll
            for (uint v = 0; v < CVEC; v++) {
                data_double_buf[wr_bank_sel][cp][w] = 0.0f;
            }
        }
        else {
            data_double_buf[wr_bank_sel][cp][w] = input[h * input_dim1 * input_dim2 + (y - pad_size) * input_dim1 + x - pad_size];
        }

        // Load weights into the local buffer
        if (cp * TILE_SIZE + w < BLOCK_SIZE - row_pad_size) {
            // For the first 2 convolutional layers
            if (conv_dim3 < BLOCK_SIZE) {
                weight_double_buf = weight;
            }
            // For the last convolutional layer
            else {
                weight_double_buf = weight;
            }
        }
        else {
            if (conv_dim3 < BLOCK_SIZE) {
                #pragma unroll
                for (uint v = 0; v < CVEC; v++) {
                    weight_double_buf.vector = 0.0f;
                }
            }
        }

        // Manual loop coalescing
        if (w == TILE_SIZE - 1) {
            cp += 1;
        }
        if (w == TILE_SIZE - 1) {
            w = 0;
        }
        else {
            w += 1;
        }
    }
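The "manual loop coalescing" at the end of the loop body replaces a nested loop over cp (outer) and w (inner) with a single counter. The host-side C sketch below checks that equivalence by stepping the same counter update alongside a reference nested-loop position; TILE_SIZE and COLS_PER_PE are given hypothetical values here, since the thread does not state the real ones.

```c
#include <stdbool.h>

/* Hypothetical sizes for illustration only; the kernel's real values are not in the thread. */
#define TILE_SIZE   4
#define COLS_PER_PE 3

/* Returns true if the manually coalesced loop visits (cp, w) in the same
 * order as the nested loop: for cp { for w { ... } }. */
static bool coalesced_matches_nested(void) {
    unsigned expect_cp = 0, expect_w = 0;   /* position in the reference nested loop */
    for (unsigned loop_cnt = 0, w = 0, cp = 0;
         loop_cnt < COLS_PER_PE * TILE_SIZE; loop_cnt++) {
        if (cp != expect_cp || w != expect_w)
            return false;
        /* Same counter update as the kernel, with its two identical
         * conditions folded into one if/else. */
        if (w == TILE_SIZE - 1) {
            cp += 1;
            w = 0;
        } else {
            w += 1;
        }
        /* Advance the reference nested-loop position. */
        if (++expect_w == TILE_SIZE) {
            expect_w = 0;
            expect_cp += 1;
        }
    }
    return true;
}
```

Calling coalesced_matches_nested() should return true for any positive TILE_SIZE and COLS_PER_PE; folding the two `if (w == TILE_SIZE - 1)` tests into one branch does not change the visited order.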

Here is the report for this loop:


Loop                    Pipelined   II   Bottleneck   Details
Block7 (conv.cl:107)    Yes         4    II           Memory dependency

Block7:
II bottleneck due to memory dependency between:
    Store Operation (conv.cl:122)
    Store Operation (conv.cl:122)
Largest critical path contributor(s):
    36%: Store Operation (conv.cl:122)
    36%: Store Operation (conv.cl:122)
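As a rough illustration of why an II of 4 matters, a common approximation for a pipelined loop with N iterations, pipeline latency L (a symbol introduced here, not taken from the report), and initiation interval II is given below. It ignores stalls, so treat it as an estimate only:

```latex
T_{\text{cycles}} \approx L + \mathrm{II}\,(N - 1), \qquad N = \mathtt{COLS\_PER\_PE} \times \mathtt{TILE\_SIZE}

\frac{T_{\mathrm{II}=4}}{T_{\mathrm{II}=1}} = \frac{L + 4(N - 1)}{L + (N - 1)} \;\to\; 4 \quad (N \gg L)
```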

I don't see any data dependency here. If the compiler is inferring one wrongly, does anyone know how to avoid this? (If I make "weight_double_buf" and "data_double_buf" plain floats, or remove the conditions, the II becomes 1.)

Any advice would be greatly appreciated!

Lancer

5 Replies

  • Altera_Forum (Honored Contributor)

    Which line is line 122 in your code? False dependencies on global buffers can be avoided by adding #pragma ivdep array(buffer_name) before the loop (Best Practices Guide, Section 5.2). Note that incorrect use of this pragma WILL result in incorrect output.
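Because ivdep tells the compiler to assume there are no loop-carried dependencies on the named array, one cheap sanity check before adding it is to simulate the loop's store addresses on the host and confirm no element is written by more than one iteration. The sketch below assumes each iteration stores to element [cp][w] of a single bank of the buffer and uses hypothetical sizes; passing it is one necessary condition for ivdep to be safe, not a proof (read-after-write across iterations would still need checking).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sizes; substitute the kernel's real values. */
#define TILE_SIZE   4
#define COLS_PER_PE 3

/* Simulates the coalesced loop and returns true if no element [cp][w]
 * is written by more than one iteration (no cross-iteration write-write
 * conflict) and every index stays in range. */
static bool stores_are_distinct(void) {
    unsigned char written[COLS_PER_PE][TILE_SIZE];
    memset(written, 0, sizeof written);
    for (unsigned loop_cnt = 0, w = 0, cp = 0;
         loop_cnt < COLS_PER_PE * TILE_SIZE; loop_cnt++) {
        if (cp >= COLS_PER_PE || written[cp][w])
            return false;               /* out of range or written twice */
        written[cp][w] = 1;
        /* Same counter update as the kernel's manual loop coalescing. */
        if (w == TILE_SIZE - 1) {
            cp += 1;
            w = 0;
        } else {
            w += 1;
        }
    }
    return true;
}
```

The two stores in different branches of the same iteration hit the same element, but only one of them executes per iteration, so they do not count as a cross-iteration conflict in this check.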

  • Altera_Forum (Honored Contributor)

    Hi HRZ,

    Thanks for your reply. The report refers to a dependency between the line

    "data_double_buf[wr_bank_sel][cp][w] = 0.0f;" and the line

    "data_double_buf[wr_bank_sel][cp][w] = input[h * input_dim1 * input_dim2 + (y - pad_size) * input_dim1 + x - pad_size];"

    which belong to two different conditional branches.

    Is it a global memory dependency or a local memory dependency?
  • Altera_Forum (Honored Contributor)

    Since it is a "store" dependency, it is probably the local memory one (data_double_buf). You can try writing the value to a temporary register, then writing that register back to the local buffer outside the if/else block, and see if that removes the dependency.
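The suggested restructuring can be sketched in plain C as below: instead of two stores to the same element in different branches, each branch only computes a value into a temporary and a single store follows the if/else. The buffer length, the is_pad flags, and the function names are hypothetical stand-ins for the kernel's padding test and buffers; a host-side sketch like this cannot show the effect on II (only the FPGA compiler report can), it only shows that the rewrite keeps the same results.

```c
#include <stddef.h>

/* Hypothetical buffer length for illustration. */
#define BUF_LEN 16

/* Original pattern: two store sites to buf[i], one per branch. */
static void two_stores(float buf[BUF_LEN], const float input[BUF_LEN],
                       const int is_pad[BUF_LEN]) {
    for (size_t i = 0; i < BUF_LEN; i++) {
        if (is_pad[i])
            buf[i] = 0.0f;
        else
            buf[i] = input[i];
    }
}

/* Suggested pattern: select into a temporary, then store once,
 * outside the if/else. */
static void one_store(float buf[BUF_LEN], const float input[BUF_LEN],
                      const int is_pad[BUF_LEN]) {
    for (size_t i = 0; i < BUF_LEN; i++) {
        float tmp;
        if (is_pad[i])
            tmp = 0.0f;
        else
            tmp = input[i];   /* the read stays conditional */
        buf[i] = tmp;         /* single store site */
    }
}
```

With a single store site there is only one write to the local buffer per iteration, which is the shape the reply suggests trying in the kernel.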

    By the way, why do you need the unrolled for loop here? The statement inside the loop does not depend on the loop variable. I think you have a typo here.

    #pragma unroll
    for(uint v = 0; v < CVEC; v++) {
         data_double_buf[wr_bank_sel][cp][w] = 0.0f;
    }
  • Altera_Forum (Honored Contributor)

    Hi HRZ,

    Thanks for your reply!

    Yes, I had a typo there: each element of that buffer is a struct, so the line should be "data_double_buf[wr_bank_sel][cp][w].vector[v] = 0.0f;". Thanks for pointing it out.
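For illustration, the corrected zero-fill can be sketched in plain C as follows. The thread only says each buffer element is a struct with a "vector" member indexed by v < CVEC; the struct name vec_t, the bank count, and all sizes below are assumptions. In the kernel, this inner loop carries #pragma unroll, so all CVEC lanes would be written in the same iteration.

```c
/* Hypothetical layout and sizes; only the ".vector[v]" member access
 * comes from the thread. */
#define CVEC        8
#define NUM_BANKS   2
#define COLS_PER_PE 3
#define TILE_SIZE   4

typedef struct {
    float vector[CVEC];
} vec_t;

/* Zero every lane of one element of the double buffer. Unlike the
 * typo'd version, the store now depends on the loop variable v. */
static void zero_element(vec_t buf[NUM_BANKS][COLS_PER_PE][TILE_SIZE],
                         unsigned wr_bank_sel, unsigned cp, unsigned w) {
    for (unsigned v = 0; v < CVEC; v++) {
        buf[wr_bank_sel][cp][w].vector[v] = 0.0f;
    }
}
```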

    Is there a pragma, like #pragma ivdep, that can remove false dependencies on local memory? (Not for this particular problem.)
  • Altera_Forum (Honored Contributor)

    I have never seen the compiler falsely detect a dependency on local memory accesses, and I highly doubt that is even possible. You can always try using #pragma ivdep for local memory dependencies as well, but I don't think it will have any effect.