Altera_Forum
Honored Contributor
8 years ago

Data dependency caused by conditional global memory read
Hi,
When I compile my code, the loop analysis in the generated report shows one loop with an initiation interval (II) of 4, which means it is not pipelined well.
for (uint loop_cnt = 0, w = 0, cp = 0; loop_cnt < COLS_PER_PE * TILE_SIZE; loop_cnt++) {
    // load data to local buffer
    uint x = (n * BLOCK_SIZE + cp * TILE_SIZE + w) % conv_dim1 + i;
    uint y = (n * BLOCK_SIZE + cp * TILE_SIZE + w) / conv_dim1 + j;
    if ((cp * COLS_PER_PE + w > BLOCK_SIZE - 1 - col_pad_size && n == (conv_dim1 * conv_dim2 + col_pad_size) / BLOCK_SIZE - 1) ||
        (x < pad_size || x > data_dim1 - pad_size - 1 || y < pad_size || y > data_dim2 - pad_size - 1)) {
        #pragma unroll
        for (uint v = 0; v < CVEC; v++) {
            data_double_buf = 0.0f;
        }
    }
    else {
        data_double_buf = input;
    }
    // load weight to local buffer
    if (cp * TILE_SIZE + w < BLOCK_SIZE - row_pad_size) {
        // For the first 2 convolutional layers
        if (conv_dim3 < BLOCK_SIZE) {
            weight_double_buf = weight;
        }
        // For the last convolutional layer
        else {
            weight_double_buf = weight;
        }
    }
    else {
        if (conv_dim3 < BLOCK_SIZE) {
            #pragma unroll
            for (uint v = 0; v < CVEC; v++) {
                weight_double_buf.vector = 0.0f;
            }
        }
    }
    // manual loop coalescing
    if (w == TILE_SIZE - 1) {
        cp += 1;
    }
    if (w == TILE_SIZE - 1) {
        w = 0;
    }
    else {
        w += 1;
    }
}
And here is the report for this loop:

Loop                  Pipelined  II  Bottleneck  Detail
Block7 (conv.cl:107)  Yes        4   II          Memory dependency

Block7:
  II bottleneck due to memory dependency between:
    Store Operation (conv.cl:122)
    Store Operation (conv.cl:122)
  Largest critical path contributor(s):
    36%: Store Operation (conv.cl:122)
    36%: Store Operation (conv.cl:122)
I don't see any data dependency here. If the compiler is inferring one incorrectly, does anyone know how to avoid this? (If I make "weight_double_buf" and "data_double_buf" plain floats, or remove the conditions, the II becomes 1.) Any advice would be greatly appreciated!

Lancer