Forum Discussion

Altera_Forum
8 years ago

Unroll loops containing channels

I am a bit confused about this topic. In "aocl_programming_guide.pdf" (2016.10.31), section "1.6.4.4 Restrictions in the Implementation of Intel FPGA SDK for OpenCL Channels Extension", it says:

--- Quote Start ---

Because you can only assign a single call site per channel ID, you cannot unroll loops containing channels. ...

--- Quote End ---

However, in "aocl-best-practices-guide.pdf", section "1.6.1.3 Simplifying Loop-Carried Dependency", the optimized example applies an unroll pragma to the for-loop at line 18, which contains a channel call:


12 ...
13 for (unsigned i = 0; i < N; i++) {
14
15   // Ensure that we have enough space if we read from ALL channels
16   if (num_bytes <= (8-NUM_CH)) {
17     #pragma unroll
18     for (unsigned j = 0; j < NUM_CH; j++) {
19       bool valid = false;
20       uchar data_in = read_channel_nb_altera(CH_DATA_IN[j], &valid);
21       if (valid) {
22         storage <<= 8;
23         storage |= data_in;
24         num_bytes++;
25       }
26     }
27  }
28  ...

According to the corresponding report, this loop is successfully fully unrolled:


==================================================================================
Kernel: optimized
==================================================================================
The kernel is compiled for single work-item execution.
Loop Report:
+ Loop "Block1" (file optimized3.cl line 13)
| Pipelined well. Successive iterations are launched every cycle.
|
|
|-+ Fully unrolled loop (file optimized3.cl line 18)
Loop was fully unrolled due to "#pragma unroll" annotation.

Perhaps loops containing NON-blocking channel calls are not a problem for loop-unrolling?

2 Replies

  • Altera_Forum

    The example you are looking at does NOT create multiple call sites per channel ID: the loop iterates over the channel IDs themselves, so after unrolling each call site reads from a different channel instead of reading repeatedly from the same one. In fact, if the loop in that example is NOT unrolled, compilation fails because the channel ID would be variable. This has nothing to do with whether the channel call is blocking or non-blocking.

    For the sake of clarification, the following is allowed and valid:

    #pragma unroll
    for (int i = 0; i < N; i++)
    {
        data_in = read_channel_altera(CH_DATA_IN[i]);
    }

    But this is not, because every unrolled iteration would read from the same channel:

    #pragma unroll
    for (int i = 0; i < N; i++)
    {
        data_in = read_channel_altera(CH_DATA_IN);
    }

    Note that newer versions of the compiler (17+) seem to support multiple call sites per channel as well, so the second example might now work (I haven't checked), but it will NOT work with older versions (v16.1.2 and below).
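
The single-call-site rule discussed above can be illustrated with a small host-side model. This is only a sketch in plain C, not Intel FPGA OpenCL code and not how the compiler works internally. It assumes that full unrolling turns every loop iteration into its own call site, and the names `NUM_CH`, `count_call_sites` and `max_call_sites` are hypothetical helpers invented for this illustration.

```c
#include <string.h>

#define NUM_CH 4  /* assumed channel count, for illustration only */

/*
 * Tally the channel reads that a fully unrolled loop of NUM_CH iterations
 * would contain. After full unrolling each iteration becomes a separate
 * call site, so one pass over the loop counts call sites per channel ID.
 *
 *   indexed != 0 models a read from CH_DATA_IN[i] (array of channels)
 *   indexed == 0 models a read from CH_DATA_IN    (one scalar channel, ID 0)
 */
void count_call_sites(int indexed, int sites[NUM_CH]) {
    memset(sites, 0, NUM_CH * sizeof sites[0]);
    for (int i = 0; i < NUM_CH; i++) {
        int channel_id = indexed ? i : 0;  /* which channel this iteration reads */
        sites[channel_id]++;
    }
}

/* Return the largest number of call sites attached to any single channel ID. */
int max_call_sites(const int sites[NUM_CH]) {
    int max = 0;
    for (int c = 0; c < NUM_CH; c++) {
        if (sites[c] > max) {
            max = sites[c];
        }
    }
    return max;
}
```

In this model the indexed loop gives every channel ID exactly one call site, which is consistent with the restriction quoted from the programming guide. The scalar loop attaches all `NUM_CH` call sites to one channel, which is the case the older compilers (v16.1.2 and below) are said to reject.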