Altera_Forum
Honored Contributor
7 years ago
Does an initiation interval of 1 mean there is no need for unrolling?
I have a basic question. In a single-threaded kernel, if I have a single loop and I am able to achieve an initiation interval (II) of 1, is it still beneficial to unroll the loop? My assumption is that with II=1 a new iteration can start on every clock cycle, so the pipeline should already deliver something close to what unrolling the for loop would give.
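As a rough sketch of the arithmetic behind this assumption, the usual idealized model for a pipelined loop is cycles = depth + II * (trips - 1). The Python below applies it with and without unrolling. The function names and the pipeline depth of 100 cycles are hypothetical, and the model assumes that unrolling changes neither the depth nor the II and that memory bandwidth never stalls the pipeline, none of which is guaranteed on real hardware.

```python
def pipelined_cycles(iterations, ii=1, depth=100):
    """Idealized cycle count for a pipelined loop: fill the pipeline once
    (`depth` cycles), then launch one new iteration every `ii` cycles."""
    if iterations <= 0:
        return 0
    return depth + ii * (iterations - 1)


def unrolled_cycles(iterations, unroll, ii=1, depth=100):
    """Same model after unrolling: each pipelined trip now carries `unroll`
    original iterations (ceiling division covers a partial last trip)."""
    trips = -(-iterations // unroll)
    return pipelined_cycles(trips, ii, depth)


if __name__ == "__main__":
    LL, UL = 16384, 128  # loop length and unroll factor used in this post
    DEPTH = 100          # hypothetical pipeline depth, for illustration only
    print("II=1, no unroll  :", pipelined_cycles(LL, depth=DEPTH), "cycles")
    print("II=1, unroll", UL, ":", unrolled_cycles(LL, UL, depth=DEPTH), "cycles")
```

In this model II=1 completes at most one iteration per cycle, while unrolling by a factor U completes up to U iterations per cycle, as long as the device has the resources and memory bandwidth to keep the wider datapath fed.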
To check this assumption, I wrote some synthetic kernels. Their general structure is shown below:
#define LL 16384
#define UL 128
__kernel void WGSXMAPIXLL16384OPS8ST(const __global float * restrict GIn,
                                     __global float * restrict GOut,
                                     const float M,
                                     const float N,
                                     const float P) {
    int Mi = M;
    float tempOutTotal = 0;
    #pragma unroll UL
    for (int lcdd = 0; lcdd < LL; lcdd++) {
        float tempOut = N + lcdd;
        float temp1 = 1.0;
        //float temp2 = 1.0;
        //float temp3 = 1.0;
        //float temp4 = 1.0;
        float temp = 0.0;
        temp1 += temp1 * tempOut;
        temp1 += temp1 * tempOut;
        ... // Total number of FMA ops
        temp1 += temp1 * tempOut;
        GOut[lcdd] = temp1;
        //tempOutTotal += tempOut;
    }
}
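The performance of these kernels is measured in GFLOPS, i.e. total operations divided by kernel time. A minimal sketch of that arithmetic in Python follows. It assumes each FMA counts as two floating-point operations, which is not stated here and may not match how the numbers were actually produced. The function names and the 1 ms kernel time are hypothetical.

```python
def gflops(iterations, fma_per_iteration, seconds, flops_per_fma=2):
    """Total floating-point operations divided by kernel time, in GFLOPS.
    flops_per_fma=2 is an assumption; pass 1 to count an FMA as one op."""
    total_flops = iterations * fma_per_iteration * flops_per_fma
    return total_flops / seconds / 1e9


def implied_seconds(iterations, fma_per_iteration, gflops_value, flops_per_fma=2):
    """Inverse of gflops(): the kernel time that a GFLOPS figure implies."""
    total_flops = iterations * fma_per_iteration * flops_per_fma
    return total_flops / (gflops_value * 1e9)


if __name__ == "__main__":
    # Hypothetical timing: LL = 1048576 iterations, 16 FMAs each, 1 ms kernel.
    print("GFLOPS:", gflops(1048576, 16, 1e-3))
    # Time implied by a given GFLOPS figure under the same counting assumption.
    print("seconds:", implied_seconds(16384, 8, 14.42))
```

Converting a GFLOPS figure back into a kernel time this way makes it easier to see whether a roughly constant startup cost could account for the lower throughput at small LL.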
LL takes the values 16384, 32768, 65536, 131072, 262144, 524288, and 1048576. The number of FMA operations per iteration is either 8 or 16, and the unroll factor (UL) is 128 for the 8-FMA kernels and 64 for the 16-FMA kernels.

When I compile and run these kernels, I get the performance numbers below in GFLOPS (total number of operations divided by the time the kernel takes to finish):

LL         OPS=8    OPS=16
16384      14.42    28.27
32768      19.11    51.41
65536      32.86    59.32
131072     36.92    78.40
262144     39.24    98.85
524288     47.32    99.74
1048576    51.26    111.2

Based on these numbers, I don't understand why II=1 alone is not enough and why the loop still needs to be unrolled.

Also, why does performance increase as LL increases? Is the initialization overhead of the FPGA so high that a long-running kernel is needed to hide it? My assumption was that, unlike a GPU, an FPGA can start filling its pipeline very quickly and should not introduce that much overhead.

Can anyone help me understand these observations?

Thanks,