Altera_Forum
Honored Contributor
8 years ago
Minimum II of 2 but HTML report has no further information
I have a single work item kernel with a local mem used for a ping pong buffer with a form similar to the following:
local float __attribute__((bankwidth(4), numreadports(2), numwriteports(2),
                           doublepump, bank_bits(2,1,0))) mem[1024][4][2];

for (uint outer_outer = 0; outer_outer < 8; ++outer_outer) {
    // some integer add, sub, and shifts that are used to help compute x_idx, y_idx later
    float x_pipe[4];
    float y_pipe[4];
    uint x_idx_pipe[4];
    uint y_idx_pipe[4];

    for (uint outer = 0; outer < 8; ++outer) {
        uint x_idx, y_idx;
        // compute x_idx and y_idx using integer adds, subs, and shifts

        #pragma unroll
        for (uint inner = 0; inner < 4; ++inner) {
            float x_fetched = mem[x_idx][inner][0];
            float y_fetched = mem[y_idx][inner][0];
            mem[x_idx_pipe[0]][inner][1] = x_pipe[0];
            mem[y_idx_pipe[0]][inner][1] = y_pipe[0];

            // shift register statements + computations on x and y

            x_pipe[3] = x_fetched;
            y_pipe[3] = y_fetched;
            x_idx_pipe[3] = x_idx;
            y_idx_pipe[3] = y_idx;
        }
    }
}

The compiler seems to detect the parallelization of the inner loop correctly, but the II on the 'outer' loop is 2. Unfortunately, the Loop Analysis section of the HTML report gives no additional information about the limiting factor.

Does anyone here have any insight into what it means when the HTML report doesn't say what's limiting the II? Does that mean the control logic is causing it, and hence there's nothing I can do? I've tried forcing it with #pragma ii 1, but the compilation fails.

Looking at the system view, I notice the two store ops are sequential (the second dependent on the first), but I'm unsure whether this is just a graphical thing (i.e. the system view doesn't display doublepump allowing for parallel stores).
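For reference, here is a small host-side C model of how the loads and stores above might spread across the memory banks. It is only a sketch under stated assumptions: that with bankwidth(4) each float is one memory word, that mem[x][inner][pp] sits at word address (x*4 + inner)*2 + pp, and that bank_bits(2,1,0) selects the bank from the low three bits of that address. The names bank_of and tally are hypothetical helpers, not anything from the SDK.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_BANKS 8   /* assumption: bank_bits(2,1,0) gives 2^3 banks */
#define INNER     4   /* trip count of the unrolled inner loop */

/* Assumed mapping: word address of mem[x][inner][pp] in a [1024][4][2]
 * float array; the bank is taken from the low three address bits. */
static unsigned bank_of(unsigned x, unsigned inner, unsigned pp)
{
    unsigned word_addr = (x * 4u + inner) * 2u + pp;
    return word_addr & (NUM_BANKS - 1u);
}

/* Tally loads and stores per bank for one iteration of the 'outer' loop.
 * x_ld/y_ld are the read rows, x_st/y_st the write rows. */
static void tally(unsigned x_ld, unsigned y_ld, unsigned x_st, unsigned y_st,
                  unsigned loads[NUM_BANKS], unsigned stores[NUM_BANKS])
{
    for (unsigned b = 0; b < NUM_BANKS; ++b) {
        loads[b] = 0;
        stores[b] = 0;
    }
    for (unsigned inner = 0; inner < INNER; ++inner) {
        loads[bank_of(x_ld, inner, 0)]++;   /* mem[x_idx][inner][0] */
        loads[bank_of(y_ld, inner, 0)]++;   /* mem[y_idx][inner][0] */
        stores[bank_of(x_st, inner, 1)]++;  /* mem[x_idx_pipe[0]][inner][1] */
        stores[bank_of(y_st, inner, 1)]++;  /* mem[y_idx_pipe[0]][inner][1] */
    }
}

/* Print the per-bank counts for inspection. */
static void print_tally(const unsigned loads[NUM_BANKS],
                        const unsigned stores[NUM_BANKS])
{
    for (unsigned b = 0; b < NUM_BANKS; ++b)
        printf("bank %u: %u loads, %u stores\n", b, loads[b], stores[b]);
}
```

In this model the row indices never reach the bank bits, so for any choice of rows the even banks each see two loads and no stores, and the odd banks each see two stores and no loads. If the assumed mapping holds, no bank would need more than two accesses per iteration, which is part of why I suspect the sequential stores in the system view may not be a port-count problem.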