About Altera OpenCL Compilation

Honored Contributor

11 years ago

Thank you so much for the help! I am just wondering: what if the number of iterations of the for loop cannot be determined at compile time (either because the operation is conditional or because the size N changes between kernel invocations)? Am I forced to access local memory sequentially, or is there any other optimization that can be done?

--- Quote Start ---

Ok, I should point out that for this to work, unrolling the column loop is key; this creates consecutive accesses that the compiler can merge.

e.g.

for (row = 0; row < N; row++) {
    #pragma unroll
    for (col = 0; col < 4; col++) {
        A[row][col] = row + col;
    }
}

This essentially creates:

for (row = 0; row < N; row++) {
    A[row][0] = row;
    A[row][1] = row + 1;
    A[row][2] = row + 2;
    A[row][3] = row + 3;
}

which gets translated to something like this:

for (row = 0; row < N; row++) {
    A[row][0] = (int4)( row, row+1, row+2, row+3 ); // very efficient wide access
}

I think this is mentioned in the best practices guide document.

--- Quote End ---
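One way the question above is often approached is strip-mining: keep the run-time-sized loop, but step through it in fixed-width chunks so that the innermost loop has a constant trip count and can be fully unrolled, with a short sequential loop for the leftover elements. The following is only a plain C sketch of that idea, not kernel code. The names `fill_sequential`, `fill_strip_mined` and `WIDTH` are made up for illustration, the run-time column count `n_cols` is an assumption about the use case, and whether the Altera compiler actually merges the chunked accesses into wide ones has not been verified here.

```c
#include <assert.h>

#define WIDTH 4 /* fixed chunk width, matching the 4 columns in the quoted example */

/* Reference: one element at a time; n_cols is only known at run time,
   so the inner loop cannot be fully unrolled. */
void fill_sequential(int *a, int n_rows, int n_cols)
{
    assert(n_rows >= 0 && n_cols >= 0);
    for (int row = 0; row < n_rows; row++) {
        for (int col = 0; col < n_cols; col++) {
            a[row * n_cols + col] = row + col;
        }
    }
}

/* Strip-mined: same result, but the innermost loop always runs WIDTH times. */
void fill_strip_mined(int *a, int n_rows, int n_cols)
{
    assert(n_rows >= 0 && n_cols >= 0);
    for (int row = 0; row < n_rows; row++) {
        int col = 0;

        /* Full chunks of WIDTH consecutive elements. */
        for (; col + WIDTH <= n_cols; col += WIDTH) {
            /* Constant trip count: in an OpenCL kernel, "#pragma unroll"
               would go on this loop (assumption, untested on hardware). */
            for (int k = 0; k < WIDTH; k++) {
                a[row * n_cols + col + k] = row + col + k;
            }
        }

        /* Tail: fewer than WIDTH elements left, handled sequentially. */
        for (; col < n_cols; col++) {
            a[row * n_cols + col] = row + col;
        }
    }
}
```

With `n_cols = 10`, for example, each row is written as two chunks of four consecutive elements followed by a two-element tail. If the data can be padded so that the size is always a multiple of `WIDTH`, the tail loop is never entered.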
