Contributions
Re: Achieving parallel execution of loop on FPGA
Maybe something like this, although I don't know whether the compiler can figure it out:

    const int ITEM_LENGTH = 10000;
    const int GROUP_SIZE = 10;
    uint16_t a[ITEM_LENGTH][GROUP_SIZE];
    uint16_t b[ITEM_LENGTH][GROUP_SIZE];

    // Flattened loop over a: k walks every element, (i, j) are recovered from k.
    [[intel::ivdep]]
    for (int k = 0; k < GROUP_SIZE * ITEM_LENGTH; k++) {
        int i = k / GROUP_SIZE;
        int j = k - i * GROUP_SIZE;
        if (i == 0)
            a[i][j] = j;
        else
            a[i][j] = a[i-1][j] + i;
    }

    // Same flattened loop over b.
    [[intel::ivdep]]
    for (int k = 0; k < GROUP_SIZE * ITEM_LENGTH; k++) {
        int i = k / GROUP_SIZE;
        int j = k - i * GROUP_SIZE;
        if (i == 0)
            b[i][j] = j;
        else
            b[i][j] = b[i-1][j] + i;
    }

But what is the problem with the two loops, if they work?

2.5K Views, 0 likes, 0 Comments

Re: Achieving parallel execution of loop on FPGA
Hi @Mickleman,

I am not sure, but I assume it comes from the way you get your i and j out of the division and the modulo. I imagine you could rewrite it as follows (it does basically the same thing, without branching):

    uint16_t* c = (uint16_t*)a;
    bool test = (k < GROUP_SIZE);
    int i = k / GROUP_SIZE;
    c[k] = (c[k-1] + i) * (!test) + (test) * (k - i * GROUP_SIZE);

You can see that the compiler doesn't spot the opportunity to parallelize over GROUP_SIZE. I think that to get it automatically you should permute your array, a[j][i] ==> a[i][j], so that the dependency inside looks like c[k-GROUP_SIZE]. I don't know if I am being clear 🙂

2.5K Views, 0 likes, 2 Comments

Re: Achieving parallel execution of loop on FPGA
Hi,

Did you try "#pragma unroll 2"? I am working on something similar, and I would say that something like this should almost work:

    const int ITEM_LENGTH = 1000;
    const int GROUP_SIZE = 10;
    uint16_t a[2][GROUP_SIZE][ITEM_LENGTH];

    #pragma unroll 2
    [[intel::ivdep]]
    for (int block = 0; block < 2; block++)
        [[intel::ivdep]]
        for (int j = 0; j < GROUP_SIZE; j++)
            for (int i = 0; i < ITEM_LENGTH; i++)
                if (i == 0)
                    a[block][j][i] = j;
                else
                    a[block][j][i] = a[block][j][i-1] + i;

Good luck.

2.5K Views, 0 likes, 9 Comments

"Error enumerating AFCs: not found" is back
Hi,

The error saying that the hardware cannot be found is back. I compiled a oneAPI program but I cannot run it on the hardware. Previous builds that used to work no longer work either. Even the "aocl" command does not work anymore. I tried many Arria10-OneAPI nodes, so it is not a node-specific issue but a general one. A few days ago one of the nodes already had this specific issue, but now all of them are affected.

Best,
Mathieu

May be related to: https://community.intel.com/t5/Intel-High-Level-Design/cannot-run-on-fpga-hardware/td-p/1281525

Solved, 2.7K Views, 1 like, 10 Comments

AFU-OPAE example with DMA with custom computation
Hi everyone,

Is there an example somewhere of DMA combined with some custom computation? I have an example of custom computation and an example of DMA, but none with both together. I am new to the AFU world and honestly I have no clue how to make the connections in Platform Designer. It looks like a jungle.

Best,
Mathieu

1.1K Views, 0 likes, 2 Comments
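
Editor's sketch for the "Achieving parallel execution of loop on FPGA" replies above: the flattened-loop suggestion recovers i and j from a single index k, and with the a[ITEM_LENGTH][GROUP_SIZE] layout the value each element depends on sits GROUP_SIZE positions earlier in memory, which is the c[k-GROUP_SIZE] dependency mentioned in the second reply. The plain host-side C++ below is only a way to check that index arithmetic, assuming a row-major flat buffer. The function names fill_nested and fill_flat are hypothetical, ITEM_LENGTH is set to 1000 for brevity, and no FPGA attributes are used, so it says nothing about what the FPGA compiler will actually pipeline or unroll.

```cpp
#include <cstdint>
#include <vector>

constexpr int ITEM_LENGTH = 1000;  // assumed size, shortened for the sketch
constexpr int GROUP_SIZE = 10;

// Reference version: nested loops over a[i][j], stored row-major in a flat buffer.
// Values wrap modulo 65536 because the element type is uint16_t.
std::vector<uint16_t> fill_nested() {
    std::vector<uint16_t> a(ITEM_LENGTH * GROUP_SIZE);
    for (int i = 0; i < ITEM_LENGTH; i++) {
        for (int j = 0; j < GROUP_SIZE; j++) {
            a[i * GROUP_SIZE + j] =
                (i == 0) ? static_cast<uint16_t>(j)
                         : static_cast<uint16_t>(a[(i - 1) * GROUP_SIZE + j] + i);
        }
    }
    return a;
}

// Flattened version: one loop over k, with (i, j) recovered from k.
// The element for (i-1, j) is exactly GROUP_SIZE positions before k.
std::vector<uint16_t> fill_flat() {
    std::vector<uint16_t> c(ITEM_LENGTH * GROUP_SIZE);
    for (int k = 0; k < ITEM_LENGTH * GROUP_SIZE; k++) {
        int i = k / GROUP_SIZE;
        int j = k - i * GROUP_SIZE;
        c[k] = (i == 0) ? static_cast<uint16_t>(j)
                        : static_cast<uint16_t>(c[k - GROUP_SIZE] + i);
    }
    return c;
}
```

Comparing fill_nested() and fill_flat() with == is a quick way to confirm that the two index schemes touch the same elements. Because consecutive values of k within one row are independent of each other, the only loop-carried dependency has distance GROUP_SIZE, which is the property the replies above hope the compiler can exploit.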