Forum Discussion
Altera_Forum
Honored Contributor
11 years ago
Thank you for the clarification! If I have a few kernels in one .cl/.aocx file, each using about the same amount of local memory, will the AOCL compiler be able to generate a single memory block and share it between the kernels, or will it instantiate a separate local memory for each kernel? My kernels will not run concurrently: the previous kernel must finish before the next can start, and I would like to allocate as much local memory as possible. I am asking because the AOCL release notes mention that "num_share_resources", "max_share_resources" and "max_unroll_loops" are deprecated, and no workaround was given.
Also, is there a way to tell the AOCL compiler explicitly that I won't run the different kernels in the same file concurrently, so that it can optimize the hardware better? Or does that not make any difference anyway?
Regards, Ryan
--- Quote Start ---
It is the latter; the programming guide recommends that "the entire kernel should have 4 or fewer different accesses to local memory." Basically, each load/store instruction in the kernel becomes a client (i.e. master) of local memory. Because local memory has at most 4 ports, if you have 4 load/store instructions, each port is connected to a single load/store, so the instructions do not compete with each other. This guarantees the most efficient hardware.
If you have 3 or fewer store instructions and many loads (loads + stores > 4), the compiler may choose to replicate the local memory. This also gives fast access to local memory, at the expense of additional RAM blocks.
Further, the compiler performs some optimizations to partition the local memory based on the access patterns. Hence, even if there are more than 4 store instructions, you may still get efficient hardware. However, this depends on the complexity of your access patterns and may not always be possible.
--- Quote End ---
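To make the rule of thumb in the quote a bit more concrete, here is a rough Python sketch that sorts a kernel into one of the three cases based on how many local-memory loads and stores it has. This is only a toy model of the quoted text, not the AOCL compiler's actual decision logic: the function name `classify_local_memory`, the constant `MAX_PORTS`, and the returned labels are all made up for illustration. The quote does not say what happens with exactly 4 stores plus additional loads, so the sketch assumes that case falls in with "more than 4 stores".

```python
# Toy model of the rule of thumb in the quoted reply.
# Hypothetical names throughout; this is NOT how the AOCL compiler is implemented.

MAX_PORTS = 4  # the quote says local memory has at most 4 ports


def classify_local_memory(loads: int, stores: int) -> str:
    """Return which case from the quote a kernel's access counts fall into."""
    if loads < 0 or stores < 0:
        raise ValueError("access counts must be non-negative")

    total = loads + stores
    if total <= MAX_PORTS:
        # Every load/store instruction gets its own port, so none compete.
        return "dedicated ports"
    if stores <= 3:
        # Few stores, many loads: the compiler may replicate the memory,
        # trading extra RAM for fast access.
        return "replication possible"
    # Assumption: 4 or more stores with total > 4 lands here. Efficiency then
    # depends on whether the compiler can partition by access pattern.
    return "depends on access patterns"


if __name__ == "__main__":
    for loads, stores in [(2, 2), (6, 1), (4, 5)]:
        print(f"loads={loads} stores={stores}: {classify_local_memory(loads, stores)}")
```

Counting the load/store instructions in each kernel and running them through something like this is a quick sanity check on whether a kernel sits inside the 4-access guideline before starting a long compile.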