Forum Discussion
Yep, that's exactly how it works. So let's say you have kernels "A" and "B" in the same .cl file, compiled into a single .aocx file. When you enqueue kernel A and then kernel B, the hardware stays programmed when B is invoked, because both kernels live in the same .aocx file.

The only limitation you may face is that putting more kernels into the same .cl file uses up more hardware resources. That may fill up the chip, or require you to undo some optimizations you previously applied to each kernel independently, to free up enough room to make them all fit.

The other challenge is that if you have multiple kernels operating concurrently in the hardware and they all access global memory at the same time, you might run out of bandwidth. In cases like that you can sometimes combine kernels to reduce the global memory traffic. For example, if kernel A writes to global memory and kernel B then reads that data back in, just combine the two kernels and move the data through private/local memory instead.
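To make the "combine the kernels" suggestion concrete, here is a rough sketch rather than anything taken from a real design. The kernel names (kernel_a, kernel_b, kernel_ab), their arguments, and the arithmetic are all made up for illustration. The kernels are written as single work-item loops, and the qualifier macros at the top are only an assumption that lets the same source compile as plain C for a quick functional check on the host.

```c
/* Sketch only: kernel names, arguments, and arithmetic are hypothetical. */
#ifndef __OPENCL_VERSION__
/* Host-side build: pull in assert and make the OpenCL qualifiers vanish
   so the single work-item kernels below compile as plain C. */
#include <assert.h>
#define __kernel
#define __global
#endif

/* Separate version: A writes an intermediate buffer to global memory... */
__kernel void kernel_a(__global const float *in, __global float *tmp, int n)
{
    for (int i = 0; i < n; i++)
        tmp[i] = in[i] * 2.0f;
}

/* ...and B reads that buffer back from global memory. */
__kernel void kernel_b(__global const float *tmp, __global float *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = tmp[i] + 1.0f;
}

/* Combined version: the intermediate value stays in a private variable,
   so the tmp buffer and its global memory write and read both go away. */
__kernel void kernel_ab(__global const float *in, __global float *out, int n)
{
    for (int i = 0; i < n; i++) {
        float t = in[i] * 2.0f;   /* private, never goes to global memory */
        out[i] = t + 1.0f;
    }
}

#ifndef __OPENCL_VERSION__
/* Host-only check that the combined kernel matches A followed by B. */
void check_combined_matches_separate(void)
{
    enum { N = 8 };
    float in[N], tmp[N], out_separate[N], out_combined[N];

    for (int i = 0; i < N; i++)
        in[i] = (float)i;

    kernel_a(in, tmp, N);
    kernel_b(tmp, out_separate, N);
    kernel_ab(in, out_combined, N);

    for (int i = 0; i < N; i++)
        assert(out_separate[i] == out_combined[i]);
}
#endif
```

In the separate version every element crosses the global memory interface three times (read in, write tmp, read tmp) before the final write, while the combined version only reads in and writes out. Whether the combined kernel still fits on the chip alongside everything else is something you would have to check in the compile report for your own design.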