How does local memory behave in autorun kernels in an FPGA-based OpenCL design?
I am using local memory in autorun kernels as buffers to hold data within the kernels. The autorun kernels are replicated multiple times using the attribute "__attribute__((num_compute_units(LANE_NUM)))".
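A minimal sketch of the kind of kernel in question may make the setup concrete. It assumes the Intel FPGA SDK for OpenCL channel extension; the kernel name, channel names, LANE_NUM value and BUF_SIZE are placeholders for illustration, not the actual design.

```c
#pragma OPENCL EXTENSION cl_intel_channels : enable

#define LANE_NUM 4   /* placeholder: number of replicated compute units */
#define BUF_SIZE 64  /* placeholder: words buffered per compute unit */

/* One input and one output channel per compute unit (hypothetical names). */
channel int data_in[LANE_NUM];
channel int data_out[LANE_NUM];

__attribute__((max_global_work_dim(0)))
__attribute__((autorun))
__attribute__((num_compute_units(LANE_NUM)))
__kernel void lane_buffer(void)
{
    /* Resolved at compile time, so it can index the channel arrays. */
    const int id = get_compute_id(0);

    /* The local buffer whose sharing and persistence are in question. */
    __local int buf[BUF_SIZE];

    /* Fill the buffer from this compute unit's own input channel. */
    for (int i = 0; i < BUF_SIZE; i++) {
        buf[i] = read_channel_intel(data_in[id]);
    }

    /* Drain it again; a write from another compute unit would show up here. */
    for (int i = 0; i < BUF_SIZE; i++) {
        write_channel_intel(data_out[id], buf[i]);
    }
}
```

Feeding each data_in channel a distinct pattern and comparing what comes back on data_out should show whether one unit's buffer is being overwritten by another, in both emulation and hardware.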
During software emulation, the local memory of the first compute unit (get_compute_id(0)==0) turns out to be written by the other compute units (get_compute_id(0)>0) as well, so it is written multiple times. On hardware, however, each local memory is written only once, by its own compute unit.
This is a bit strange, and I cannot find a clear definition of how local memory behaves in autorun kernels.
1) Is the local memory shared between the multiple compute units declared by "__attribute__((num_compute_units(LANE_NUM)))"?
2) Does it retain its previous data when the kernel restarts automatically?