--- Quote Start ---
Could you tell me which document this is? I have read all the documents (getting started, best practices and programming guide), and I didn't find anything related to that. I have found a way to change the hardware image, but I don't think that's the right way (as far as I understood, flashing a new image should be provided by the manufacturer), since this is used to update the DMA and PCIe hardware.
--- Quote End ---
Took a bit of searching, but I found it; it is in this document (
https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/hb/opencl-sdk/ug_aocl_s5_net_platform.pdf).
Check section 2.11, Table 2. If you program the FPGA offline using "aocl program *file*.aocx" and then set the "CL_CONTEXT_COMPILER_MODE_ALTERA" environment variable to 3, this should avoid runtime reconfiguration.
--- Quote Start ---
Could you further explain why this may be not possible?
--- Quote End ---
The fine-grained scheduling that you get on a GPU is not available on an FPGA. Each kernel has its own dedicated circuit on the FPGA, and two processes cannot "share" that circuit: in the best case, one process will access the circuit, finish execution, and then the second one will start (no parallelism); in the worst case, this will result in a runtime error. On the other hand, if the processes access different kernels, then since each kernel has its own circuit, all processes can run in parallel and the only shared resource will be the DDR memory. If either case works at all, the latter is much more likely to, but you will never know until you try. If you absolutely need the processes to run the same kernel, you can probably create multiple copies of that kernel in your .cl file under different names, and call each copy from a different process.
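To illustrate that last suggestion, the .cl file could contain two identical copies of the kernel under different names (the kernel names and the vector-add body here are invented for the example, not from any real project):

```c
/* Two identical copies of a hypothetical vector-add kernel.
 * Each MPI process enqueues a different copy, so each process
 * drives its own circuit on the FPGA instead of contending
 * for a single one. */
__kernel void vec_add_p0(__global const float *a,
                         __global const float *b,
                         __global float *c) {
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}

__kernel void vec_add_p1(__global const float *a,
                         __global const float *b,
                         __global float *c) {
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
```

Keep in mind that every copy consumes its own FPGA area, so duplicating kernels multiplies resource usage and may not fit for large designs.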
--- Quote Start ---
Rewriting the code is not an option; this software is about 400 thousand lines of code (and very old), so removing MPI is not an option. In order to use it correctly, should I share the program and context?
--- Quote End ---
I am not really sure. You could try creating new MPI types for the context, queue, program, etc., create the context and everything else on the root process, and send them from there to the other processes, but there is no telling what would happen at run time.