I think I saw a reference somewhere (probably in Altera's OpenCL documentation) stating that it is possible to program the FPGA with your .aocx file offline (rather than at runtime from the host code) and then set an environment variable when running the host code to prevent runtime reconfiguration. If you put all the kernels needed by all of your processes in the same .cl file and compile them into one .aocx file, this might work. Of course, plenty of things could still go wrong: for example, Altera's runtime might not allow two different processes to access the same FPGA board simultaneously at all. Even if this does work, it will probably only do so if each process accesses different kernels; two processes accessing the same kernel will more likely than not fail.
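As a rough sketch of that flow (assuming the Altera OpenCL SDK's `aoc` compiler and `aocl` utility; the board name `acl0` and the file names are placeholders, and I don't remember the exact name of the environment variable, so check the SDK's programming guide):

```shell
# Hypothetical offline-programming flow -- names are placeholders.
# 1. Compile every kernel used by every process into a single image:
aoc all_kernels.cl -o all_kernels.aocx

# 2. Configure the FPGA once, before launching any host process:
aocl program acl0 all_kernels.aocx

# 3. Set the SDK's environment variable that disables runtime
#    reconfiguration (see Altera's docs for the exact name), then
#    launch the host processes.
```

The key point is step 1: every process must find its kernels in the one image already on the board, since none of them is allowed to reconfigure it.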
Needless to say, there is no logical reason to "parallelize" accesses to an accelerator (be it a GPU, an FPGA, or anything else) using MPI on a single machine; you can simply create multiple command queues under one process and run as many kernels as you want in parallel within the same context. My recommendation is to rewrite your original code to do this instead of using MPI. Of course, if you want to scale your code over a network of FPGAs on different machines, then the original MPI-based approach is the correct solution.
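A minimal sketch of the multi-queue approach in OpenCL host code (device type, queue count, and the kernel setup are placeholders; error handling and the `clCreateProgramWithBinary` step for loading the .aocx are elided):

```c
/* Sketch: several command queues in one context, one process, instead of MPI.
 * Error handling is mostly omitted for brevity. */
#include <stdio.h>
#include <CL/cl.h>

#define NUM_QUEUES 4  /* placeholder: one queue per concurrent kernel */

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    err = clGetPlatformIDs(1, &platform, NULL);
    if (err != CL_SUCCESS) { fprintf(stderr, "no OpenCL platform\n"); return 1; }
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 1, &device, NULL);

    /* One context shared by all queues: buffers and the program built from
       the .aocx binary are visible to every queue. */
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);

    /* One in-order queue per kernel; the device can execute kernels
       enqueued on different queues concurrently. */
    cl_command_queue queues[NUM_QUEUES];
    for (int i = 0; i < NUM_QUEUES; ++i)
        queues[i] = clCreateCommandQueue(ctx, device, 0, &err);

    /* ... load the .aocx with clCreateProgramWithBinary, create one kernel
       object per queue, set arguments, then enqueue each kernel on its own
       queue, e.g.:
       clEnqueueNDRangeKernel(queues[i], kernel[i], 1, NULL, &gsize, &lsize,
                              0, NULL, NULL);                               */

    for (int i = 0; i < NUM_QUEUES; ++i) {
        clFinish(queues[i]);                 /* wait for each queue to drain */
        clReleaseCommandQueue(queues[i]);
    }
    clReleaseContext(ctx);
    return 0;
}
```

Compared with the MPI version, all queues share one context, so buffers can even be shared between the concurrently running kernels without any inter-process communication.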