Forum Discussion
What you are describing is not an FPGA-specific problem; it applies to all accelerators. Many people have worked on streaming/pipelining computation between a CPU and a GPU, and there should also be multiple examples of doing this on FPGAs (probably not with OpenCL, though). Usually this is done by double- or multi-buffering: the input is partitioned into multiple chunks, and the CPU processes the first chunk and writes it to buffer A on the accelerator. While the accelerator is processing buffer A, the CPU processes the second chunk and writes it to buffer B. When the accelerator is done with buffer A, it switches to buffer B, and the CPU switches from buffer B back to buffer A for the third chunk, and so on. You can use OpenCL events, or custom locks/flags, to synchronize the accelerator and the CPU in this case.
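To make the ping-pong scheme concrete, here is a minimal sketch that simulates it on the host with two Python threads. It is not OpenCL code: the "accelerator" is just a second thread, the two list slots stand in for device buffers A and B, and the semaphores play the role that OpenCL events or custom flags would play. The names `run_double_buffered`, `preprocess` and `accelerate` are made up for illustration.

```python
import threading


def run_double_buffered(chunks, preprocess, accelerate):
    """Simulate CPU/accelerator double buffering with two threads.

    preprocess: hypothetical CPU-side stage applied to each chunk.
    accelerate: hypothetical stand-in for the accelerator kernel.
    """
    buffers = [None, None]  # stand-ins for device buffers A and B
    # free[i] is signalled when the CPU may overwrite buffer i.
    free = [threading.Semaphore(1), threading.Semaphore(1)]
    # full[i] is signalled when buffer i holds data for the accelerator.
    full = [threading.Semaphore(0), threading.Semaphore(0)]
    results = []

    def cpu_side():
        for i, chunk in enumerate(chunks):
            slot = i % 2                # alternate A, B, A, B, ...
            free[slot].acquire()        # wait until the accelerator released it
            buffers[slot] = preprocess(chunk)
            full[slot].release()        # hand the buffer to the accelerator

    def accelerator_side():
        for i in range(len(chunks)):
            slot = i % 2
            full[slot].acquire()        # wait for the CPU to fill this buffer
            results.append(accelerate(buffers[slot]))
            free[slot].release()        # give the buffer back to the CPU

    workers = [
        threading.Thread(target=cpu_side),
        threading.Thread(target=accelerator_side),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5, 6]]
    doubled_sums = run_double_buffered(
        data,
        preprocess=lambda chunk: [x * 2 for x in chunk],
        accelerate=sum,
    )
    print(doubled_sums)
```

The important part is that the CPU side can fill one buffer while the other one is still being consumed, and that it blocks before overwriting a buffer the accelerator has not finished with. In a real OpenCL host program the two semaphore pairs would roughly correspond to waiting on the write and kernel events of each buffer.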
The concept of host channels has also recently been added to Altera's compiler; it allows you to stream data directly from the host to the FPGA, but it is only available on Altera's reference board.