Forum Discussion
Altera_Forum
Honored Contributor
8 years ago
The multi-kernel approach could certainly help, as long as you can logically split your computation into separate sequential sections. I personally have not tried this yet, but you can put the different kernels in separate files and compile them individually. Then, in the host code, you load the first kernel image, run it, and leave its output in device memory; next you load the second kernel image, which takes the output of the first kernel as its input and generates the final output. You might need some extra buffer management in this case; e.g. you might need to free the input buffer of the first kernel after its execution has finished, to make room for the output buffer of the second kernel.
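
To make the buffer-management point concrete, here is a rough sketch in plain C. It does not call OpenCL at all; it only models how many bytes of device memory are in use at each step of the two-kernel sequence. `DeviceModel`, `dev_alloc`, `dev_free` and `run_two_stage` are hypothetical names made up for this illustration. In real host code the corresponding steps would be `clCreateBuffer`, `clReleaseMemObject` and `clCreateProgramWithBinary`, and the sizes would be your actual buffer sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of device global memory: it only counts bytes. */
typedef struct {
    size_t capacity; /* total device memory, in bytes */
    size_t used;     /* bytes currently allocated */
    size_t peak;     /* highest value 'used' has reached */
} DeviceModel;

/* Stands in for clCreateBuffer: 0 on success, -1 if the buffer does not fit. */
int dev_alloc(DeviceModel *d, size_t bytes) {
    if (bytes > d->capacity - d->used) return -1;
    d->used += bytes;
    if (d->used > d->peak) d->peak = d->used;
    return 0;
}

/* Stands in for clReleaseMemObject. */
void dev_free(DeviceModel *d, size_t bytes) {
    assert(bytes <= d->used);
    d->used -= bytes;
}

/* Walks through the two-kernel sequence.
 * in1  = input buffer of kernel 1
 * mid  = output of kernel 1, reused as input of kernel 2
 * out2 = output buffer of kernel 2
 * free_early != 0 releases in1 before out2 is allocated.
 * Returns 0 if every allocation fits, -1 otherwise. */
int run_two_stage(DeviceModel *d, size_t in1, size_t mid, size_t out2,
                  int free_early) {
    d->used = 0;
    d->peak = 0;

    if (dev_alloc(d, in1) != 0) return -1;
    if (dev_alloc(d, mid) != 0) return -1;
    /* Here: load the first kernel image, run kernel 1, wait for it to finish. */

    if (free_early) dev_free(d, in1); /* kernel 1 is done with its input */

    /* Here: load the second kernel image. */
    if (dev_alloc(d, out2) != 0) return -1;
    /* Here: run kernel 2 with 'mid' as input, then read 'out2' back. */

    /* Clean up whatever is still allocated. */
    if (!free_early) dev_free(d, in1);
    dev_free(d, mid);
    dev_free(d, out2);
    return 0;
}
```

For example, with a capacity of 100 bytes and three 40-byte buffers, `run_two_stage(&d, 40, 40, 40, 0)` fails because all three buffers would have to coexist, while `run_two_stage(&d, 40, 40, 40, 1)` succeeds because at most two of them are alive at any time. Whether the early free is needed in practice depends on your real buffer sizes and the memory on your board.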