Forum Discussion
Altera_Forum
Honored Contributor
8 years ago
SIMD vectorization applies to the data passed into the kernel, so it should only improve performance when your input data can actually be vectorized.
In your code, only the constant M is passed in, and it can't be vectorized; I would guess that's why the resource usage is the same. If your goal is GPU-style parallel execution, you should experiment with the compute unit settings, though that is still not quite the same as a GPU in some respects. Bottom line: you can launch parallel kernels separately, each under a different kernel name and on a different queue. That way it's definitely parallel :p
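To make the point about vectorizable input concrete, here is a rough sketch of a kernel that does read per-work-item data from a buffer, which is the kind of access SIMD vectorization can widen. It is not the original poster's code: the kernel name `scale`, the buffers `in` and `out`, the work-group size of 64 and the SIMD factor of 4 are all assumptions for illustration. The attributes `num_simd_work_items` and `reqd_work_group_size` come from the Altera/Intel FPGA SDK for OpenCL. The `#else` branch and `run_scale_on_host` are a hypothetical plain-C shim, only there so the kernel logic can be checked on a host compiler.

```c
#define WG_SIZE 64  /* assumed work-group size; must be divisible by the SIMD factor */

#ifdef __OPENCL_VERSION__
/* Built by an OpenCL compiler: request 4 SIMD lanes per compute unit.
   num_simd_work_items needs reqd_work_group_size to be specified too. */
#define FPGA_SIMD4 __attribute__((reqd_work_group_size(WG_SIZE, 1, 1))) \
                   __attribute__((num_simd_work_items(4)))
#else
/* Hypothetical plain-C shim: lets the same source build on a host compiler. */
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
#define FPGA_SIMD4
static size_t current_gid;  /* emulated global work-item id */
static size_t get_global_id(unsigned dim) { (void)dim; return current_gid; }
#endif

/* Each work-item reads in[gid] and writes out[gid]; these per-item buffer
   accesses are what the compiler can vectorize. The scalar m is the same
   for every work-item, so on its own it gives nothing to vectorize. */
FPGA_SIMD4
__kernel void scale(__global const float *in, __global float *out, const float m)
{
    size_t gid = get_global_id(0);
    out[gid] = m * in[gid];
}

#ifndef __OPENCL_VERSION__
/* Host-only emulation of a 1-D NDRange launch over n work-items. */
static void run_scale_on_host(const float *in, float *out, float m, size_t n)
{
    assert(n % WG_SIZE == 0);  /* global size must be a multiple of the work-group size */
    for (current_gid = 0; current_gid < n; ++current_gid)
        scale(in, out, m);
}
#endif
```

For the compute-unit route mentioned above, the same SDK provides a `num_compute_units(N)` kernel attribute, which replicates the whole kernel pipeline instead of widening a single one.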