Altera_Forum
Honored Contributor
10 years ago
Question about kernel vectorization
Hi all,
I am trying to use SIMD optimization on a simple vector copy kernel, A = B (both vectors are in global memory). I found that with SIMD(4)/SIMD(8), the effective global memory bandwidth increases by 4.3x/8.4x compared with the non-optimized code. I would expect the improvement to be capped at 4x/8x in the ideal case. Why does the measured improvement exceed the theoretical ideal? Any suggestions are appreciated. Thanks.
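To put a number on the gap, here is a rough sketch in plain C that compares the measured improvement against the ideal SIMD factor. The function names `speedup_ratio` and `excess_percent` are just made up for illustration, and it assumes the ideal improvement equals the SIMD width, as stated above.

```c
#include <stdio.h>

/* Hypothetical helper: ratio of the measured improvement to the ideal one.
 * Assumption: the ideal improvement equals the SIMD width. */
double speedup_ratio(double measured_speedup, int simd_width) {
    return measured_speedup / (double)simd_width;
}

/* Hypothetical helper: percentage by which the measured improvement
 * exceeds the ideal one (negative if it falls short). */
double excess_percent(double measured_speedup, int simd_width) {
    return (speedup_ratio(measured_speedup, simd_width) - 1.0) * 100.0;
}

/* Print one comparison line for a given SIMD width. */
void report(double measured_speedup, int simd_width) {
    printf("SIMD(%d): measured %.1fx, ideal %dx, excess %.1f%%\n",
           simd_width, measured_speedup, simd_width,
           excess_percent(measured_speedup, simd_width));
}
```

Calling `report(4.3, 4)` and `report(8.4, 8)` with the figures from this post gives an excess of roughly 7.5% and 5% over the ideal, so the overshoot is small in both cases but consistently present.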