Forum Discussion
Altera_Forum
Honored Contributor
8 years ago

This:
--- Quote Start ---
read(); --->WI 2
cal(); --->WI 1
write(); --->WI 0
--- Quote End ---

And this:

--- Quote Start ---
and If I set SIMD to 2, Is the execution like:
read(); --->WI 3, WI 4
cal(); --->WI 2, WI 3
write(); --->WI 0, WI 1
--- Quote End ---

Though the latter will need a minor correction as follows:

read(); --->WI 4, WI 5
cal(); --->WI 2, WI 3
write(); --->WI 0, WI 1

SIMD vectorizes all operations, including memory accesses and compute.

Please note that all work-items in a work-group have to wait at barriers in NDRange kernels; hence, the pipeline will not extend from one barrier region to another. I am not exactly sure how barriers are handled in this case, but one of two things happens: either a long enough delay is inserted into the pipeline to make sure that no work-item passes a barrier before all the others have reached it, or each region between two barriers is mapped to a different pipeline, with the pipeline before each barrier being fully flushed before any work-item enters the pipeline after the barrier.
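To make the corrected work-item assignment easier to check, here is a minimal Python sketch of an idealized pipeline. It is only a toy model, not how the compiler actually schedules hardware: it assumes each of read(), cal() and write() is a single pipeline stage, that one group of SIMD work-items enters per clock cycle, and that there are no stalls or barriers. The function name `stage_occupancy` and its parameters are made up for this illustration.

```python
def stage_occupancy(cycle, simd=1, stages=("read", "cal", "write")):
    """Return {stage: [work-item ids]} for one clock cycle.

    Toy model (assumption): one pipeline stage per function, and `simd`
    work-items move through the pipeline together as one group per cycle.
    A real FPGA pipeline has many more stages than this.
    """
    occupancy = {}
    for depth, name in enumerate(stages):
        group = cycle - depth  # index of the SIMD group currently in this stage
        if group < 0:
            occupancy[name] = []  # pipeline is still filling; stage is empty
        else:
            first = group * simd  # first work-item id of that group
            occupancy[name] = list(range(first, first + simd))
    return occupancy


# Print the stage contents at cycle 2, when all three stages are busy.
for simd in (1, 2):
    print(f"SIMD = {simd}")
    for name, wis in stage_occupancy(cycle=2, simd=simd).items():
        print(f"{name}(); --->" + ", ".join(f"WI {w}" for w in wis))
```

At cycle 2 this gives WI 2 / WI 1 / WI 0 for SIMD 1 and WI 4, WI 5 / WI 2, WI 3 / WI 0, WI 1 for SIMD 2, which matches the listings above: each stage holds a disjoint group of work-items, so no work-item appears in two stages at once.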