Altera_Forum
8 years ago

--- Quote Start ---
What about: I - Add a clFlush after each clEnqueueTask()
--- Quote End ---

This changed the way the kernels ran (each one appeared to run longer), but the overall time was the same. That is, each kernel seems to have been started, but it could not complete until the previous one had completed.

Without the clFlush():

$ bin/host 100000 4
Reprogramming device [0] with handle 1
Task:0 complete (4.189 ms)
Task:1 complete (8.172 ms)
Task:2 complete (12.137 ms)
Task:3 complete (16.093 ms)
Time: 16.099 ms (4.025 ms / kernel)
Sum 0-100000.000000 (step 1.000000) = 5000050000.000000
Sum 0-100000.000000 (step 1.000000) = 5000050000.000000
Sum 0-100000.000000 (step 1.000000) = 5000050000.000000
Sum 0-100000.000000 (step 1.000000) = 5000050000.000000

With clFlush():

$ bin/host 100000 4
Reprogramming device [0] with handle 1
Task:0 complete (12.253 ms)
Task:1 complete (12.283 ms)
Task:2 complete (12.286 ms)
Task:3 complete (16.191 ms)
Time: 16.197 ms (4.049 ms / kernel)
Sum 0-100000.000000 (step 1.000000) = 5000050000.000000
Sum 0-100000.000000 (step 1.000000) = 5000050000.000000
Sum 0-100000.000000 (step 1.000000) = 5000050000.000000
Sum 0-100000.000000 (step 1.000000) = 5000050000.000000

--- Quote Start ---
II - Profile the FPGA design (or print all start and end timestamps of the kernels' events) to see if kernels overlap in time.
--- Quote End ---

https://alteraforum.com/forum/attachment.php?attachmentid=14752&stc=1
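For the second suggestion, if the host prints each kernel event's start and end timestamps, a small offline script can check whether any two kernels overlapped. The sketch below assumes the timestamps come from clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END (in nanoseconds, on a queue created with CL_QUEUE_PROFILING_ENABLE). The function name and the sample values are hypothetical and are not measurements from this thread.

```python
def overlapping_pairs(intervals):
    """Return index pairs (i, j), i < j, whose (start, end) intervals overlap.

    intervals: list of (start_ns, end_ns) tuples, one per kernel event.
    Intervals that merely touch (end of one == start of the next) do not count.
    """
    pairs = []
    for i in range(len(intervals)):
        start_i, end_i = intervals[i]
        for j in range(i + 1, len(intervals)):
            start_j, end_j = intervals[j]
            # Two intervals overlap when each starts before the other ends.
            if start_i < end_j and start_j < end_i:
                pairs.append((i, j))
    return pairs


# Hypothetical timestamps in ns, for illustration only.
serialized = [(0, 4_000_000), (4_000_000, 8_000_000), (8_000_000, 12_000_000)]
concurrent = [(0, 4_000_000), (100_000, 4_100_000), (200_000, 4_200_000)]

print(overlapping_pairs(serialized))  # []
print(overlapping_pairs(concurrent))  # [(0, 1), (0, 2), (1, 2)]
```

An empty result means the kernels ran back to back, which would match the behaviour described above, where each task only completes after the previous one.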