Forum Discussion
Altera_Forum
Honored Contributor
11 years ago
Hello Jack and Sean,
I implemented a 7x7 convolution both ways: as an NDRange kernel and with shift registers. To reach acceptable performance (my goal was one work item per clock cycle), the NDRange implementation had to use local memory. That means every work-group copies its portion of the image plus a halo: the 3 neighboring rows above the portion, the 3 neighboring rows below it, and the 6 neighboring columns on its left and right (3 on each side). This creates a lot of redundancy, since adjacent work-groups copy overlapping pixels, and it uses a lot of local memory. The shift register implementation only needs a single buffer holding 6 lines plus 7 pixels of the image, so it uses fewer resources and gives slightly better throughput.

Hope this helps.

Regards,
koper
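
For readers who want to see why the buffer is "6 lines and 7 pixels", here is a minimal sketch of the sliding-window idea in plain C rather than OpenCL. It is not the code from the post. The image size (`W`, `H`), the function names, and the integer coefficients are all hypothetical, and the result is a plain weighted sum over the window (no kernel flip, "valid" output region only). In an FPGA single work-item kernel the shift loop would normally be fully unrolled so that the whole register shifts in one clock cycle; here it is just an ordinary loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define K 7                        /* kernel size: 7x7 */
#define W 16                       /* image width (hypothetical) */
#define H 12                       /* image height (hypothetical) */
#define OUT_W (W - K + 1)          /* width of the "valid" output */
#define OUT_H (H - K + 1)          /* height of the "valid" output */
#define SR_LEN ((K - 1) * W + K)   /* 6 lines + 7 pixels */

/* Sliding-window version: one pixel enters the shift register per iteration. */
void conv7x7_shift(const int *in, const int *coef, long *out)
{
    int sr[SR_LEN];
    assert(in != NULL && coef != NULL && out != NULL);
    memset(sr, 0, sizeof sr);

    for (int i = 0; i < W * H; ++i) {
        /* Shift everything by one position and append the new pixel. */
        for (int j = 0; j < SR_LEN - 1; ++j)
            sr[j] = sr[j + 1];
        sr[SR_LEN - 1] = in[i];

        /* The newest pixel is the bottom-right corner of the window. */
        int row = i / W, col = i % W;
        if (row >= K - 1 && col >= K - 1) {
            long acc = 0;
            for (int ky = 0; ky < K; ++ky)
                for (int kx = 0; kx < K; ++kx)
                    acc += (long)sr[ky * W + kx] * coef[ky * K + kx];
            out[(row - (K - 1)) * OUT_W + (col - (K - 1))] = acc;
        }
    }
}

/* Direct reference version: reads every window straight from the image. */
void conv7x7_direct(const int *in, const int *coef, long *out)
{
    for (int y = 0; y < OUT_H; ++y)
        for (int x = 0; x < OUT_W; ++x) {
            long acc = 0;
            for (int ky = 0; ky < K; ++ky)
                for (int kx = 0; kx < K; ++kx)
                    acc += (long)in[(y + ky) * W + (x + kx)] * coef[ky * K + kx];
            out[y * OUT_W + x] = acc;
        }
}

/* Runs both versions on made-up data and returns the number of mismatches. */
int self_check(void)
{
    int in[W * H], coef[K * K], mismatches = 0;
    long a[OUT_W * OUT_H], b[OUT_W * OUT_H];

    for (int i = 0; i < W * H; ++i) in[i] = (i * 31 + 7) % 256;
    for (int i = 0; i < K * K; ++i) coef[i] = i + 1;

    conv7x7_shift(in, coef, a);
    conv7x7_direct(in, coef, b);
    for (int i = 0; i < OUT_W * OUT_H; ++i)
        if (a[i] != b[i]) ++mismatches;
    return mismatches;
}
```

The point of the sketch is that each input pixel is read exactly once and the only storage is the `SR_LEN`-element register, whereas a local-memory tile would reload the halo pixels for every work-group.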