How to run multiple work groups in parallel using ND Range?
As far as I know, each work group is passed to the kernel sequentially, and all the work-items within that group run in parallel. Example: a 3D array of 8*8*8 would have 512 points, and each work group has size 4*4*4 ...
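To make the numbers in the example concrete, here is a minimal sketch in plain C (no OpenCL calls) of how a global size and a work-group size translate into a number of work groups. The helper names `groups_per_dim`, `total_groups` and `group_of` are hypothetical, made up for illustration, and the sketch assumes the global size divides evenly by the work-group size and that no global offset is used.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work groups along one dimension: global size / local size.
   Assumption: the global size is an exact multiple of the local size. */
size_t groups_per_dim(size_t global_size, size_t local_size)
{
    assert(local_size != 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Total number of work groups in a 3D ND range
   (product of the per-dimension counts). */
size_t total_groups(const size_t global[3], const size_t local[3])
{
    return groups_per_dim(global[0], local[0])
         * groups_per_dim(global[1], local[1])
         * groups_per_dim(global[2], local[2]);
}

/* Work-group index along one dimension for a given global work-item index.
   Assumption: no global offset, so this is simply global_id / local_size. */
size_t group_of(size_t global_id, size_t local_size)
{
    assert(local_size != 0);
    return global_id / local_size;
}
```

With a global size of {8, 8, 8} and a local size of {4, 4, 4}, `total_groups` gives 2*2*2 = 8 work groups of 64 work-items each, which accounts for all 512 points. This only describes how the index space is partitioned; it says nothing about whether those 8 groups execute one after another or at the same time.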