Forum Discussion
Altera_Forum
Honored Contributor
10 years agoMy guess is that the optimizer fails to do a good job with the size of (80, 80). Can the problem possibly be simplified for powers of two? Have you tried to implement the problem as a single work item kernel? Those tend to be more efficient and the compiler is more predictable.