Forum Discussion
UMinh
New Contributor
6 years ago
Thank you for your suggestion. I could do that for single work-item kernels, but some of my kernels are NDRange. Yes, it is fine if the kernels are memory-limited, but I have multiple kernels and I want to quantify how severe the memory bottleneck is in each one relative to the others. If I had the on-chip processing time from simulation, I could subtract it from the total execution time on the actual board to measure what percentage of the time is spent on memory accesses.
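
To make the subtraction concrete, here is a minimal sketch of the arithmetic in Python. It assumes memory access time and on-chip compute time do not overlap, which a pipelined design may not satisfy, so the result is only a rough estimate of exposed memory time. The function names, kernel names, and timing values are hypothetical placeholders, not measurements or part of any vendor tool.

```python
def memory_fraction(total_time_ms, compute_time_ms):
    """Estimate the share of board time not explained by simulated compute.

    total_time_ms:   execution time measured on the actual board
    compute_time_ms: on-chip processing time from simulation (assumed available)
    """
    if total_time_ms <= 0:
        raise ValueError("total_time_ms must be positive")
    # Clamp at zero in case the simulated time exceeds the measured time.
    exposed_memory_ms = max(total_time_ms - compute_time_ms, 0.0)
    return exposed_memory_ms / total_time_ms


def rank_kernels(timings):
    """Sort kernels from most to least memory-bound.

    timings: dict mapping kernel name -> (total_time_ms, compute_time_ms)
    Returns a list of (kernel name, memory fraction) pairs.
    """
    fractions = {
        name: memory_fraction(total, compute)
        for name, (total, compute) in timings.items()
    }
    return sorted(fractions.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    example_timings = {
        "kernel_a": (10.0, 4.0),
        "kernel_b": (8.0, 6.0),
    }
    for name, fraction in rank_kernels(example_timings):
        print(f"{name}: about {fraction:.0%} of time attributed to memory")
```

Comparing the fractions across kernels gives the relative ranking I am after, even if the absolute percentages are approximate.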