Forum Discussion
Altera_Forum
Honored Contributor
7 years ago
This is exactly what I was expecting; the compiler is being stupid and trying to pipeline 254 work-groups in the same compute unit just to keep the pipeline full, and the behavior has not changed since v15.1. I usually see a factor of less than 10, but I have seen cases with over 100. The compiler does actually check for Block RAM overutilization; however, if it detects overutilization, it starts sharing the ports rather than reducing the work-group pipelining, which is not actually necessary in many cases. The v16 documents had a note that explicitly said this factor cannot be controlled; however, that note was removed in later versions.
My recommendation: Since Intel is clearly refusing to either fix this useless extra replication or give the user control over it, and since, as you can see, it causes a lot of trouble in many cases, please open a ticket with Intel, post your kernel and complain to them so that they might eventually consider adding an attribute for it. In fact, I have been planning to open a ticket for this exact issue myself for the past few days but haven't had the time yet. The more people complain about the same thing, the more likely they are to fix it. As a work-around, if you don't need SIMD, using max_work_group_size instead of reqd_work_group_size will reduce the number of simultaneous work-groups to 2 or 3. ...
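To illustrate the work-around, here is a minimal sketch of the same kernel written both ways. The kernel names, the 256 work-group size and the local buffer are made up for the example and are not from the original poster's code. The three-argument form of max_work_group_size is an assumption about newer SDK releases; I believe older releases take a single value, so check the Programming Guide for your version.

```c
// Variant A: fixed work-group size. Required if you want
// num_simd_work_items, but this is the form that triggers the
// heavy work-group pipelining described above.
__attribute__((reqd_work_group_size(256, 1, 1)))
__kernel void reverse_reqd(__global const float *restrict in,
                           __global float *restrict out)
{
    __local float tile[256];                 // local buffer mapped to Block RAM
    const int lid  = get_local_id(0);
    const int gid  = get_global_id(0);
    const int last = get_local_size(0) - 1;

    tile[lid] = in[gid];
    barrier(CLK_LOCAL_MEM_FENCE);            // barrier is what makes work-groups overlap
    out[gid] = tile[last - lid];             // reverse within the work-group
}

// Variant B: only an upper bound on the work-group size.
// Not usable together with num_simd_work_items.
// (Assumed syntax; older SDKs may expect max_work_group_size(256).)
__attribute__((max_work_group_size(256, 1, 1)))
__kernel void reverse_max(__global const float *restrict in,
                          __global float *restrict out)
{
    __local float tile[256];
    const int lid  = get_local_id(0);
    const int gid  = get_global_id(0);
    const int last = get_local_size(0) - 1;

    tile[lid] = in[gid];
    barrier(CLK_LOCAL_MEM_FENCE);
    out[gid] = tile[last - lid];
}
```

With Variant B the host still has to pass a local size of at most 256 to clEnqueueNDRangeKernel. Compare the local memory replication and Block RAM numbers in the compiler report for the two variants to see whether the change actually helps for your kernel.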