Altera_Forum
Honored Contributor
12 years ago

--- Quote Start ---
I suspect what happened is that num_share_resources found a small candidate for logic sharing, but the change in resources was so minor that it doesn't make much of a difference in the overall design. In general, any time you share hardware there is a small logic penalty to implement the sharing logic, so if that sharing logic has the same footprint as the logic being shared, you could see results like yours.

Are you declaring a reqd_work_group_size or max_work_group_size attribute by any chance? If not, I would consider using one of them if possible, since you can typically save resources with them: the kernel hardware is tailored to what you need. If possible, I would use reqd_work_group_size, since that results in the smallest and fastest hardware possible because the hardware only needs to handle a single work-group size. Some applications only know the amount of work at runtime, but in those cases you can often still use reqd_work_group_size and just pad the work and discard the unneeded results.
--- Quote End ---

I was not using "reqd_work_group_size", but when I added the "reqd_work_group_size" attribute and ran aoc --report, I noticed that it had no effect on the size of the generated hardware: with reqd_work_group_size(2,2,1) and with reqd_work_group_size(1080,720,1), the logic utilization stayed the same. I agree this can reduce the amount of hardware, but in the end it is probably the number of operations in the kernel that determines the number of pipeline stages, and therefore the amount of hardware generated. So a complex kernel will make the compiler generate a lot of hardware.

So, assuming "num_share_resources" is the right attribute for steering the compiler's utilization/performance trade-off, the way to reduce logic utilization with it is to have many code blocks that can be shared, so that the sharing makes a difference to the overall design.
I wonder how this can be done in C code.