Forum Discussion
Hi @PGorl1, I am the manager of the engineer assigned to this case. Could you please elaborate on what you meant when you said that this approach will add performance penalties (lower fmax and occupancy %)? Do you have any data to support this, so that we can evaluate on our end what we could do differently?
There are two approaches to tackle the problem of reading from or writing to an I/O channel in an OpenCL kernel when the data width is smaller than the channel's physical width:
1- Manually pad the data to the physical width of the I/O channel. This reply explains perfectly why this approach is counterproductive:
https://forums.intel.com/s/question/0D70P000006S5nSSAS/
Essentially, even if the programmer wants to use only a small portion of the I/O channel's bandwidth, this approach will limit the throughput of the design to the bandwidth of the I/O channel, because the padded accesses exhaust the channel's bandwidth and the pipeline stalls frequently as a result.
2- Manually buffer the data in the OpenCL kernel and pass it to the channel through full-width writes once every few loop iterations as suggested here:
https://forums.intel.com/s/question/0D70P000006RWHbSAO
This approach will be extremely difficult to implement if the physical channel width is not a multiple of the data size. It will also have a large area and operating frequency overhead, since it requires barrel shifters or large register-based buffers to hold the data (using Block RAM-based buffers will increase the loop II). Finally, it requires at least one extra loop inside the main loop, which complicates the critical path consisting of the loop exit conditions and further hurts operating frequency; Stratix 10 in particular is very sensitive to this issue.
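To make the buffering scheme in approach 2 concrete, here is a minimal sketch in plain C that models only the packing logic, for the easy case where the channel width is a multiple of the data width. It is not kernel code: the 32-bit data width, the 256-bit channel width, and the names channel_word_t, channel_write and pack_and_send are all assumptions made for illustration, and channel_write is a software stand-in for the real channel call (write_channel_intel in the Intel FPGA SDK for OpenCL). A software model like this says nothing about the area, II or fmax costs described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed widths, for illustration only. */
#define DATA_BITS        32
#define CHANNEL_BITS     256
#define WORDS_PER_WRITE  (CHANNEL_BITS / DATA_BITS)  /* 8 data words per channel word */
#define SINK_CAPACITY    64

/* One full-width channel word, modeled as an array of narrow lanes. */
typedef struct {
    uint32_t lane[WORDS_PER_WRITE];
} channel_word_t;

/* Hypothetical stand-in for the I/O channel: records every full-width write. */
static channel_word_t sink[SINK_CAPACITY];
static int sink_count = 0;

static void channel_write(channel_word_t w)
{
    assert(sink_count < SINK_CAPACITY);
    sink[sink_count++] = w;
}

/*
 * Buffer narrow data words and issue one full-width write every
 * WORDS_PER_WRITE iterations. A partially filled buffer left at the end
 * is flushed with its unused lanes set to zero.
 * Returns the number of full-width writes issued.
 */
static int pack_and_send(const uint32_t *data, int n)
{
    channel_word_t buf;
    int fill = 0;
    int writes = 0;

    memset(&buf, 0, sizeof buf);
    for (int i = 0; i < n; i++) {
        buf.lane[fill++] = data[i];
        if (fill == WORDS_PER_WRITE) {   /* buffer full: one channel write */
            channel_write(buf);
            writes++;
            fill = 0;
            memset(&buf, 0, sizeof buf);
        }
    }
    if (fill > 0) {                      /* flush the zero-padded remainder */
        channel_write(buf);
        writes++;
    }
    return writes;
}
```

With these assumed widths, 20 input words produce three channel writes: two full ones and one carrying four valid lanes. In a real kernel the buffer would have to live in registers and the fill counter feeds the loop control, which is where the overheads mentioned above come from. If the channel width were not a multiple of the data width, the lane indexing would have to be replaced by bit-level shifting across word boundaries.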
Fortunately, Intel does not expect people to do only 512-bit reads/writes to memory, even though that is the physical width of the path between the kernel and each memory bank. In a similar fashion, expecting people to do only reads/writes that match the physical width of an I/O channel is unreasonable, especially when this problem was automatically handled by the toolchain before.