Forum Discussion

JJaco16
New Contributor
4 years ago
Solved

Local memory ram block consumption opencl report

Hi, I have a local memory buffer declared in my kernel code. It is of type int4 and is declared as follows: local int4 E[16][2]. My single work-item kernel accesses this array with 3 reads and 3 ...
  • HRZ
    4 years ago

    Things have changed quite a bit with respect to M20K replication in the compiler since the last time I used it, but there are a few things to keep in mind in your case:

    - Your buffer has a width of 128 bits (an int4 is four 32-bit integers). The physical width of M20K ports is 32 bits. As such, a minimum of four M20Ks will be required to support a width of 128 bits, even if your buffer is small enough to fit in one M20K.

    - Each M20K has only two physical ports, which double-pumping extends to 4 virtual ports. However, your buffer requires 6 simultaneous reads and writes in total, so, again, it is impossible to implement it using only one M20K because there are not enough ports.

    The way I would count the replication factor is that all write ports need to be connected to all M20Ks used for the buffer, while each read port only needs to be connected to one. With double-pumping, 3 out of the 4 virtual ports in each M20K will be occupied by write ports, which leaves one port per M20K for a read, so three M20Ks will be needed to support all 6 simultaneous accesses. On top of that, the whole structure needs to be replicated 4 times to support a width of 128 bits, leading to a total of 12 M20Ks required to implement your buffer. I am not sure why the compiler is counting 14 here, though. The replication factors are usually mentioned in the "Additional information" part of the report, which seems to have been cut off in your screenshot. Is there anything else mentioned in that part of the report?
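
The counting rule above can be written out as a small back-of-the-envelope calculation. This is only a sketch of the reasoning in this reply, not what the compiler actually does: the function name `estimate_m20k_count` and its parameters are made up for illustration, the 32-bit port width and 2 physical ports are the figures quoted above, and the buffer depth is assumed to fit in a single M20K (so depth is ignored).

```python
import math


def estimate_m20k_count(width_bits, reads, writes,
                        port_width_bits=32, physical_ports=2,
                        double_pump=True):
    """Rough M20K estimate using the counting rule from this reply.

    Hypothetical helper: it ignores buffer depth and anything else
    the compiler may add on top (which may explain 14 vs 12).
    """
    # Width: each M20K supplies port_width_bits of the data word.
    width_factor = math.ceil(width_bits / port_width_bits)

    # Double-pumping doubles the usable ports of each M20K.
    virtual_ports = physical_ports * (2 if double_pump else 1)

    # Every write port connects to every replica, so only the
    # ports left over after the writes can serve reads.
    read_ports_per_replica = virtual_ports - writes
    if reads > 0 and read_ports_per_replica <= 0:
        raise ValueError("writes use up all ports; no port left for reads")

    # Each read port only needs to connect to one replica.
    replication = math.ceil(reads / read_ports_per_replica) if reads else 1

    return width_factor, replication, width_factor * replication


# local int4 E[16][2] with 3 reads and 3 writes:
print(estimate_m20k_count(128, 3, 3))  # (4, 3, 12)
```

The tuple is (width factor, port replication factor, total M20Ks). Comparing those two factors against the replication factors listed under "Additional information" in the report should show where the extra two M20Ks come from.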