This code snippet alone is not enough to explain why. Block RAMs on the FPGA have only two ports, so each buffer is replicated based on the number of accesses to it, further increasing the Block RAM requirement. In NDRange kernels, each buffer is replicated again to allow simultaneous accesses from work-groups running in parallel. Previous versions of the report clearly showed the requested buffer size, the number of reads and writes, the number of replications, and the final implemented size. This was not ported to the HTML report available in v16.1. However, the report in v17 has a new tab that shows some information about the local buffers; I have never personally used it, though.
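
To get a rough feel for how the replication described above inflates a buffer, here is a back-of-the-envelope sketch in C. It is not the compiler's actual algorithm: the function name `estimate_buffer_bytes` is hypothetical, and the model assumes a single write, one port of each replica reserved for that write, and one full copy of the buffer per work-group running simultaneously. The real compiler can also use other techniques, so treat the result as an illustration only.

```c
/* Hypothetical helper, not part of any vendor tool.
 * Assumed model:
 *   - each Block RAM replica has two ports;
 *   - one port of every replica serves the (single) write,
 *     the other port serves one read, so replicas = number of reads;
 *   - the whole set is copied once per simultaneously running work-group.
 */
static unsigned long estimate_buffer_bytes(unsigned long requested_bytes,
                                           unsigned reads,
                                           unsigned simultaneous_work_groups)
{
    /* At least one replica is needed even with zero or one read. */
    unsigned long port_replicas = reads > 1 ? reads : 1;

    /* At least one copy is needed even with a single work-group. */
    unsigned long group_replicas =
        simultaneous_work_groups > 1 ? simultaneous_work_groups : 1;

    return requested_bytes * port_replicas * group_replicas;
}
```

Under these assumptions, a 4096-byte buffer with three reads and two work-groups in flight would come out at 4096 × 3 × 2 = 24576 bytes, which is the kind of gap between requested and implemented size that the older reports used to show.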