Altera_Forum
Honored Contributor
8 years ago
That's a good point I had not considered, although in my case my "outer_outer" loop is executed serially (as correctly reported by the tool) due to the structure of my code. An II of 1 would therefore be achievable on the "outer" loop if the tool used both write ports of the BRAM simultaneously. That is a common design pattern in HDL, but I guess it is not supported at this time by the OpenCL compiler.
As an aside, from an RTL perspective, double pumping does more than just reduce BRAM usage: it increases the memory bandwidth, or throughput, of each BRAM. If I have two single-pumped BRAMs I should be able to do 1 write and 2 reads per kernel clock; if I have one double-pumped BRAM I should be able to do 2 writes and 2 reads per kernel clock.
EDIT: I think I may have been wrong. According to the Best Practices guide:
--- Quote Start ---
By default, each local memory bank has one read port and one write port. The double pumping feature allows each local memory bank to support up to three read ports.
--- Quote End ---
I wonder if the M20K read latency is half the write latency, or something like that, and this is what results in the three-read limit...
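For what it's worth, the port counts above can be checked with a small budget calculation. This is only a sketch under assumptions that are not stated in the guide quote: that each BRAM has two physical ports, that double pumping gives each physical port two accesses per kernel clock, and that a write has to go to every replica of the memory. The function name and parameters are hypothetical and are not part of any Intel tool or API.

```python
# Hypothetical port-budget model (illustration only, not an Intel API).
# Assumption: each BRAM exposes two physical ports.
PHYSICAL_PORTS = 2


def reads_per_kernel_clock(replicas: int, pump_factor: int, writes: int) -> int:
    """Return how many reads fit in one kernel clock.

    replicas    -- number of identical BRAM copies holding the same data
    pump_factor -- 1 for single-pumped, 2 for double-pumped
    writes      -- writes per kernel clock; each one is assumed to be
                   applied to every replica so the copies stay in sync
    """
    slots_per_replica = PHYSICAL_PORTS * pump_factor
    if writes > slots_per_replica:
        raise ValueError("more writes than port slots in one replica")
    # Whatever is left in each replica after the writes can serve reads.
    return replicas * (slots_per_replica - writes)


# Two single-pumped BRAMs, 1 write per kernel clock
two_single = reads_per_kernel_clock(replicas=2, pump_factor=1, writes=1)
# One double-pumped BRAM, 1 write per kernel clock
one_double_1w = reads_per_kernel_clock(replicas=1, pump_factor=2, writes=1)
# One double-pumped BRAM, 2 writes per kernel clock
one_double_2w = reads_per_kernel_clock(replicas=1, pump_factor=2, writes=2)
print(two_single, one_double_1w, one_double_2w)
```

Under these assumptions the "1 write, up to 3 reads" figure in the quote and the "2 writes, 2 reads" figure from the RTL view both use the same four slots per kernel clock; they differ only in how the slots are split between reads and writes. Whether the compiler actually allocates ports this way is not confirmed by anything in this thread.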