Forum Discussion
Altera_Forum
Honored Contributor
15 years ago

--- Quote Start ---
Very interesting! When filling the arrays is there a way to direct the compiler/source code to burst the writes, or should this happen automatically? When I placed performance counters around the transfers, it seems to take more than 25,000,000 cycles to fill the arrays, even when I set the array size to 32768.
--- Quote End ---

How do you generate the pattern that is written into DRAM? Does the generation use floating-point arithmetic or integer/floating-point conversion? Both operations are very slow on Nios2, so if you do either of them, the time difference between burst access and single-word access to DRAM is probably lost in the noise.

Now, assuming you don't do anything slow in the generator, a memory fill via a cached Nios2 is still unlikely to achieve good efficiency, because the write-back, write-allocate architecture of the cache is very badly suited to large memory fills. However, with a 100MHz+ CPU clock and a 200-300MHz DRAM clock (400-600MT/s data rate) you should be able to fill a single 16-bit DDR SDRAM chip at 100-150 MB/s, i.e. at approximately 10% of peak memory throughput.

To do any better you can try one of the following:

1. Program+data in internal RAMs. Uncached (__builtin_stwio or upper 2GB) access to DRAM in a manually unrolled loop + reliance on the merge access feature of the HPC2 DDR controller. This solution requires minimal hardware expertise but, IMHO, is not sufficiently robust.

2. Program+data in internal RAMs. All or part of the internal RAM is dual-ported, with one memory port connected to the CPU via a tightly-coupled data port and the other memory port connected to a DMA engine. You prepare your fill pattern chunk by chunk in the internal dual-ported memory and then DMA it into DRAM. For maximum performance use double-buffering: DMA from the first buffer while filling the second, then switch. This solution is the most robust but also takes more development work and consumes more FPGA resources.

Hope that helps,
Michael
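
For reference, option 1 above might look roughly like the sketch below. It is only an illustration under several assumptions: the function name fill_uncached, the STORE_UNCACHED macro, the unroll factor of 8 and the "seed + index" pattern are all made up for this example, and the __nios2__ check assumes a Nios II GCC toolchain. On any other compiler the macro falls back to a plain volatile store, so the code can be built and checked on a host. Whether the DDR controller actually merges the back-to-back stores depends on the controller and its configuration; the code cannot guarantee it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On Nios II GCC, __builtin_stwio performs a 32-bit store that bypasses
 * the data cache. The __nios2__ test is an assumption about the toolchain;
 * elsewhere we fall back to a volatile store so the sketch still compiles. */
#if defined(__nios2__)
#define STORE_UNCACHED(p, v) __builtin_stwio((p), (int)(v))
#else
#define STORE_UNCACHED(p, v) (*(p) = (uint32_t)(v))
#endif

/* Illustrative unroll factor: 8 words = 32 bytes per loop iteration. */
#define UNROLL 8u

/* Fill nwords 32-bit words at dst with the integer-only pattern
 * "seed + word index". nwords must be a multiple of UNROLL. */
void fill_uncached(volatile uint32_t *dst, size_t nwords, uint32_t seed)
{
    assert(nwords % UNROLL == 0);

    for (size_t i = 0; i < nwords; i += UNROLL) {
        volatile uint32_t *p = dst + i;
        uint32_t v = seed + (uint32_t)i;

        /* Consecutive addresses, issued back-to-back, to give the
         * memory controller a chance to merge them. */
        STORE_UNCACHED(p + 0, v + 0);
        STORE_UNCACHED(p + 1, v + 1);
        STORE_UNCACHED(p + 2, v + 2);
        STORE_UNCACHED(p + 3, v + 3);
        STORE_UNCACHED(p + 4, v + 4);
        STORE_UNCACHED(p + 5, v + 5);
        STORE_UNCACHED(p + 6, v + 6);
        STORE_UNCACHED(p + 7, v + 7);
    }
}
```

The pattern is kept integer-only on purpose, since floating-point work in the generator would dominate the run time. Wrapping a call such as fill_uncached(buf, 32768, 0) in the same performance counters used for the original measurement would show whether the store path or the generator is the bottleneck.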