Forum Discussion
Altera_Forum
Honored Contributor
8 years ago
The con is mostly cost. Putting the memory on the interposer is harder to manufacture (hence the cost), but everything sits much closer to the chip, which helps performance thanks to shorter trace lengths and better signal integrity.
Speed-wise, if the overview is to be believed, it's roughly 10x faster than regular DDR4: "Traditional DDR4 DIMMs provide ~21 GBps bandwidth while 1 HBM2 tile provides up to 256 GBps." From what I can gather from the numbers, the 21 GBps figure for DDR4 comes from a 72-bit interface (64-bit data + ECC) at 2400 Mbps per pin. The 256 GBps comes from 16 channels in the cube, each channel spanning 8 layers, each layer being 64 bits wide running at 2048 Mbps. So the cube itself runs at a lower frequency, but the data path is effectively 8192 bits wide (in practice it's not that wide, as the data is serialised onto high-speed lines, but deserialised it would be).

From what I can gather, the memory cubes connect to the FPGA fabric directly, i.e. not through the HPS bridge. You could probably route one through to the ARM processor via the HPS bridge. The processor itself has its own DDR3/4 EMIF.
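The headline figures can be sanity-checked with a quick back-of-the-envelope calculation. This is just a sketch using the numbers quoted above (72-bit DDR4 at 2.4 Gbps/pin, a 1024-bit HBM2 interface at 2 Gbps/pin); real parts will vary:

```python
def peak_bandwidth_gbyte_per_s(bus_width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s = bus width * per-pin rate / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8

# DDR4 DIMM: 72-bit bus (64 data + 8 ECC) at 2400 Mbps (2.4 Gbps) per pin
ddr4 = peak_bandwidth_gbyte_per_s(72, 2.4)        # ~21.6 GB/s

# HBM2 stack: 16 channels x 64 bits = 1024-bit bus at 2048 Mbps (2 Gbps) per pin
hbm2 = peak_bandwidth_gbyte_per_s(16 * 64, 2.0)   # 256 GB/s

print(f"DDR4: {ddr4:.1f} GB/s, HBM2: {hbm2:.0f} GB/s")
```

Note the HBM2 number comes from a much wider bus at a lower per-pin rate, which is exactly the point of stacking the memory on the interposer.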