efficient global memory access for dynamic indexing

Honored Contributor

7 years ago

I explained the math behind the memory bandwidth utilization in the other thread:

http://www.alteraforum.com/forum/showthread.php?t=58222

(And previously here: http://www.alteraforum.com/forum/showthread.php?t=57099&p=232613)

For such kernels I would recommend an NDRange implementation: instead of a fixed initiation interval (II), you get a runtime scheduler that tries to minimize bubbles and stalls in the pipeline by varying the II at runtime. You can also easily replicate the kernel pipeline using the num_compute_units attribute, which could provide some benefit for such a kernel. However, as I explained in the other thread, random access will result in very poor memory performance regardless of what you do. Pretty much the only things that can help are an efficient, complex memory controller and a sophisticated cache hierarchy, neither of which exists on current-generation FPGAs.
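To make the suggestion concrete, here is a rough sketch of an NDRange kernel with dynamic indexing, replicated with num_compute_units. The kernel name `gather`, its arguments, and the replication factor of 2 are all made up for illustration; they are not from your code. The `#ifndef __OPENCL_VERSION__` part is only a hypothetical host-side shim so the same source can be compiled as plain C for a quick functional check, and it has nothing to do with the hardware that gets generated.

```c
#ifndef __OPENCL_VERSION__
/* Hypothetical host-side shim (assumption, illustration only): lets the
   kernel body compile as plain C. Not used by the OpenCL compiler. */
#include <stddef.h>
#define __kernel
#define __global
#define KERNEL_ATTRS
static size_t emu_gid; /* emulated work-item ID, set by the caller */
static size_t get_global_id(unsigned dim) { (void)dim; return emu_gid; }
#else
/* Replicate the whole kernel pipeline; the factor 2 is an arbitrary example. */
#define KERNEL_ATTRS __attribute__((num_compute_units(2)))
#endif

/* NDRange kernel: each work-item performs one indirect (data-dependent)
   read from global memory and one sequential write. */
KERNEL_ATTRS
__kernel void gather(__global const int *restrict idx,
                     __global const float *restrict src,
                     __global float *restrict dst)
{
    size_t gid = get_global_id(0);
    dst[gid] = src[idx[gid]]; /* random access: address known only at runtime */
}
```

Whether the extra compute unit helps depends on how much of the time is spent stalled on memory; since both copies share the same external memory bandwidth, I would not expect the gain to scale with the replication factor.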
