Forum Discussion

New Contributor

4 years ago

Optimising local memory access

I am trying to work out precisely what I need to do to achieve best possible performance for accessing local memory in a DPC++ kernel. I am working from the "Accessing Work-Group Local Memory" in c...

NoorjahanSk_Intel

New Contributor

4 years ago

Hi,

Thanks for reaching out to us.

>> I would guess that they are either 4-byte or 8-byte but how can I determine which for any given processor?

For systems based on the IA-32 architecture, classification is performed on 4 bytes. For systems based on other architectures, classification is performed on 8 bytes.

For DPC++, classification is performed on 8 bytes.

>> It talks about banks in local memory. How can I find out how many of these there are?

You can use the numbanks() memory attribute in your source code to define the number of banks.

For more information, please refer to the link below:

https://www.intel.com/content/www/us/en/develop/documentation/oneapi-fpga-optimization-guide/top/optimize-your-design/throughput-1/memory-accesses/local-and-private-memory-accesses-optimization.html
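To make the idea of banks a little more concrete, here is a minimal sketch in plain C++ (host code only, not a DPC++ kernel). It assumes the common interleaved layout where consecutive bank-width words go to consecutive banks, so bank = (byte address / bank width) % number of banks. That mapping, the function names `bank_of` and `max_accesses_per_bank`, and the values used in the usage note are illustrative assumptions, not something taken from the guide; real hardware may map addresses differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model (an assumption, not a documented mapping):
// consecutive words of bank_width_bytes are interleaved across the banks.
std::size_t bank_of(std::size_t byte_addr,
                    std::size_t bank_width_bytes,
                    std::size_t num_banks) {
    assert(bank_width_bytes > 0 && num_banks > 0);
    return (byte_addr / bank_width_bytes) % num_banks;
}

// Returns the largest number of accesses that land in any single bank.
// A result of 1 means every access in the group hits a different bank.
std::size_t max_accesses_per_bank(const std::vector<std::size_t>& byte_addrs,
                                  std::size_t bank_width_bytes,
                                  std::size_t num_banks) {
    std::vector<std::size_t> hits(num_banks, 0);
    std::size_t worst = 0;
    for (std::size_t addr : byte_addrs) {
        const std::size_t b = bank_of(addr, bank_width_bytes, num_banks);
        ++hits[b];
        if (hits[b] > worst) {
            worst = hits[b];
        }
    }
    return worst;
}
```

With 8 banks of 8 bytes each, eight accesses at byte addresses 0, 8, ..., 56 give a result of 1 under this model, while accesses at 0, 64, 128, 192 all map to bank 0 and give 4. Two accesses at byte addresses 0 and 4 (the two halves of one 8-byte word) also map to the same bank; whether a real device serialises or merges such accesses is hardware-dependent and is not answered by this model.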

>>However what happens if two work-items access the same element in local memory. (e.g. One reads the top-half and the other reads the bottom-half).

Could you please elaborate more on this statement?

Could you please provide us with an example or use case?

Thanks & Regards,

Noorjahan.

Ian_Miller

New Contributor

4 years ago

Thank you for the reply; that helps a lot. However, it raises a few more questions:

How do I find out about features like intel::numbanks() and intel::bankwidth()?

Is there any reference documentation describing them properly? There are various tutorials, white papers and examples, but I have yet to find any reference documentation.

What, for example, is the applicability of the bank control directives above? They appeared in a paper on optimising FPGA access, so it is safe to assume that they will be effective on FPGAs. I would be very surprised if they have any effect on the CPU, which leaves me doubting whether they actually work on GPUs. (I am trying to program a GPU.)
