Re: Optimising local memory access

You say "These attributes are specific to FPGA kernel optimization and not CPU or GPU ones." In that case, they are of no relevance at all to my question. My hardware has a GPU, which I am trying to program. It does not have an FPGA. Please would you go back to my original question and give me what information you have about GPU optimisation.

Ian

Re: Optimising local memory access

Thank you for the reply; that helps a lot. However, it raises a few more questions:

How do I find out about features like intel::numbanks() and intel::bankwidth()? Is there any reference documentation describing them properly? There are various tutorials, white papers and examples, but I have yet to find any reference documentation.

What, for example, is the applicability of the bank control directives above? They appeared in a paper on optimising FPGA access, so it is safe to assume that they will be effective on FPGAs. I would be very surprised if they had any effect on the CPU, which leaves me doubting whether they actually work on GPUs. (I am trying to program a GPU.)

Optimising local memory access

I am trying to work out precisely what I need to do to achieve the best possible performance when accessing local memory in a DPC++ kernel. I am working from "Accessing Work-Group Local Memory" in chapter 15 of Reinders et al.'s Data Parallel C++. However, this leaves a number of unknowns.

1) It talks about "elements" in local memory without saying what size those elements are. I would guess that they are either 4-byte or 8-byte, but how can I determine which for any given processor?

2) It talks about banks in local memory. How can I find out how many of these there are?

3) Clearly, if two work-items access different elements in the same bank, then those accesses will have to be serialised. However, what happens if two work-items access the same element in local memory (e.g. one reads the top half and the other reads the bottom half)? Do these have to be serialised? Is this the same for read and write operations?

I am programming (in the first instance) for a Kaby Lake HD 610 (Device Id: 5906).
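
To make questions 1 to 3 concrete, here is a minimal host-side sketch in plain C++ of the bank model I have in mind. It is not a measurement and not an Intel API. The names BankModel, bank_of and conflict_degree are hypothetical, and the bank count and element width are guesses to be replaced once the real values are known. It also assumes that consecutive elements are interleaved across banks, and that several accesses to the same element cost a single pass, which is exactly what question 3 asks and may not hold on the HD 610.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model parameters; NOT taken from any hardware documentation.
struct BankModel {
    std::size_t num_banks;      // assumed number of local-memory banks
    std::size_t element_bytes;  // assumed width of one bank element in bytes
};

// Index of the element that a byte offset into local memory falls in.
std::size_t element_of(const BankModel& m, std::size_t byte_offset) {
    assert(m.element_bytes > 0);
    return byte_offset / m.element_bytes;
}

// Bank holding that element, assuming consecutive elements are
// interleaved round-robin across the banks.
std::size_t bank_of(const BankModel& m, std::size_t byte_offset) {
    assert(m.num_banks > 0);
    return element_of(m, byte_offset) % m.num_banks;
}

// Number of serialised passes this model predicts for one set of
// simultaneous accesses: the largest count of DISTINCT elements that
// land in any single bank. Accesses to the same element are counted
// once (an assumption, see question 3).
std::size_t conflict_degree(const BankModel& m,
                            const std::vector<std::size_t>& byte_offsets) {
    assert(m.num_banks > 0);
    std::vector<std::vector<std::size_t>> per_bank(m.num_banks);
    for (std::size_t offset : byte_offsets) {
        const std::size_t element = element_of(m, offset);
        std::vector<std::size_t>& seen = per_bank[element % m.num_banks];
        if (std::find(seen.begin(), seen.end(), element) == seen.end()) {
            seen.push_back(element);
        }
    }
    std::size_t worst = 0;
    for (const std::vector<std::size_t>& seen : per_bank) {
        worst = std::max(worst, seen.size());
    }
    return worst;
}
```

With the guessed values BankModel{16, 4}, sixteen work-items reading consecutive 4-byte values each hit a different bank, so the model predicts one pass. The same sixteen work-items reading with a stride of 64 bytes all hit bank 0, so it predicts sixteen passes. Two work-items reading bytes 0 and 2 touch the same element and are counted as one pass, but only because of the assumption stated above.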