Forum Discussion

Dr_FPGA
New Contributor
5 years ago

OpenCL private_copies attribute does not seem to work in 20.1

Hello OpenCL FPGA developers,

I have an OpenCL NDRange (64,1,1) kernel with several local memories, each replicated 8 times, which makes the kernel memory-limited (it needs more than 100% of the M20K blocks on an Arria 10). I have tried to limit the replication factor with the newly introduced attribute described on page 41 of the 20.1 programming guide (UG-OCL002 | 2020.04.13, aocl_programming_guide.pdf).

Example for one of the buffers:

__local float __attribute__((private_copies(4))) x[M][N];

However, this attribute does not seem to have the intended effect, and I am stuck with 8 private copies. I know that halving the replication factor will make my kernel slower, but I would accept a slightly slower kernel in exchange for lower memory usage. Moreover, the slowdown in the part where these buffers are used would be a small percentage of the overall kernel schedule.
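To put rough numbers on the trade-off described above, here is a minimal sketch in plain C (host-side arithmetic, not kernel code). It assumes one M20K block holds 20,480 bits and ignores width/depth configuration, banking, and port-driven replication, so it gives only a lower bound; the compiler report remains the authoritative figure. The function names and the 64 x 64 buffer size in the note below are hypothetical, since M and N are not given in the post.

```c
/* Assumption: one M20K block stores 20 Kbit = 20480 bits. Real usage
 * depends on the width/depth mode and banking chosen by the compiler,
 * so treat the results as a lower bound only. */
#define M20K_BITS 20480UL

/* Lower-bound number of M20K blocks for ONE copy of a buffer holding
 * `elems` elements of `elem_bits` bits each. */
static unsigned long m20k_per_copy(unsigned long elems, unsigned long elem_bits)
{
    unsigned long bits = elems * elem_bits;
    return (bits + M20K_BITS - 1UL) / M20K_BITS; /* ceiling division */
}

/* Lower-bound number of M20K blocks when the buffer is replicated
 * `copies` times (for example 8 as reported, or 4 as requested). */
static unsigned long m20k_total(unsigned long elems, unsigned long elem_bits,
                                unsigned long copies)
{
    return m20k_per_copy(elems, elem_bits) * copies;
}
```

For a hypothetical 64 x 64 float buffer, m20k_total(64 * 64, 32, 8) gives 56 blocks, while m20k_total(64 * 64, 32, 4) gives 28, which is the kind of saving a working private_copies(4) would be expected to bring for each such buffer.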

Thank you for your input.

7 Replies

  • HRZ
    Frequent Contributor

    Can you post a snippet of the report for that specific local memory buffer before and after adding the attribute? Note that if the replication is there to support "simultaneous work-groups", it cannot be reduced with the "private_copies()" attribute or any other local memory attribute.

    • Dr_FPGA
      New Contributor

      Hi HRZ,

      As I mentioned, the report shows 8 copies both before and after. The same factor is applied to the loop of 4, where there is no need to replicate beyond 4. I suspect I have the syntax of this attribute wrong. Please note that this is a new attribute in 20.1, and a similar attribute exists in oneAPI, so you may not have seen or tried it yet.

  • AnilErinch_A_Intel
    Frequent Contributor

    Hi,

    Thanks for your valuable suggestions.

    This has been noted, and we are working on providing a more detailed explanation of this attribute in the user guide.

    Thanks and Regards

    Anil


    • HRZ
      Frequent Contributor

      @AnilErinch_A_Intel Providing a better explanation in the documentation is useful; however, it does not solve the underlying problem. What is required here is an extra attribute that lets users control the on-chip memory replication factor for "supporting simultaneous work-groups", that is, the number of work-groups the compiler schedules simultaneously in a single compute unit. The functionality already exists in the compiler; it just needs to be exposed to users in the form of an attribute/pragma. I opened a support ticket with Altera/Intel about this exact problem years ago, but no such attribute/pragma has been provided yet.

  • AnilErinch_A_Intel
    Frequent Contributor

    Hi @HRZ,

    Thanks for the suggestions.

    I will check with the team about the feasibility of exposing this feature.

    Thanks and Regards

    Anil