II is an approximation due to the following stallable instructions

Question

Hi, I am analyzing the report produced by oneAPI FPGA report generation. I am currently facing this message:

Compiler failed to schedule this loop with smaller II due to memory dependency

So I went back to the simple vector add example from the oneAPI C++_SYCL_FPGA samples on GitHub, but I still see the same message. Another message that concerns me is:

II is an approximation due to the following stallable instructions:
Load Operation (handler.hpp: 1531 > vector_add.cpp: 19)
Load Operation (handler.hpp: 1531 > vector_add.cpp: 20)
Store Operation (handler.hpp: 1531 > vector_add.cpp: 22)

In my application, I also need to load data from global memory, compute, and store the result back to global memory. Can you suggest a way to resolve this issue?

The source code of vector_add.cpp:

#include <iostream>

// oneAPI headers
#include <sycl/ext/intel/fpga_extensions.hpp>
#include <sycl/sycl.hpp>

// Forward declare the kernel name in the global scope. This is an FPGA best
// practice that reduces name mangling in the optimization reports.
class VectorAddID;

struct VectorAdd {
  int *const vec_a_in;
  int *const vec_b_in;
  int *const vec_c_out;
  int len;

  void operator()() const {
    for (int idx = 0; idx < len; idx++) {
      int a_val = vec_a_in[idx];
      int b_val = vec_b_in[idx];
      int sum = a_val + b_val;
      vec_c_out[idx] = sum;
    }
  }
};

constexpr int kVectSize = 256;

int main() {
  bool passed = true;
  try {
    // Use compile-time macros to select either:
    //  - the FPGA emulator device (CPU emulation of the FPGA)
    //  - the FPGA device (a real FPGA)
    //  - the simulator device
#if FPGA_SIMULATOR
    auto selector = sycl::ext::intel::fpga_simulator_selector_v;
#elif FPGA_HARDWARE
    auto selector = sycl::ext::intel::fpga_selector_v;
#else  // #if FPGA_EMULATOR
    auto selector = sycl::ext::intel::fpga_emulator_selector_v;
#endif

    // create the device queue
    sycl::queue q(selector);

    auto device = q.get_device();

    std::cout << "Running on device: "
              << device.get_info<sycl::info::device::name>().c_str()
              << std::endl;

    if (!device.has(sycl::aspect::usm_host_allocations)) {
      std::terminate();
    }

    // declare arrays and fill them
    // allocate in shared memory so the kernel can see them
    int *vec_a = sycl::malloc_shared<int>(kVectSize, q);
    int *vec_b = sycl::malloc_shared<int>(kVectSize, q);
    int *vec_c = sycl::malloc_shared<int>(kVectSize, q);
    for (int i = 0; i < kVectSize; i++) {
      vec_a[i] = i;
      vec_b[i] = (kVectSize - i);
    }

    std::cout << "add two vectors of size " << kVectSize << std::endl;

    q.single_task<VectorAddID>(VectorAdd{vec_a, vec_b, vec_c, kVectSize})
        .wait();

    // verify that vec_c is correct
    for (int i = 0; i < kVectSize; i++) {
      int expected = vec_a[i] + vec_b[i];
      if (vec_c[i] != expected) {
        std::cout << "idx=" << i << ": result " << vec_c[i] << ", expected ("
                  << expected << ") A=" << vec_a[i] << " + B=" << vec_b[i]
                  << std::endl;
        passed = false;
      }
    }

    std::cout << (passed ? "PASSED" : "FAILED") << std::endl;

    sycl::free(vec_a, q);
    sycl::free(vec_b, q);
    sycl::free(vec_c, q);
  } catch (sycl::exception const &e) {
    // Catches exceptions in the host code.
    std::cerr << "Caught a SYCL host exception:\n" << e.what() << "\n";

    // Most likely the runtime couldn't find FPGA hardware!
    if (e.code().value() == CL_DEVICE_NOT_FOUND) {
      std::cerr << "If you are targeting an FPGA, please ensure that your "
                   "system has a correctly configured FPGA board.\n";
      std::cerr << "Run sys_check in the oneAPI root directory to verify.\n";
      std::cerr << "If you are targeting the FPGA emulator, compile with "
                   "-DFPGA_EMULATOR.\n";
    }
    std::terminate();
  }
  return passed ? EXIT_SUCCESS : EXIT_FAILURE;
}

The full message from the loop analysis details:

VectorAddID.B1:
Hyper-Optimized loop structure: disabled.
Memory dependency
Compiler failed to schedule this loop with smaller II due to memory dependency:
From: Load Operation (handler.hpp: 1531 > vector_add.cpp: 19)
To: Store Operation (handler.hpp: 1531 > vector_add.cpp: 22)
Compiler failed to schedule this loop with smaller II due to memory dependency:
From: Load Operation (handler.hpp: 1531 > vector_add.cpp: 20)
To: Store Operation (handler.hpp: 1531 > vector_add.cpp: 22)
Most critical loop feedback path during scheduling:
70.00 clock cycles Load Operation (handler.hpp: 1531 > vector_add.cpp: 19)
10.00 clock cycles Store Operation (handler.hpp: 1531 > vector_add.cpp: 22)
1.16 clock cycle 32-bit Integer Add Operation (handler.hpp: 1531 > vector_add.cpp: 21)
II is an approximation due to the following stallable instructions:
Load Operation (handler.hpp: 1531 > vector_add.cpp: 19)
Load Operation (handler.hpp: 1531 > vector_add.cpp: 20)
Store Operation (handler.hpp: 1531 > vector_add.cpp: 22)
Maximum concurrent iterations: Capacity of loop
Use the Loop Analysis viewer to estimate capacity
See FPGA Handbook: Loops for more information

whitepau_altera · Answer

Hello! You can learn about this in the loop_initiation_interval tutorial and the kernel_args_restrict tutorial. Basically, you need to tell the compiler that the kernel arguments don't alias, using the kernel_args_restrict attribute:

struct FunctorKernel {
   // -------------------------------------------
   //         Kernel interface definition.
   // -------------------------------------------
   
   [[intel::kernel_args_restrict]]
   void operator()() const {
      // ----------------------------------------
      //       Kernel code implementation.
      // ----------------------------------------
   }
};
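
Applied to the VectorAdd functor from the question, the attribute could look like the sketch below. This is only an illustration under a few assumptions: `host_check` is a hypothetical helper that is not part of the sample, and it calls the functor directly on the host so the snippet builds without the SYCL headers. A compiler that does not know the attribute should ignore it, possibly with a warning. The attribute is an unchecked promise, so it is only appropriate when the three pointers really never overlap.

```cpp
// Sketch: the VectorAdd functor from the question with the attribute applied.
struct VectorAdd {
  int *const vec_a_in;
  int *const vec_b_in;
  int *const vec_c_out;
  int len;

  // Promise to the compiler that the pointer kernel arguments do not alias,
  // so the store to vec_c_out cannot affect later loads from vec_a_in or
  // vec_b_in. If the promise is broken, results may be incorrect.
  [[intel::kernel_args_restrict]]
  void operator()() const {
    for (int idx = 0; idx < len; idx++) {
      int a_val = vec_a_in[idx];
      int b_val = vec_b_in[idx];
      int sum = a_val + b_val;
      vec_c_out[idx] = sum;
    }
  }
};

// Hypothetical host-side check (not part of the original sample): runs the
// functor on three separate arrays and verifies the element-wise sums.
bool host_check() {
  int a[4] = {1, 2, 3, 4};
  int b[4] = {10, 20, 30, 40};
  int c[4] = {0, 0, 0, 0};
  VectorAdd{a, b, c, 4}();
  for (int i = 0; i < 4; i++) {
    if (c[i] != a[i] + b[i]) return false;
  }
  return true;
}
```

In the real program the functor would still be submitted exactly as in the question, with `q.single_task<VectorAddID>(VectorAdd{vec_a, vec_b, vec_c, kVectSize})`; only the attribute on `operator()` changes.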

boonbengt_altera · Answer

Hi @JSYOO,

Greetings, just checking in to see if there are any further doubts regarding this matter. Hope your doubts have been clarified.

Best Wishes,
BB

boonbengt_altera · Answer

Hi @JSYOO,

Greetings. As we have not received any further clarification or updates on this matter, we assume the challenge has been overcome. Please log in to https://supporttickets.intel.com/s/?language=en_US, view the details of the desired request, and post a response within the next 15 days to allow me to continue supporting you. After 15 days, this thread will be transitioned to community support, where community users will be able to help with your follow-up questions. For new queries, please feel free to open a new thread and we will be right with you. Pleasure having you here.

Best Wishes,
BB
