Re: Can I be added to oneAPI priority support and FPGA tools access?
In that case, could I be given access to the FPGA tools? As mentioned above, I will be responsible for the installation and licensing.

Re: Board support package (BSP) for DE10-pro
I meant to add this OFS website screenshot.

Board support package (BSP) for DE10-pro
Hello, I want to use oneAPI on the DE10-pro board, which uses a Stratix 10 FPGA. I previously asked about this on the forum and was told that I can use Platform Designer or OFS to build a BSP. I will be using this FPGA for a while and want to build a BSP so that I can run my host code on it. On the Getting Started page of the OFS website, the link for Stratix 10 shows: Stratix® 10 FPGA Intel® FPGA PAC D5005. Since the DE10-pro board is also a "Stratix® 10 PCIe Attach OFS" board, I was wondering whether I could follow this guide to build a BSP for it.
Thank you

Can I be added to oneAPI priority support and FPGA tools access?
Hello, I am a graduate student and would like to be added to oneAPI priority support and FPGA tools access. Our lab started developing with Intel tools last year, and my supervisor has access to oneAPI priority support and the FPGA tools. I am now trying out the Intel tools myself, but I don't have direct access to priority support. Also, tool downloads and licensing sometimes take weeks because everything has to go through my supervisor. Given this, I would like to know whether I can get access to the tools too.
Thank you

II is an approximation due to the following stallable instructions
Hi, I am analyzing the report from oneAPI FPGA report generation.
I am currently seeing "Compiler failed to schedule this loop with smaller II due to memory dependency", so I went back to the simple vector add example from the oneAPI C++SYCL_FPGA samples on GitHub, but I still see the same messages. Another message that concerns me is:
II is an approximation due to the following stallable instructions:
Load Operation (handler.hpp: 1531 > vector_add.cpp: 19)
Load Operation (handler.hpp: 1531 > vector_add.cpp: 20)
Store Operation (handler.hpp: 1531 > vector_add.cpp: 22)
In my application, I also need to load data from global memory, compute, and store the results back to global memory. Can you suggest a way to resolve this issue?

The source code of vector_add.cpp:

#include <iostream>

// oneAPI headers
#include <sycl/ext/intel/fpga_extensions.hpp>
#include <sycl/sycl.hpp>

// Forward declare the kernel name in the global scope. This is an FPGA best
// practice that reduces name mangling in the optimization reports.
class VectorAddID;

struct VectorAdd {
  int *const vec_a_in;
  int *const vec_b_in;
  int *const vec_c_out;
  int len;

  void operator()() const {
    for (int idx = 0; idx < len; idx++) {
      int a_val = vec_a_in[idx];
      int b_val = vec_b_in[idx];
      int sum = a_val + b_val;
      vec_c_out[idx] = sum;
    }
  }
};

constexpr int kVectSize = 256;

int main() {
  bool passed = true;
  try {
    // Use compile-time macros to select either:
    //  - the FPGA emulator device (CPU emulation of the FPGA)
    //  - the FPGA device (a real FPGA)
    //  - the simulator device
#if FPGA_SIMULATOR
    auto selector = sycl::ext::intel::fpga_simulator_selector_v;
#elif FPGA_HARDWARE
    auto selector = sycl::ext::intel::fpga_selector_v;
#else  // #if FPGA_EMULATOR
    auto selector = sycl::ext::intel::fpga_emulator_selector_v;
#endif

    // create the device queue
    sycl::queue q(selector);

    auto device = q.get_device();

    std::cout << "Running on device: "
              << device.get_info<sycl::info::device::name>().c_str()
              << std::endl;

    if (!device.has(sycl::aspect::usm_host_allocations)) {
      std::terminate();
    }

    // declare arrays and fill them
    // allocate in shared memory so the kernel can see them
    int *vec_a = sycl::malloc_shared<int>(kVectSize, q);
    int *vec_b = sycl::malloc_shared<int>(kVectSize, q);
    int *vec_c = sycl::malloc_shared<int>(kVectSize, q);
    for (int i = 0; i < kVectSize; i++) {
      vec_a[i] = i;
      vec_b[i] = (kVectSize - i);
    }

    std::cout << "add two vectors of size " << kVectSize << std::endl;

    q.single_task<VectorAddID>(VectorAdd{vec_a, vec_b, vec_c, kVectSize})
        .wait();

    // verify that vec_c is correct
    for (int i = 0; i < kVectSize; i++) {
      int expected = vec_a[i] + vec_b[i];
      if (vec_c[i] != expected) {
        std::cout << "idx=" << i << ": result " << vec_c[i] << ", expected ("
                  << expected << ") A=" << vec_a[i] << " + B=" << vec_b[i]
                  << std::endl;
        passed = false;
      }
    }

    std::cout << (passed ? "PASSED" : "FAILED") << std::endl;

    sycl::free(vec_a, q);
    sycl::free(vec_b, q);
    sycl::free(vec_c, q);
  } catch (sycl::exception const &e) {
    // Catches exceptions in the host code.
    std::cerr << "Caught a SYCL host exception:\n" << e.what() << "\n";

    // Most likely the runtime couldn't find FPGA hardware!
    if (e.code().value() == CL_DEVICE_NOT_FOUND) {
      std::cerr << "If you are targeting an FPGA, please ensure that your "
                   "system has a correctly configured FPGA board.\n";
      std::cerr << "Run sys_check in the oneAPI root directory to verify.\n";
      std::cerr << "If you are targeting the FPGA emulator, compile with "
                   "-DFPGA_EMULATOR.\n";
    }
    std::terminate();
  }

  return passed ? EXIT_SUCCESS : EXIT_FAILURE;
}

The full message from the loop analysis details:
VectorAddID.B1:
Hyper-Optimized loop structure: disabled.
Memory dependency
Compiler failed to schedule this loop with smaller II due to memory dependency:
From: Load Operation ( handler.hpp: 1531 > vector_add.cpp: 19 )
To: Store Operation ( handler.hpp: 1531 > vector_add.cpp: 22 )
Compiler failed to schedule this loop with smaller II due to memory dependency:
From: Load Operation ( handler.hpp: 1531 > vector_add.cpp: 20 )
To: Store Operation ( handler.hpp: 1531 > vector_add.cpp: 22 )
Most critical loop feedback path during scheduling:
70.00 clock cycles Load Operation ( handler.hpp: 1531 > vector_add.cpp: 19 )
10.00 clock cycles Store Operation ( handler.hpp: 1531 > vector_add.cpp: 22 )
1.16 clock cycle 32-bit Integer Add Operation ( handler.hpp: 1531 > vector_add.cpp: 21 )
II is an approximation due to the following stallable instructions:
Load Operation ( handler.hpp: 1531 > vector_add.cpp: 19 )
Load Operation ( handler.hpp: 1531 > vector_add.cpp: 20 )
Store Operation ( handler.hpp: 1531 > vector_add.cpp: 22 )
Maximum concurrent iterations: Capacity of loop
Use the Loop Analysis viewer to estimate capacity
See FPGA Handbook : Loops for more information

Re: DE-10 pro (stratix 10) oneAPI BSP
Thank you very much. I'll probably start off with option 1. Is there a good getting-started guide I can follow to run an example on the DE-10 pro board?

Re: DE-10 pro (stratix 10) oneAPI BSP
At the moment, the DE-10 pro is the only board I have, so I'll have to try it without a oneAPI BSP. I think I have two options:
1. Platform Designer manual IP integration (https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/Tools/platform_designer)
2. OFS to build a BSP
Could you suggest which of these two is more stable?

Re: DE-10 pro (stratix 10) oneAPI BSP
Thank you, I will reach out to Terasic for support. In case no oneAPI BSP is available, can you guide me through the steps needed to develop for this board using oneAPI?
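Returning to the vector_add loop analysis question: the load-to-store "memory dependency" most likely means the compiler cannot prove that vec_c_out never overlaps vec_a_in or vec_b_in, since they are plain pointers. The sketch below is plain C++, not SYCL, and the helper names VectorAddLoop, RunDisjoint and RunAliased are hypothetical. It runs the same loop body once with a separate output buffer and once with the output deliberately shifted one element into an input, to show why the compiler has to be conservative: in the overlapping case, each iteration's store feeds the next iteration's load.

```cpp
#include <vector>

// Plain C++ copy of the loop body in VectorAdd::operator() (no SYCL needed).
// Nothing in this signature promises that c does not overlap a or b.
void VectorAddLoop(const int *a, const int *b, int *c, int len) {
  for (int idx = 0; idx < len; idx++) {
    int sum = a[idx] + b[idx];  // the two loads (vector_add.cpp lines 19-20)
    c[idx] = sum;               // the store (vector_add.cpp line 22)
  }
}

// Separate output buffer: every iteration is independent of the others.
std::vector<int> RunDisjoint(std::vector<int> a, std::vector<int> b) {
  std::vector<int> c(a.size());
  VectorAddLoop(a.data(), b.data(), c.data(), static_cast<int>(a.size()));
  return c;
}

// Hypothetical aliasing case: c == a + 1, so the store in iteration idx
// overwrites a[idx + 1], which iteration idx + 1 then loads.
std::vector<int> RunAliased(std::vector<int> a, std::vector<int> b) {
  int len = static_cast<int>(a.size()) - 1;  // leave room for the shifted output
  VectorAddLoop(a.data(), b.data(), a.data() + 1, len);
  return a;
}
```

With ones = {1, 1, 1, 1, 1}, RunDisjoint(ones, ones) returns {2, 2, 2, 2, 2}, while RunAliased(ones, ones) returns {1, 2, 3, 4, 5} because each iteration consumed the previous iteration's result. If your buffers are guaranteed not to overlap, the oneAPI FPGA optimization documentation describes a [[intel::kernel_args_restrict]] kernel attribute (placed on the functor's operator()) for telling the compiler so; it is worth checking the guide for your compiler version to confirm how it applies to USM pointer members before relying on it.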
DE-10 pro (stratix 10) oneAPI BSP
I currently have a DE-10 pro and wish to use oneAPI. I could find the BSP for OpenCL, but not one for oneAPI. Is there a BSP for oneAPI?
Thanks

Hardware / OS recommendation for FPGA development using oneAPI
Hi, I would like some recommendations on hardware for an FPGA development flow using oneAPI (emulation, report generation, ModelSim/QuestaSim simulation and FPGA programming). It will be used by only a few users. I wanted to ask a couple of questions before I buy a desktop or a workstation.
1. Do you suggest a Core i9 processor or a Xeon processor?
2. Do the tools use the GPU much? We won't be implementing anything on the GPU, but I wanted to check whether a good GPU would improve the workflow.
3. OS suggestions for a clean environment setup would be nice.
4. Any other suggestions on hardware / software?
Thanks,
Junsang Yoo