Forum Discussion
@JaideepK_Intel Hi, sure!
Before testing with my project, I would like to compile and run a simple USM example successfully. As far as I understand, only the Stratix FPGA has USM enabled and the Arria does not have that feature, right?
First, I log in to a compile node:
qsub -l nodes=1:fpga_compile:ppn=2 -I
Then I set up the environment variables:
export AOCL_BOARD_PACKAGE_ROOT=/opt/intel/oneapi/intel_s10sx_pac
source /glob/development-tools/versions/fpgasupportstack/d5005/2.0.1/inteldevstack/hld/init_opencl.sh
source /glob/development-tools/versions/fpgasupportstack/d5005/2.0.1/inteldevstack/init_env.sh
export FPGA_BBB_CCI_src=/usr/local/intel-fpga-bbb
export PATH=/glob/intel-python/python2/bin:${PATH}
source /opt/intel/oneapi/setvars.sh
Then I compile my code in two steps:
icpx -fsycl -fintelfpga -Xshardware -c test.cpp -o test.cpp.o
icpx -fsycl -fintelfpga -Xshardware -Xstarget=/opt/intel/oneapi/intel_s10sx_pac:pac_s10_usm test.cpp.o -o test -v
The problem is that the last command fails after 1-2 hours with the error: "Error: Failed to open quartus_sh_compile.log".
I don't know what I am doing wrong here. Could you help me, please? The code is below.
I really want to thank you for your help all this time. I feel that I am close to being able to compile it successfully.
#include <sycl/sycl.hpp>
#include <array>
#include <iostream>
#include <string>

using namespace sycl;

size_t array_size = 10000;

//************************************
// Vector add in SYCL on device: returns sum in 4th parameter "sum".
//************************************
void VectorAdd(queue &q, const int *a, const int *b, int *sum, size_t size) {
  // Create the range object for the arrays.
  range<1> num_items{size};

  auto e = q.parallel_for(num_items, [=](auto i) { sum[i] = a[i] + b[i]; });
  e.wait();
}

//************************************
// Initialize the array from 0 to array_size - 1
//************************************
void InitializeArray(int *a, size_t size) {
  for (size_t i = 0; i < size; i++) a[i] = static_cast<int>(i);
}

int main(int argc, char *argv[]) {
  // The device index is taken from the first command-line argument.
  if (argc < 2) {
    std::cout << "Usage: " << argv[0] << " <device index>\n";
    return -1;
  }

  try {
    sycl::queue q((sycl::device::get_devices()[std::stoi(argv[1])]));

    // Print out the device information used for the kernel code.
    std::cout << "Running on device: "
              << q.get_device().get_info<info::device::name>() << "\n";
    std::cout << "Max compute units: "
              << q.get_device().get_info<info::device::max_compute_units>() << "\n";
    std::cout << "Max work-group size: "
              << q.get_device().get_info<info::device::max_work_group_size>() << "\n";
    std::cout << "Global memory size (bytes): "
              << q.get_device().get_info<info::device::global_mem_size>() << "\n";

    // Allocate all four arrays as USM shared memory.
    int *a = malloc_shared<int>(array_size, q);
    int *b = malloc_shared<int>(array_size, q);
    int *sum_sequential = malloc_shared<int>(array_size, q);
    int *sum_parallel = malloc_shared<int>(array_size, q);

    if ((a == nullptr) || (b == nullptr) || (sum_sequential == nullptr) ||
        (sum_parallel == nullptr)) {
      if (a != nullptr) free(a, q);
      if (b != nullptr) free(b, q);
      if (sum_sequential != nullptr) free(sum_sequential, q);
      if (sum_parallel != nullptr) free(sum_parallel, q);

      std::cout << "Shared memory allocation failure.\n";
      return -1;
    }

    InitializeArray(a, array_size);
    InitializeArray(b, array_size);

    // Reference result computed on the host.
    for (size_t i = 0; i < array_size; i++) sum_sequential[i] = a[i] + b[i];

    // Result computed on the device.
    VectorAdd(q, a, b, sum_parallel, array_size);

    // Verify the device result against the host result.
    for (size_t i = 0; i < array_size; i++) {
      if (sum_parallel[i] != sum_sequential[i]) {
        std::cout << "Vector add failed on device.\n";
        return -1;
      }
    }

    // Print the first three elements and the last one.
    int indices[]{0, 1, 2, (static_cast<int>(array_size) - 1)};
    constexpr size_t indices_size = sizeof(indices) / sizeof(int);

    for (size_t i = 0; i < indices_size; i++) {
      int j = indices[i];
      if (i == indices_size - 1) std::cout << "...\n";
      std::cout << "[" << j << "]: " << j << " + " << j << " = "
                << sum_sequential[j] << "\n";
    }

    free(a, q);
    free(b, q);
    free(sum_sequential, q);
    free(sum_parallel, q);
  } catch (exception const &e) {
    std::cout << "An exception is caught while adding two vectors.\n";
    std::terminate();
  }

  std::cout << "Vector add successfully completed on device.\n";
  return 0;
}
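
A side note on the device selection: since the code picks the device with get_devices()[std::stoi(argv[1])], a non-numeric or out-of-range argument would either throw or index past the end of the device list. Below is a minimal sketch of a guard I could add. It is plain C++17 so that it can be checked without the FPGA toolchain. ParseDeviceIndex is a hypothetical helper name of my own, and device_count stands in for sycl::device::get_devices().size().

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of SYCL): parse a device index from a
// command-line argument and check it against the number of devices.
// Returns std::nullopt when the argument is missing, not a whole number,
// negative, or not smaller than device_count.
std::optional<std::size_t> ParseDeviceIndex(const char *arg,
                                            std::size_t device_count) {
  if (arg == nullptr) return std::nullopt;

  try {
    const std::string text(arg);
    std::size_t consumed = 0;
    const long value = std::stol(text, &consumed);

    // Reject trailing characters such as "1x".
    if (consumed != text.size()) return std::nullopt;
    // Reject negative indices.
    if (value < 0) return std::nullopt;
    // Reject indices past the end of the device list.
    if (static_cast<unsigned long>(value) >= device_count) return std::nullopt;

    return static_cast<std::size_t>(value);
  } catch (const std::invalid_argument &) {
    return std::nullopt;  // no digits at all, e.g. "abc" or ""
  } catch (const std::out_of_range &) {
    return std::nullopt;  // number does not fit in a long
  }
}
```

In main, the idea would be to call ParseDeviceIndex(argc > 1 ? argv[1] : nullptr, devices.size()) on the vector returned by sycl::device::get_devices(), and print a usage message instead of constructing the queue when it returns std::nullopt.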