HAL Kernel Version Mismatch Error During FPGA Emulation with vector-add Sample
Hello Intel FPGA team,

I'm currently working on the vector-add example from the official oneAPI-samples repository, specifically from: DirectProgramming/DPC++/DenseLinearAlgebra/vector-add

I'm encountering the following runtime error when I attempt to run the emulation build:

Error output:
./vector-add-buffers.fpga_emu
HAL Kern: Version mismatch! Expected 0xa0c00001 but read 0x4130
Hardware version ID differs from version expected by software. Either:
a) Ensure your compiled design was generated by the same ACL build currently in use, OR
b) The host can not communicate with the compiled kernel.
vector-add-buffers.fpga_emu: /nfs/sc/disks/swip_hld_1/ops/SC/hld/nightly/2022.1/96.2/l64/work/acl/acl/source/57c9d2bcb46afcf445b5da2406c0e6d85be93ef3/src/acl_kernel_if.cpp:733: int acl_kernel_if_init(acl_kernel_if*, acl_bsp_io, acl_system_def_t*): Assertion `0' failed.
make: *** [Makefile.fpga:35: run_emu] Error 1

Environment details:
Board: DE10-Agilex
BSP Path: /opt/intel/oneapi/intelfpgadpcpp/2021.4.0/board/de10_agilex
oneAPI version: Installed multiple versions. Active: 2022.0.2
dpcpp path: /opt/intel/oneapi/compiler/2022.0.2/linux/bin/dpcpp
OS: Ubuntu (detected as Rocky Linux during install attempts)

What I have tried:
Verified the AOCL_BOARD_PACKAGE_ROOT is correctly set.
Recompiled the design using make clean && make fpga_emu.
Ran aoc -list-board-packages to confirm the installed board.
Ensured Quartus, BSP, and compiler are aligned.

Despite that, I still encounter the HAL version mismatch.

Request: Could someone guide me on how to:
Resolve this version mismatch issue?
Confirm the correct environment and runtime versions are in sync?
Completely clean older/duplicate oneAPI installations if that's the root cause?

@intel @OneAPI @fpga @agilex7 @de10 @intel6
5.1K Views, 0 likes, 3 Comments

System console giving up
Hi, I am using the Intel HLS Compiler and have generated a design that adds 4 numbers. I created a .cpp file, emulated it, and verified the output. I then generated the IP using the HLS commands and integrated it in Platform Designer along with a JTAG to Avalon Master Bridge IP. While testing on the hardware through System Console, I established the JTAG path, but when I try to write values to the registers, it shows the error below:

master_write_32: This transaction did not complete in 60 seconds. System Console is giving up.
    while executing
"master_write_32 $master_service_path 0x34 4"
    (file "load_vals.tcl" line 6)
    invoked from within
"source load_vals.tcl"

I am attaching the .qar file, a screenshot of the System Console window, the csr.h file containing the register addresses, and the C++ code from which I generated the IP using the HLS compiler.

Solved, 4.2K Views, 0 likes, 11 Comments

Modular approach for Nios II processor integration with the main FPGA file
I am currently working on integrating the Nios II processor with the main VHDL file in Quartus Prime. So far, I've successfully implemented PWM signal generation by assigning a constant angle using the Nios II processor. My next goal is to make this system modular, so it can support N angles instead of just one.

In my previous implementation, I used a single PIO (Parallel I/O) and assigned a base address to it. Now, I'd like to know:
Is it possible to automatically assign addresses for multiple angles (i.e., for multiple PIOs corresponding to each angle)?
If so, what's the best approach to manage or generate these addresses dynamically in a modular way?

I've also attached the files from my previous implementation for reference.

Solved, 2.7K Views, 0 likes, 4 Comments

The compilation time is too long for Intel FPGA OpenCL
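On the N-PIO question in the Nios II post: one way to keep the firmware side modular is to derive every PIO base address from a first base and a fixed span, then index by angle number. The sketch below only illustrates that arithmetic. The names `kFirstPioBase`, `kPioSpan`, `PioBase`, `MakePioTable`, and `AngleRegisterAddress` are hypothetical, the two constants are made-up values, and the layout assumes the PIOs sit back to back in the address map, which has to be arranged (and ideally locked) in Platform Designer. In a real project the base addresses would come from the generated system.h rather than from constants like these.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical values: in a real design these come from the generated
// system.h, not from hand-written constants.
constexpr std::uint32_t kFirstPioBase = 0x00001000u;  // assumed base of PIO 0
constexpr std::uint32_t kPioSpan = 0x10u;             // assumed span per PIO, in bytes

// Base address of the index-th PIO, assuming the PIOs are contiguous.
constexpr std::uint32_t PioBase(std::size_t index) {
  return kFirstPioBase + static_cast<std::uint32_t>(index) * kPioSpan;
}

// Table of N base addresses so the firmware can loop over the angles.
template <std::size_t N>
std::array<std::uint32_t, N> MakePioTable() {
  std::array<std::uint32_t, N> table{};
  for (std::size_t i = 0; i < N; ++i) {
    table[i] = PioBase(i);
  }
  return table;
}

// Bounds-checked lookup for one angle.
inline std::uint32_t AngleRegisterAddress(std::size_t angle_index,
                                          std::size_t angle_count) {
  assert(angle_index < angle_count);
  return PioBase(angle_index);
}

static_assert(PioBase(3) == 0x1030u, "PIO 3 sits three spans above PIO 0");
```

If the number of angles grows large, a single custom Avalon-MM peripheral with N registers may be easier to maintain than N separate PIOs, since only one base address then has to be tracked.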
I am trying to compile an HLS project with Intel SDK for OpenCL 20.3 on a DE10 PRO. This project used to take 5~6 hours to compile with Intel SDK for OpenCL 19.4 on Arria 10, but the compilation has now been running for more than 17 hours. The resource consumption in Linux is shown below. It seems that not many resources are being used. The latest file generated during compilation is shown below. It seems that routing has finished successfully. But since top.fit.route.rpt was generated, 5 more hours have passed without any file being updated. I would like to know whether such a long compilation time is usual and how I can reduce the compilation time in this flow.

Solved, 2.4K Views, 0 likes, 4 Comments

HLS Avalon interface data width implementable only with 2^N numbers
Hi, this is about a second issue we have while migrating an IP core from the HLS workflow to the new SYCL HLS tools (the first is here; it might be related, but this one stands on its own as well).

The HLS-defined core has the following Avalon streaming interface definition:

using InputStream = ihc::stream_in<ac_int<96, false>, ihc::bitsPerSymbol<16>, ihc::usesPackets<true>>;
InputStream g_in_stream; // global

After HLS implementation, the component Tcl script defines a 96-bit-wide bus as expected:

#### Streaming interface for g_in_stream
add_interface g_in_stream avalon_streaming sink
...
set_interface_property g_in_stream dataBitsPerSymbol 16
set_interface_property g_in_stream symbolsPerBeat 6
set_interface_property g_in_stream firstSymbolInHighOrderBits 0
set_interface_assignment g_in_stream hls.cosim.name {@g_in_stream}
add_interface_port g_in_stream g_in_stream_data data input 96
...

The migrated SYCL HLS core has this equivalent interface definition:

// StreamingBeat struct enables sideband signals in Avalon streaming interface
using StreamingBeatT = sycl::ext::intel::experimental::StreamingBeat<
    ac_int<96, false>,  // type carried over this Avalon streaming interface's data signal
    true,               // enable startofpacket and endofpacket signals
    false>;             // disable the empty signal

// Pipe properties
using PipePropertiesT = decltype(sycl::ext::oneapi::experimental::properties(
    sycl::ext::intel::experimental::ready_latency<0>,
    sycl::ext::intel::experimental::bits_per_symbol<16>,
    sycl::ext::intel::experimental::uses_valid<true>,
    sycl::ext::intel::experimental::first_symbol_in_high_order_bits<true>,
    sycl::ext::intel::experimental::protocol_avalon_streaming_uses_ready));

// Image streams
using InPixelPipe = sycl::ext::intel::experimental::pipe<
    InStream,         // An identifier for the pipe
    StreamingBeatT,   // The type of data in the pipe
    0,                // The capacity of the pipe
    PipePropertiesT   // Customizable pipe properties
    >;

The implementation results in this error:

Compiler Error: The data type carried by _InStream exceeds the bits per symbol. You can either enable the sideband signal 'use empty' or increase the bits per symbol.

Only pairs in which the StreamingBeatT data width and the bits_per_symbol value in PipePropertiesT are equal and a power of 2 can be implemented. For example, for 32 bits the component Tcl script contains:

#### Channel (Avalon_ST) interface avm_channel_id_acl_c_InStream_pipe_channel_read
add_interface avm_channel_id_acl_c_InStream_pipe_channel_read avalon_streaming sink
...
set_interface_property avm_channel_id_acl_c_InStream_pipe_channel_read symbolsPerBeat 1
set_interface_property avm_channel_id_acl_c_InStream_pipe_channel_read dataBitsPerSymbol 32
...

Attached is a reproducer based on the streaming data interface example. Is there any new restriction for Avalon interfaces in the SYCL HLS style? How can we get the initial interface definition implemented? Thanks for any suggestions!

oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308), Ubuntu 22.04.4 LTS

2.1K Views, 0 likes, 9 Comments

OpenCL FPGA: actual results differ from emulation results
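Regarding the Avalon streaming widths in the HLS Avalon post: the numbers in the two Tcl excerpts follow from symbolsPerBeat = data width / bits per symbol (96 / 16 = 6 in the HLS case, 32 / 32 = 1 in the SYCL case). The quoted compiler error itself names two possible ways out when a beat carries more than one symbol: enable the 'empty' sideband signal (the third StreamingBeat argument, which the posted code sets to false) or raise bits_per_symbol to the full data width. The sketch below only restates that arithmetic in plain C++. The function names are hypothetical, and whether either option actually resolves the error on compiler 2024.1 has not been verified here.

```cpp
#include <cassert>

// Number of symbols in one beat: data width divided by the symbol width.
constexpr int SymbolsPerBeat(int data_width_bits, int bits_per_symbol) {
  return data_width_bits / bits_per_symbol;
}

// True when the data width is a whole number of symbols.
constexpr bool IsWholeNumberOfSymbols(int data_width_bits, int bits_per_symbol) {
  return bits_per_symbol > 0 && data_width_bits % bits_per_symbol == 0;
}

// True when a beat carries more than one symbol. Per the quoted error text,
// this is the case that calls for the 'empty' signal or a wider symbol.
constexpr bool BeatHasMultipleSymbols(int data_width_bits, int bits_per_symbol) {
  return IsWholeNumberOfSymbols(data_width_bits, bits_per_symbol) &&
         SymbolsPerBeat(data_width_bits, bits_per_symbol) > 1;
}

static_assert(SymbolsPerBeat(96, 16) == 6, "matches symbolsPerBeat 6 in the HLS script");
static_assert(SymbolsPerBeat(32, 32) == 1, "matches symbolsPerBeat 1 in the SYCL script");
static_assert(BeatHasMultipleSymbols(96, 16), "the 96/16 configuration is multi-symbol");
```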
Platform: DE10-Nano SoC, Intel FPGA SDK for OpenCL 18.1

I am designing a matrix multiplication kernel similar to this one: https://cnugteren.github.io/tutorial/pages/page8.html It uses 3D work items to multiply many pairs of matrices and output the results. The emulation passes, while the actual design on-chip does not. When running on the FPGA, only the first few digits match the correct results. I suspect it may have something to do with the way the emulator emulates multiple work items, but I add a barrier whenever I load values into local memory. Could anyone provide some insight into the difference between how multiple work items are implemented in emulation and in the actual design?

#include "config.h"

uint8_t gf_mu_x86(uint8_t a, uint8_t b) {
    uint8_t p = 0; /* the product of the multiplication */
    #pragma unroll
    for (int i=0;i<8;i++){
        // if (!(a && b)){
        //     break;
        // }
        if (b & 1) /* if b is odd, then add the corresponding a to p (final product = sum of all a's corresponding to odd b's) */
            p ^= a; /* since we're in GF(2^m), addition is an XOR */
        if (a & 0x80) /* GF modulo: if a >= 128, then it will overflow when shifted left, so reduce */
            a = (a << 1) ^ 0x11D; /* XOR with the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0b1_0001_1101) – you can change it but it must be irreducible */
        else
            a <<= 1; /* equivalent to a*2 */
        b >>= 1; /* equivalent to b // 2 */
    }
    return p;
}

int address_interpretor(int x, int y, int offset, __global const uint8_t* restrict sample_idx){
    // use x to find index of required packet (file space) in sample_idx
    uint8_t file_pkt_idx = sample_idx[offset+x];
    // calculate idx of required data in file space
    return file_pkt_idx*PKT_SIZE + y;
}

// Use 2D register blocking (further increase in work per thread)
__kernel
// __attribute__((num_compute_units(CMP_UNIT)))
// __attribute__((max_work_group_size(256)))
__attribute__((reqd_work_group_size(TSM/WPTM, TSN/WPTN, 1))) // 8, 1, 1
void myGEMM6(
    __global const uint8_t* restrict A,
    __global const uint8_t* restrict B,
    __global uint8_t* restrict C,
    __global const uint8_t* restrict DEGREE_,
    __global const uint8_t* restrict sample_idx // cached
    ) {

    // Thread identifiers
    const int tidm = get_local_id(0); // Local row ID (max: TSM/WPTM == RTSM)
    const int tidn = get_local_id(1); // Local col ID (max: TSN/WPTN == RTSN)
    const int offsetM = TSM*get_group_id(0); // Work-group offset
    const int offsetN = TSN*get_group_id(1); // Work-group offset
    const int batch_id = get_global_id(2); // max: N_BATCH

    // Local memory to fit a tile of A and B
    __local uint8_t Asub[TSK][TSM];
    __local uint8_t Bsub[TSN][TSK+2];
    __local uint8_t degrees[MAX_NUM_BATCH];

    // Allocate register space
    uint8_t Areg;
    uint8_t Breg[WPTN];
    uint8_t acc[WPTM][WPTN];
    int deg_offset = 0;
    uint8_t my_deg;

    // Initialise the accumulation registers
    #pragma unroll
    for (int wm=0; wm<WPTM; wm++) {
        #pragma unroll
        for (int wn=0; wn<WPTN; wn++) {
            acc[wm][wn] = 0;
        }
    }

    // load degrees and calculate offsets
    if(tidm == 0 && tidn == 0){
        #pragma unroll
        for(int i=0;i<MAX_NUM_BATCH;i++){
            degrees[i] = DEGREE_[i];
        }
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    for(int i=0;i<batch_id;i++){
        deg_offset += degrees[i];
    }
    my_deg = degrees[batch_id];

    // Loop over all tiles
    const int numTiles = my_deg/TSK;
    barrier(CLK_LOCAL_MEM_FENCE);
    for(int t=0;t<numTiles;t++){

        // Load one tile of A and B into local memory
        // #pragma unroll
        for (int la=0; la<LPTA; la++) {
            int tid = tidn*RTSM + tidm;
            int id = la*RTSN*RTSM + tid;
            int row = MOD2(id,TSM);
            int col = DIV2(id,TSM);
            // float row_ = MOD2(id,TSM);
            // float col_ = DIV2(id,TSM);
            // printf("%f,%f\n",row_,col_);
            int tiledIndex = TSK*t + col;
            int A_vec = address_interpretor(tiledIndex, offsetM + row, deg_offset,sample_idx);
            // Asub[col][row] = A[tiledIndex*PKT_SIZE + offsetM + row];
            Asub[col][row] = A[A_vec];
            Bsub[row][col]= B[tiledIndex*BATCH_SIZE + offsetN + row + deg_offset*BATCH_SIZE];
        }

        // Synchronise to make sure the tile is loaded
        barrier(CLK_LOCAL_MEM_FENCE);

        // Loop over the values of a single tile
        // #pragma unroll
        for (int k=0; k<TSK; k++) {

            // Cache the values of Bsub in registers
            #pragma unroll
            for (int wn=0; wn<WPTN; wn++) {
                int col = tidn + wn*RTSN;
                Breg[wn] = Bsub[col][k];
            }

            // Perform the computation
            #pragma unroll
            for (int wm=0; wm<WPTM; wm++) {
                int row = tidm + wm*RTSM;
                Areg = Asub[k][row];
                #pragma unroll
                for (int wn=0; wn<WPTN; wn++) {
                    acc[wm][wn] ^= gf_mu_x86(Areg , Breg[wn]);
                }
            }
        }

        // Synchronise before loading the next tile
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // Store the final results in C
    // #pragma unroll
    for (int wm=0; wm<WPTM; wm++) {
        int globalRow = offsetM + tidm + wm*RTSM;
        #pragma unroll
        for (int wn=0; wn<WPTN; wn++) {
            int globalCol = offsetN + tidn + wn*RTSN;
            C[globalCol*PKT_SIZE + globalRow + batch_id*PKT_SIZE*BATCH_SIZE] = acc[wm][wn];
        }
    }
}

Solved, 2K Views, 0 likes, 2 Comments

Assertion failed in "hdl.cpp" when compiling HLS design
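For the GF(2^8) matrix-multiplication kernel in the OpenCL post: when hardware and emulation disagree, a plain host-side reference for the field multiply makes it possible to recompute expected outputs and find the first element that differs. The sketch below is such a reference, not part of the original kernel. The name `GfMulReference` is hypothetical, and it assumes the same reduction constant the kernel uses, 0x11D, which corresponds to x^8 + x^4 + x^3 + x^2 + 1.

```cpp
#include <cassert>
#include <cstdint>

// Reduction polynomial used by the posted kernel:
// 0x11D = x^8 + x^4 + x^3 + x^2 + 1.
constexpr unsigned kReductionPoly = 0x11D;

// Bitwise GF(2^8) multiply using the same shift-and-XOR scheme as the
// kernel's gf_mu_x86, written for the host.
inline std::uint8_t GfMulReference(std::uint8_t a, std::uint8_t b) {
  std::uint8_t p = 0;
  for (int i = 0; i < 8; ++i) {
    if (b & 1) {
      p ^= a;  // addition in GF(2^8) is XOR
    }
    const bool carry = (a & 0x80) != 0;
    a = static_cast<std::uint8_t>(a << 1);  // multiply a by x
    if (carry) {
      // Reduce modulo the polynomial; only the low 8 bits matter here.
      a ^= static_cast<std::uint8_t>(kReductionPoly & 0xFF);
    }
    b >>= 1;
  }
  return p;
}
```

Comparing the device buffer against values accumulated with this function, element by element, shows whether the mismatch starts at a tile boundary, a batch boundary, or somewhere else, which narrows down where to look in the kernel.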
Good day! I'm working with Quartus Prime Pro 24.2 and its corresponding version of the HLS Compiler. I get the following error message shortly after launching compilation, with an Agilex 7 board as target:

Assertion failed: size >= 1, file hdl.cpp, line 201
HLS System Integration FAILED.

This hdl.cpp file is nowhere to be found on my disk. I cannot share the design as it is, since it includes a confidential module, but it might be relevant that the error has appeared since I started testing it with mm_host interfaces. Here are the interface types I am using:

typedef ihc::mm_host<cfixed_t, ihc::dwidth<1024>, ihc::awidth<10>, ihc::latency<0>, ihc::waitrequest<true>, ihc::aspace<1>> mm_host_t1;
typedef ihc::mm_host<cfixed_t, ihc::dwidth<1024>, ihc::awidth<2>, ihc::latency<0>, ihc::waitrequest<true>, ihc::aspace<2>> mm_host_t2;
typedef ihc::mm_host<cfixed_t, ihc::dwidth<512>, ihc::awidth<4>, ihc::latency<0>, ihc::waitrequest<true>, ihc::aspace<3>> mm_host_t3;

I wanted to ask whether you could clarify how to check what this assertion refers to. Let me know if you require more details.

Regards,
Noah

1.9K Views, 0 likes, 7 Comments

Inter-Kernel and Kernel-Host Pipes in one design
Hi, we are transferring an HLS IP core design to the new SYCL HLS tool flow. During this migration several issues came up. We have isolated one of them here and created a reproducer (please see the attached file). The issue occurs when mixing inter-kernel and kernel-host pipes with DIFFERENT data types. The compiler builds the EMU target when the data types are the same but crashes (icpx: error: fpga compiler command failed with exit code 245) when they differ. Please note that this affects only EMU. The SIM and FPGA targets build without issues and generate the expected results. The reproducer is based on the Pipes sample, modified to add host pipes. Any suggestions on how to get different data types emulated? Thanks!

oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308), Ubuntu 22.04.4 LTS

Solved, 1.9K Views, 0 likes, 5 Comments

Error while linking host code and device code in SYCL with icpx
Hello,

Lately I have been trying to compile and link a SYCL kernel that has the reqd_work_group_size attribute for FPGA hardware with the oneAPI toolkits, specifically the icpx compiler. Basically, I modified the fast_recompile tutorial from the oneAPI samples (https://github.com/oneapi-src/oneAPI-samples/tree/main/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile ) to make the SYCL structures two-dimensional and to give the kernel an nd_range as well as the reqd_work_group_size kernel attribute. Compiling and linking the kernel into an FPGA image works well, but when I link this image with the host code, it crashes with the following error:

[100%] Generating fast_recompile.fpga
/opt/intel/oneapi/compiler/2024.1/bin/icpx -fintelfpga -qactypes host.o kernel_image.a -o fast_recompile.fpga -I/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/../../../include
libc++abi: terminating due to uncaught exception of type std::bad_alloc: std::bad_alloc
#0 0x000056039960bd73 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x3afd73)
#1 0x000056039960a232 llvm::sys::RunSignalHandlers() (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x3ae232)
#2 0x000056039960c504 SignalHandler(int) Signals.cpp:0:0
#3 0x00007fc149201420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
#4 0x00007fc14903e00b raise /build/glibc-LcI20x/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
#5 0x00007fc14901d859 abort /build/glibc-LcI20x/glibc-2.31/stdlib/abort.c:81:7
#6 0x0000560399715896 (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x4b9896)
#7 0x00005603996fe5db demangling_terminate_handler() cxa_default_handlers.cpp:0:0
#8 0x0000560399715553 std::__terminate(void (*)()) cxa_handlers.cpp:0:0
#9 0x0000560399716f36 (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x4baf36)
#10 0x0000560399716ecf (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x4baecf)
#11 0x0000560399716c28 operator new(unsigned long) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x4bac28)
#12 0x00005603995e6ca8 llvm::util::PropertyValue::PropertyValue(unsigned char const*, unsigned long) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x38aca8)
#13 0x00005603994758dd SymPropReader::getPropRegistry() (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x2198dd)
#14 0x000056039946cc22 (anonymous namespace)::BinaryWrapper::createBinDesc(OffloadKind, llvm::SmallVector<std::__1::unique_ptr<(anonymous namespace)::BinaryWrapper::Image, std::__1::default_delete<(anonymous namespace)::BinaryWrapper::Image>>, 4u>&) ClangOffloadWrapper.cpp:0:0
#15 0x00005603994654de main (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x2094de)
#16 0x00007fc14901f083 __libc_start_main /build/glibc-LcI20x/glibc-2.31/csu/../csu/libc-start.c:342:3
#17 0x0000560399461529 _start (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x205529)
icpx: error: unable to execute command: Aborted (core dumped)
icpx: error: clang-offload-wrapper command failed due to signal (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.1/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.1/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).

Is there a way to compile this kernel as in the example, keeping host and device code separate? Thanks.

Below I include the modified code of the fast_recompile files, as well as all the compilation messages printed.
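A side note on the launch parameters in the code that follows: the kernel hard-codes reqd_work_group_size(16,16), while RunKernel receives the work-group size (wgs) at run time. As far as I understand SYCL 2020, a launch is rejected at run time if the local range differs from the required size or if the global range is not a multiple of the local range. The sketch below is plain C++ with hypothetical names (`Range2`, `LocalDividesGlobal`, `LaunchIsConsistent`, `AssertLaunchIsConsistent`) that checks both conditions on the host before submitting. It is independent of the clang-offload-wrapper crash and is not claimed to fix it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Range2 = std::array<std::size_t, 2>;

// True when every global dimension is a non-zero multiple of the local one.
inline bool LocalDividesGlobal(const Range2& global, const Range2& local) {
  for (std::size_t d = 0; d < 2; ++d) {
    if (local[d] == 0 || global[d] == 0 || global[d] % local[d] != 0) {
      return false;
    }
  }
  return true;
}

// True when the run-time local range equals the size fixed by the kernel
// attribute and also divides the global range.
inline bool LaunchIsConsistent(const Range2& global, const Range2& local,
                               const Range2& required) {
  return local == required && LocalDividesGlobal(global, local);
}

// Debug-build guard that could be called just before q.submit().
inline void AssertLaunchIsConsistent(const Range2& global, const Range2& local,
                                     const Range2& required) {
  assert(LaunchIsConsistent(global, local, required));
}
```

With the values in the posted host code (a 64 x 64 global range and wgs = 16) the check passes, so the parameters as posted look consistent with the attribute.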
host.cpp code

//==============================================================
// Copyright Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <cstdlib>  // rand, RAND_MAX
#include <iostream>
#include <vector>

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

#include "exception_handler.hpp"

// This code sample demonstrates how to split the host and FPGA kernel code into
// separate compilation units so that they can be separately recompiled.
// Consult the README for a detailed discussion.
//  - host.cpp (this file) contains exclusively code that executes on the host.
//  - kernel.cpp contains almost exclusively code that executes on the device.
//  - kernel.hpp contains only the forward declaration of a function containing
//    the device code.
#include "kernel.hpp"

using namespace sycl;

// the tolerance used in floating point comparisons
constexpr float kTol = 0.001;

// the array size of vectors a, b and c
constexpr size_t kArraySize = 64;

int main() {
  std::vector<float> vec_a(kArraySize*kArraySize);
  std::vector<float> vec_b(kArraySize*kArraySize);
  std::vector<float> vec_r(kArraySize*kArraySize);

  // Fill vectors a and b with random float values
  // (all kArraySize*kArraySize elements, not only the first row)
  for (size_t i = 0; i < kArraySize*kArraySize; i++) {
    vec_a[i] = rand() / (float)RAND_MAX;
    vec_b[i] = rand() / (float)RAND_MAX;
  }

  // Select either the FPGA emulator, FPGA simulator or FPGA device
#if FPGA_SIMULATOR
  auto selector = sycl::ext::intel::fpga_simulator_selector_v;
#elif FPGA_HARDWARE
  auto selector = sycl::ext::intel::fpga_selector_v;
#else  // #if FPGA_EMULATOR
  auto selector = sycl::ext::intel::fpga_emulator_selector_v;
#endif

  try {
    // Create a queue bound to the chosen device.
    // If the device is unavailable, a SYCL runtime exception is thrown.
    queue q(selector, fpga_tools::exception_handler);

    auto device = q.get_device();
    std::cout << "Running on device: "
              << device.get_info<sycl::info::device::name>().c_str()
              << std::endl;

    // create the device buffers
    buffer device_a(vec_a.data(), range<2>(kArraySize,kArraySize));
    buffer device_b(vec_b.data(), range<2>(kArraySize,kArraySize));
    buffer device_r(vec_r.data(), range<2>(kArraySize,kArraySize));

    // The definition of this function is in a different compilation unit,
    // so host and device code can be separately compiled.
    RunKernel(q, device_a, device_b, device_r, kArraySize, 16);

  } catch (exception const &e) {
    // Catches exceptions in the host code
    std::cerr << "Caught a SYCL host exception:\n" << e.what() << "\n";

    // Most likely the runtime couldn't find FPGA hardware!
    if (e.code().value() == CL_DEVICE_NOT_FOUND) {
      std::cerr << "If you are targeting an FPGA, please ensure that your "
                   "system has a correctly configured FPGA board.\n";
      std::cerr << "Run sys_check in the oneAPI root directory to verify.\n";
      std::cerr << "If you are targeting the FPGA emulator, compile with "
                   "-DFPGA_EMULATOR.\n";
    }
    std::terminate();
  }

  // At this point, the device buffers have gone out of scope and the kernel
  // has been synchronized. Therefore, the output data (vec_r) has been updated
  // with the results of the kernel and is safely accessible by the host CPU.

  // Test the results
  size_t correct = 0;
  for (size_t i = 0; i < kArraySize*kArraySize; i++) {
    float tmp = vec_a[i] + vec_b[i] - vec_r[i];
    if (tmp * tmp < kTol * kTol) {
      correct++;
    }
  }

  // Summarize results
  if (correct == kArraySize*kArraySize) {
    std::cout << "PASSED: results are correct\n";
  } else {
    std::cout << "FAILED: results are incorrect\n";
  }

  // Exit code 0 only when every element matched
  return !(correct == kArraySize*kArraySize);
}

kernel.hpp code

//==============================================================
// Copyright Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <sycl/sycl.hpp>

using namespace sycl;

void RunKernel(queue& q, buffer<float,2>& buf_a, buffer<float,2>& buf_b, buffer<float,2>& buf_r, size_t size,size_t wgs);

kernel.cpp code

//==============================================================
// Copyright Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <sycl/ext/intel/fpga_extensions.hpp>

#include "kernel.hpp"

// This file contains 'almost' exclusively device code. The single-source SYCL
// code has been refactored between host.cpp and kernel.cpp to separate host and
// device code to the extent that the language permits.
//
// Note that ANY change in either this file or in kernel.hpp will be detected
// by the build system as a difference in the dependencies of device.o,
// triggering a full recompilation of the device code.
//
// This is true even of trivial changes, e.g. tweaking the function definition
// or the names of variables like 'q' or 'h', EVEN THOUGH these are not truly
// "device code".

// Forward declare the kernel names in the global scope. This FPGA best practice
// reduces compiler name mangling in the optimization reports.
class VectorAdd;

void RunKernel(queue& q, buffer<float,2>& buf_a, buffer<float,2>& buf_b, buffer<float,2>& buf_r, size_t size, size_t wgs){
  // submit the kernel
  q.submit([&](handler &h) {
    // Data accessors
    accessor a(buf_a, h, read_only);
    accessor b(buf_b, h, read_only);
    accessor r(buf_r, h, write_only, no_init);

    // Kernel executes with pipeline parallelism on the FPGA.
    // Use kernel_args_restrict to specify that a, b, and r do not alias.
    h.parallel_for<VectorAdd>(nd_range(range(size,size),range(wgs,wgs)),[=](nd_item<2> item) [[intel::kernel_args_restrict, sycl::reqd_work_group_size(16,16)]] {
      size_t i = item.get_global_id(0);
      size_t j = item.get_global_id(1);
      r[{i,j}] = a[{i,j}] + b[{i,j}];
    });
  });
}

compilation messages

/usr/bin/cmake -S/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile -B/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 --check-build-system CMakeFiles/Makefile.cmake 0
make -f CMakeFiles/Makefile2 fpga
make[1]: Entering directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
/usr/bin/cmake -S/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile -B/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2/CMakeFiles 6
make -f CMakeFiles/Makefile2 CMakeFiles/fpga.dir/all
make[2]: Entering directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
make -f CMakeFiles/displayHostCompileCommand.dir/build.make CMakeFiles/displayHostCompileCommand.dir/depend
make[3]: Entering directory 
'/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2' cd /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2/CMakeFiles/displayHostCompileCommand.dir/DependInfo.cmake --color= Scanning dependencies of target displayHostCompileCommand make[3]: Leaving directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2' make -f CMakeFiles/displayHostCompileCommand.dir/build.make CMakeFiles/displayHostCompileCommand.dir/build make[3]: Entering directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2' [ 16%] To run the host code compile manually: /opt/intel/oneapi/compiler/2024.1/bin/icpx -fintelfpga -Wall -qactypes -DFPGA_HARDWARE -c /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/src/host.cpp -o host.o -I/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/../../../include /usr/bin/cmake -E cmake_echo_color --cyan make[3]: Leaving directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2' [ 16%] Built target displayHostCompileCommand make -f CMakeFiles/displayFPGALinkCompileCommand.dir/build.make 
CMakeFiles/displayFPGALinkCompileCommand.dir/depend make[3]: Entering directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2' cd /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2/CMakeFiles/displayFPGALinkCompileCommand.dir/DependInfo.cmake --color= Scanning dependencies of target displayFPGALinkCompileCommand make[3]: Leaving directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2' make -f CMakeFiles/displayFPGALinkCompileCommand.dir/build.make CMakeFiles/displayFPGALinkCompileCommand.dir/build make[3]: Entering directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2' [ 33%] To run the FPGA link compile manually: /opt/intel/oneapi/compiler/2024.1/bin/icpx -fintelfpga -qactypes host.o kernel_image.a -o fast_recompile.fpga -I/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/../../../include /usr/bin/cmake -E cmake_echo_color --cyan make[3]: Leaving directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2' [ 33%] Built target displayFPGALinkCompileCommand make -f CMakeFiles/displayDeviceCompileCommand.dir/build.make 
CMakeFiles/displayDeviceCompileCommand.dir/depend
make[3]: Entering directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
cd /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2/CMakeFiles/displayDeviceCompileCommand.dir/DependInfo.cmake --color=
Scanning dependencies of target displayDeviceCompileCommand
make[3]: Leaving directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
make -f CMakeFiles/displayDeviceCompileCommand.dir/build.make CMakeFiles/displayDeviceCompileCommand.dir/build
make[3]: Entering directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
[ 50%] To run the device code compile manually:
/opt/intel/oneapi/compiler/2024.1/bin/icpx -fintelfpga -Wall -qactypes -DFPGA_HARDWARE -fintelfpga -qactypes -Xshardware -Xstarget=de10_agilex:B2E2_8GBx4 -Xsclock=50MHz -Xsseed=2 -reuse-exe=/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2/fast_recompile.fpga -fsycl-link=image /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/src/kernel.cpp -o kernel_image.a -I/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/../../../include
/usr/bin/cmake -E cmake_echo_color --cyan
make[3]: Leaving directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
[ 50%] Built target displayDeviceCompileCommand
make -f CMakeFiles/fpga.dir/build.make CMakeFiles/fpga.dir/depend
make[3]: Entering directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
cd /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2 /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2/CMakeFiles/fpga.dir/DependInfo.cmake --color=
Scanning dependencies of target fpga
make[3]: Leaving directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
make -f CMakeFiles/fpga.dir/build.make CMakeFiles/fpga.dir/build
make[3]: Entering directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
[ 66%] Generating host.o
/opt/intel/oneapi/compiler/2024.1/bin/icpx -fintelfpga -Wall -qactypes -DFPGA_HARDWARE -c /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/src/host.cpp -o host.o -I/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/../../../include
[ 83%] Generating kernel_image.a
/opt/intel/oneapi/compiler/2024.1/bin/icpx -fintelfpga -Wall -qactypes -DFPGA_HARDWARE -fintelfpga -qactypes -Xshardware -Xstarget=de10_agilex:B2E2_8GBx4 -Xsclock=50MHz\ -Xsseed=2 -reuse-exe=/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2/fast_recompile.fpga -fsycl-link=image /home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/src/kernel.cpp -o kernel_image.a -I/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/../../../include
warning: -reuse-exe file '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2/fast_recompile.fpga' not found; ignored
aoc: Compiling for FPGA. This process may take several hours to complete. Prior to performing this compile, be sure to check the reports to ensure the design will meet your performance targets. If the reports indicate performance targets are not being met, code edits may be required. Please refer to the oneAPI FPGA Optimization Guide for information on performance tuning applications for FPGAs.
[100%] Generating fast_recompile.fpga
/opt/intel/oneapi/compiler/2024.1/bin/icpx -fintelfpga -qactypes host.o kernel_image.a -o fast_recompile.fpga -I/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/../../../include
libc++abi: terminating due to uncaught exception of type std::bad_alloc: std::bad_alloc
#0 0x000056039960bd73 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x3afd73)
#1 0x000056039960a232 llvm::sys::RunSignalHandlers() (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x3ae232)
#2 0x000056039960c504 SignalHandler(int) Signals.cpp:0:0
#3 0x00007fc149201420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
#4 0x00007fc14903e00b raise /build/glibc-LcI20x/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
#5 0x00007fc14901d859 abort /build/glibc-LcI20x/glibc-2.31/stdlib/abort.c:81:7
#6 0x0000560399715896 (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x4b9896)
#7 0x00005603996fe5db demangling_terminate_handler() cxa_default_handlers.cpp:0:0
#8 0x0000560399715553 std::__terminate(void (*)()) cxa_handlers.cpp:0:0
#9 0x0000560399716f36 (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x4baf36)
#10 0x0000560399716ecf (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x4baecf)
#11 0x0000560399716c28 operator new(unsigned long) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x4bac28)
#12 0x00005603995e6ca8 llvm::util::PropertyValue::PropertyValue(unsigned char const*, unsigned long) (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x38aca8)
#13 0x00005603994758dd SymPropReader::getPropRegistry() (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x2198dd)
#14 0x000056039946cc22 (anonymous namespace)::BinaryWrapper::createBinDesc(OffloadKind, llvm::SmallVector<std::__1::unique_ptr<(anonymous namespace)::BinaryWrapper::Image, std::__1::default_delete<(anonymous namespace)::BinaryWrapper::Image>>, 4u>&) ClangOffloadWrapper.cpp:0:0
#15 0x00005603994654de main (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x2094de)
#16 0x00007fc14901f083 __libc_start_main /build/glibc-LcI20x/glibc-2.31/csu/../csu/libc-start.c:342:3
#17 0x0000560399461529 _start (/opt/intel/oneapi/compiler/2024.1/bin/compiler/clang-offload-wrapper+0x205529)
icpx: error: unable to execute command: Aborted (core dumped)
icpx: error: clang-offload-wrapper command failed due to signal (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.1/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.1/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).
make[3]: *** [CMakeFiles/fpga.dir/build.make:65: fast_recompile.fpga] Error 1
make[3]: Leaving directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
make[2]: *** [CMakeFiles/Makefile2:144: CMakeFiles/fpga.dir/all] Error 2
make[2]: Leaving directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
make[1]: *** [CMakeFiles/Makefile2:151: CMakeFiles/fpga.dir/rule] Error 2
make[1]: Leaving directory '/home/diegog/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile/build_reqd_2'
make: *** [Makefile:147: fpga] Error 2