ContributionsMost RecentMost LikesSolutionsRe: Deadlock while filling pipe for simulation Hi, thanks for having a look at the issues! CPU speed will have influence on the issue generation (RAM size might be too), so for your machine the settings for reproduction might be different. When I set the gap number e.g. 5000 it runs through w/o issues on my machine too. So on you machine you need to set it higher, try 30000. b.t.w. the full error I see is: vsimk: src/hls_cosim_ipc_socket.cpp:133: virtual void IPCSocket::send(const void*, int): Assertion `0 && "send() failed"' failed. # Attempting stack trace sig 6 # Signal caught: signo [0] # vsim_stacktrace.vstf written # Current time Mon Jun 17 16:48:02 2024 # Program = vsim # Id = "2023.2" # Version = "2023.04" # Date = "Apr 11 2023" # Platform = "linux_x86_64" # Signature = 016838926890ae993a152c369dc4c1be # --> START OF USERCODE # 0 0x00007ffff78969fc: 'pthread_kill + 0x000000000000012c' in '/usr/lib/x86_64-linux-gnu/libc.so.6' # 1 0x00007ffff7842476: 'raise + 0x0000000000000016' in '/usr/lib/x86_64-linux-gnu/libc.so.6' # 2 0x00007ffff78287f3: 'abort + 0x00000000000000d3' in '/usr/lib/x86_64-linux-gnu/libc.so.6' # <-- END OF USERCODE # 3 0x00007ffff782871b: '<unknown (@0x7ffff782871b)>' # 4 0x00007ffff7839e96: '<unknown (@0x7ffff7839e96)>' # 5 0x00007ffff2818321: '<unknown (@0x7ffff2818321)>' # --> START OF USERCODE # 6 0x00007ffff2806c1b: 'SimulatorInterface::send_host_channel(void*, void*, bool*, bool*, unsigned int*) + 0x000000000000015b' in '/data1/intel/oneapi/compiler/2024.1/opt/oclfpga/host/linux64/lib/libaoc_cosim_msim.so' # <-- END OF USERCODE # 7 0x00007feff1fc4f1f: '../../ip/mpsim/dpic_Threshold/aoc_sim_component_dpi_controller_10/sim/aoc_sim_stream_sink_dpi_bfm.sv:38' # 8 0x00007feff1fc65ef: '../../ip/mpsim/dpic_Threshold/aoc_sim_component_dpi_controller_10/sim/aoc_sim_stream_sink_dpi_bfm.sv:70' # 9 0x00007feff1fc77ce: '../../ip/mpsim/dpic_Threshold/aoc_sim_component_dpi_controller_10/sim/aoc_sim_stream_sink_dpi_bfm.sv:185' # 10 0x00000000023e53b2: '<unknown (@0x23e53b2)>' # 11 0x00000000004df304: '<unknown (@0x4df304)>' # 12 0x000000000074da63: '<unknown (@0x74da63)>' # 13 0x0000000000ca58ad: '<unknown (@0xca58ad)>' # 14 0x0000000000caabd0: '<unknown (@0xcaabd0)>' # 15 0x0000000000cac54e: '<unknown (@0xcac54e)>' # 16 0x0000000000f9bd2d: '<unknown (@0xf9bd2d)>' # 17 0x000000000287a82d: '<unknown (@0x287a82d)>' # 18 0x000000000287ec86: '<unknown (@0x287ec86)>' # 19 0x0000000002880371: '<unknown (@0x2880371)>' # 20 0x00000000028806d6: '<unknown (@0x28806d6)>' # 21 0x0000000002881df3: '<unknown (@0x2881df3)>' # 22 0x00000000028825b1: '<unknown (@0x28825b1)>' # 23 0x0000000000c6c700: '<unknown (@0xc6c700)>' # 24 0x0000000000c6e315: '<unknown (@0xc6e315)>' # End of Stack Trace o Re: Deadlock while filling pipe for simulation Hi Kevin, you are right, the very simple reproducer did not show the effect. Same for the simple i++ loop as this will end the component without having consumed more data from the input pipe at a later time. The updated reproducer is consuming input data with a large gap. I designed the gap to be large enougth to show that: 1) TB is pushing data to the input pipe until the related tread stalls as the write command is not correctly signalling FULL (the WR STALL print is never shown). 2) Simulation continuosly generates output data, restarts consuming input data after the gap but comes to a dead end as the input pipe is corrupted. 3) The final error comes from vsim but I guess the root cause is in the TB WR handling. WR ... Input: 17000 Output: 51 Diff: 16949 WR ... Input: 18000 Output: 60 Diff: 17940 ^CExiting simulation due to Interrupt vsimk: src/hls_cosim_ipc_socket.cpp:133: virtual void IPCSocket::send(const void*, int): Assertion `0 && "send() failed"' failed. Expected behavioural is that the TB WR thread hold on as long as WR pipe is full but restarts pushing data to it when the kernel has consumed some. Regards, Ric. Re: Deadlock while filling pipe for simulation Hi Kevin, I guess there are race conditions. I sporadically see issues in simulation too: .../build$ ./streaming.fpga_sim Running on device: SimulatorDevice : Multi-process Simulator (aclmsim0) terminate called after throwing an instance of 'sycl::_V1::runtime_error' what(): Enqueue process failed. -59 (PI_ERROR_INVALID_OPERATION) Aborted (core dumped) Please check the host pipe write command in your code. Modify the core to not accept data at it's input for a longer time (backpressure condition). Then the host will throw an error when the pipe is full but the host write is not handling this properly. To reproduce, just modify line 57: 57 // while (!end_of_packet) { 57 while (end_of_packet) { 58 // Read in next pixel As the reproducer is a bit of artificial, I see this issue in our production design too. I hope this helps you to find the root cause of these problems! Regards, Ric. Re: Deadlock while filling pipe for simulation Hi Kevin, thanks for testing and proposing a work around. Unfortunatelly it helps for half of the problem only. The buffer depth limitation can be addressed this way. However, as soon as we had implemented the changes we run into trouble again seeing this error massage sporadically: terminate called after throwing an instance of 'sycl::_V1::runtime_error' what(): Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error) Aborted (core dumped) Digging into the issue we found that the non-blocking pipe read causes the error as soon as an empty pipe is accessed to read. To reproduce just generate e.g. 9 write samples and try to read 10. With the print modification from below the output should look like this: Running on device: Intel(R) FPGA Emulation Device Input: 0 9 Input Done! try read: 0 Output: 0 0 success: 1 try read: 1 Output: 1 1 success: 1 try read: 2 Output: 2 2 success: 1 try read: 3 Output: 3 3 success: 1 try read: 4 Output: 4 4 success: 1 try read: 5 Output: 5 5 success: 1 try read: 6 Output: 6 6 success: 1 try read: 7 Output: 7 7 success: 1 try read: 8 Output: 8 8 success: 1 try read: 9 terminate called after throwing an instance of 'sycl::_V1::runtime_error' what(): Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error) Aborted (core dumped) while (!read_success){ std::cerr << "try read: " << i << "\t "; out_beat = OutPixelPipe::read(*(out_args->q), read_success); std::cerr << "Output: " << i << "\t " << out_beat.data << "\tsuccess: " << (int)read_success << "\t " << std::endl; } Remember my second post here where some odd behaviour was reported for the non-blocking write command too. Thanks for your help! Ric. Re: HLS Avalon interface data width implementable only with 2^N numbers Hi Aik, there are pending question from above. It would be good to understand if using empty signal is mandatory now (as it was not required for the same interface definition in i++ HLS). The second, related question is about conformability to the Avalon ST definition. Digging deeper in this I see some long time pending changes which may have influence to the above interface definition requirement: To create production designs with this feature it is necessary to have clear and consistent documentation and specifications. Any outlook on what is going on in sycl HLS for this topic? Thanks, Ric. Re: Deadlock while filling pipe for simulation Having modified the write command to be non-blocking I notice that the command still blocks (the if condition is never satisfied): bool success = false; InPixelPipe::write(q, in_beat, success); if (!success){ sleep(1); std::cerr << i << "### STALL ###\n"; } Documentation reads this: Non-blocking writes add a bool argument in both host and device APIs that is passed by reference and returns true in this argument if the write was successful, and false if it was unsuccessful. On the host: // attempt non-blocking write from host to pipe until successful while (!success) MyPipeInstance::write(q, data_element, success); Remember, it's while running simulation. In emulation, the pipe is never stalling. Btw, with i++ HLS we didn't run in this problem, even with very large simulation data sets. Any advice? Thanks! Deadlock while filling pipe for simulation Hi, following best practice as descibed here we run into trouble with stalled writes. When you create a testbench for a oneAPI kernel that you intend to compile as an IP core, write all your data to the host pipe before invoking the kernel. There is a reproducer attached based on streaming data interface example. As reducing the amount of data to process for simulation seems an easy workaround, this will not work if the design is more complex or bigger and the interface width is larger. Both conditions reduce the number of data samples before the buffer access stalls. For our relevant design this means a too small size of simulateable data to get reasonable simulation results. I have following questions: 1) Is this intented behaviour? 2) Is there a parameter to increase the accepted number of samples for simulation pipes? 3) Will this solution solve the simulation issue too? Thanks for any feedback! Ric. oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308), Ubuntu 22.04.4 LTS Re: HLS Avalon interface data width implementable only with 2^N numbers Hi Aik, thank you for providing feedback! And yes, the proposed change make the implementation of non power of 2 data width possible. However I have two concerns: 1) Is this a workaround or intended behaviour? From the Avalon ST specification I got The empty signal is required on all packet interfaces whose data signal carries more than one symbol of data and have a variable length packet format. The size of the empty signal in bits is ceil[log2(<symbols per cycle>)]. From the tcl build script I see set_interface_property avm_channel_id_acl_c_InStream_pipe_channel_read symbolsPerBeat 1 which I interprete to be a data signal that just carries one symbol of data. This in turn will NOT require the empty signal to be required. 2) The proposed solution will change the interface of a production design which was originally designed in i++ HLS. I need to be sure that this change is required and consistent. May I ask you to elaborate the two concerns from above. Thank you! Ric. Re: HLS Avalon interface data width implementable only with 2^N numbers Hi again, I'm going to add some more specifics to the problem decribed above. As said, we intend to implement an Avalong Streaming interface with the sycl HLS (icpx) workflow. The Avalon Streaming specification has this definition for the dataBitsPerSymbol parameter: The icpx error (Report generation) for non power of 2 numbers is [ 66%] Building CXX object CMakeFiles/report.dir/src/streaming_data_interfaces.cpp.o [100%] Linking CXX executable streaming_data_interfaces.report Compiler Error: The data type carried by _InStream exceeds the bits per symbol. You can either enable the sideband signal 'use empty' or increase the bits per symbol. Error: Optimizer FAILED. Note 1: The error is generated after compilation of the source files while linking. Note 2: EMU generation of non power of two numbers works. Note 3: With Intel HLS (i++) project it was no problem to generate this non power of two width Avalon Streaming interfaces. Hope that helps to identify (and resolve) the problem reported. Re: HLS Avalon interface data width implementable only with 2^N numbers Hi Aik Eu, thank you for responding. Unfortunatelly the link provided do not reveal any relevant information to the specific problem from above. Anyway, it was helpfull for some other work I'm doing! Regards, Ric.