Recent Discussions
Valid substitute to EN5396QI-T
Good morning everyone,
I am trying to find valid alternatives to the EN5396QI-T, EN5366QI and EN5337QI-T, which are now obsolete. The substitute must provide the same output current, switching frequency and low input voltage, and it must also have an integrated inductor. Do you know of any valid alternatives? So far, the best I have found are the MPS DC-DC converters, but unfortunately they do not reach a 5 MHz switching frequency.
Thank you all for your attention.
Best regards,
Andrea B.
1.2K Views, 2 likes, 4 Comments

build for fpga failed
Reported error:
quartus_sh: error while loading shared libraries: libncurses.so.5: cannot open shared object file: No such file or directory
Error: The patches required to compile for the target board (0.05dcp) is not installed for the following Quartus: /glob/development-tools/versions/oneapi/2021.3/inteloneapi/intelfpgadpcpp/2021.3.0/QuartusPrimePro/19.2/quartus/bin/quartus_sh
dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile.fpga:16: sw_fpga] Error 1

Commands:
qsub -l nodes=1:fpga_compile:ppn=2 -d . build_fpga.sh

cat build_fpga.sh
#!/bin/bash
source /opt/intel/inteloneapi/setvars.sh
make -f Makefile.fpga

cat Makefile.fpga
CXX := dpcpp
CXXFLAGS = -O2 -std=c++17 -Wall

all : sw_fpga
#all : sw_fpga sw_fpga_emu report

sw_fpga.o : sw_fpga.cpp
	$(CXX) -c $(CXXFLAGS) -fintelfpga $^ -o $@

sw_fpga_emu : sw_fpga.cpp
	$(CXX) $(CXXFLAGS) -fintelfpga $^ -o $@ -DFPGA_EMULATOR

sw_fpga : sw_fpga.o
	$(CXX) $(CXXFLAGS) -fintelfpga -Xshardware $^ -o $@

report : sw_fpga.cpp
	$(CXX) -fintelfpga -Xshardware -fsycl-link sw_fpga.cpp -o sw_fpga

test_2d_array_emu : test_2d_array.cpp
	$(CXX) $(CXXFLAGS) -fintelfpga $^ -o $@ -DFPGA_EMULATOR

clean:
	rm -f sw_fpga sw_fpga_emu
1.3K Views, 2 likes, 1 Comment

Advice on CNN inference on Agilex 7 using oneAPI
Greetings everyone,
I'm tasked with porting a CNN trained with PyTorch to an Agilex 7 FPGA using HLS. I think the right tool for the job is oneAPI. Since this is not a completely novel task, I wonder whether there are any existing implementations, libraries, or similar material I can reuse. I would prefer not to implement everything from scratch, from loading weights to max pooling layers to fixed-point numerics. I'd be happy with any pointers to materials or tutorials you might have.
Thanks in advance.
3.7K Views, 1 like, 15 Comments

oneAPI IP integration with Avalon Streaming input/output
Hi, I'm new to oneAPI and I want to create an IP to export to Quartus. I would like to know how to give my kernel Avalon Streaming input and output so it can process images and, more generally, how to create interfaces to communicate with other components over an Avalon interface. I don't know how to tell the difference between pipes, accessors, etc. Thank you!
2.1K Views, 1 like, 8 Comments

Separate queue synchronization and buffer data corruption on Feed-Forward Design Model with Buffer Management
I am having issues implementing the "Feed-Forward Design Model with Buffer Management". Bear in mind that this is not the first implementation but the last of many attempts. I describe what I have learned so far below and would appreciate any help:
1- I am using OpenCL version 17.1 on an Arria 10 platform.
2- The problem to solve is to organize data coming from a pipe into buffers (large, global memory buffers) that are then used by other kernels or the host.
3- The kernel writing to the pipe must never stall (or its buffer must be large enough to hold the data).
I have implemented the following ping-pong-buffer-like solution:
kernel 1: "StreamingToPipe" streams the data to the pipe with a known pattern to be checked later.
kernel 2: "Producer" reads the pipe from kernel 1, writes to a buffer and sends tokens to the consumers when data is available.
kernel 3 & 4: "ConsumerA" and "ConsumerB" copy, when data is available, a fragment of the buffer requested by "Producer" to a host-allocated buffer.
HOST: 4 independent queues, each executing 1 kernel. The 2 consumer queues use callbacks to gather the data and check the patterns. Consumers are enqueued first.
Both examples shown below use the same kernels but change the host code:
EXAMPLE A: Uses enqueueMapBuffer calls to manage data transfers to host.
EXAMPLE B: Uses enqueueReadBuffer calls to manage data transfers to host.
PROBLEMS AND QUESTIONS:
I have followed the guidelines and advice from the best practices guide to use mem_fences at the consumers' end, which is supposed to guarantee memory consistency.
Example A achieves better throughput, but the maximum number of enqueued kernels is low (it seems that even when unmapping buffers, data is somehow still stored in the RTE, and an error is raised when resources are depleted).
In Example B, the queue for each consumer enqueues the NDRange execution and the enqueueReadBuffer alternately. However, consumers A and B end up synchronized when they should not be (higher stall rate and lower overall throughput). The number of kernels I can enqueue with this method does not seem to saturate (good memory handling).
In BOTH examples, the data in the first 2 buffers (one for each consumer) is inconsistent (the data does not match the patterns from element 8192 onwards). The rest of the buffers check correctly on the HOST.
The models I tried that worked even worse are:
1- Single-consumer feed-forward (more buffer inconsistencies).
2- Event-synchronized queues (having no events and synchronizing by blocking channels gave better management).
3- Creating a host-side buffer pool to send different buffers each time to the consumers (idea taken from the "Double Buffered Host Application Utilizing Kernel Invocation Queue" example introduced in 19.1).
Any comment on what is going on with the RTE is appreciated. The code is pretty much the same as the Intel programming guide example for managed buffers, but modified to use 2 consumers.
Thanks.
2.3K Views, 1 like, 1 Comment

Synchronization issues on Implementation of Buffer Management for OpenCL Kernels
Hello,
I have been trying to implement the "Implementation of Buffer Management for OpenCL Kernels" example from the Programming Guide. With a simple approach, my goal is to copy data from a pipe to global memory and have the consumer copy part of this buffer so that it can be transferred to the host. This is a kind of ping-pong buffer. The first two iterations work well, but if I try to run it continuously, it stalls. Has anyone successfully implemented this example?
2.1K Views, 1 like, 4 Comments

altchipid can not be used in Partial Reconfiguration mode
1. I am using the Intel A10 PAC board development project. Now I want to bind the project-generated .gbs to a specific PAC board, so that this .gbs cannot be downloaded to other PAC boards. What should I do?
2. The altchipid cannot be used in Partial Reconfiguration mode. How do I read the ID of the board currently in use?
1.5K Views, 1 like, 1 Comment

Cycle accurate simulation of OpenCL kernel
I wish to quantify the bottlenecks in OpenCL kernel execution, and for that I am looking to simulate the kernel. I am aware that this question has been asked before, and that members have replied that this is not supported by Intel and is also very tricky to achieve due to varying DRAM access times. However, is there a way to simulate only the kernel without considering DRAM access time? Or is there any other way to differentiate computation time from memory access time (other than the dynamic profiler)?
2.7K Views, 1 like, 3 Comments