1- I know that my target is the A10. I am asking about the commands that let me see information about a specific device (or about the available devices in general), and the commands that let me specify the frequency and so on. I couldn't find them.
2- The code finally worked with these dpcpp commands for emulation and for the hardware run:
dpcpp -fintelfpga -DFPGA_EMULATOR knn_trial2.cpp -o knn.fpga_emu
and
dpcpp -fintelfpga -Xshardware fpga_compile.cpp -o fpga_compile.fpga
The results now give an average of 0.0005 s, which is fine. It is still slower than the iterative code; might this be because of the overhead you mentioned? Even so, it is much faster than the Python code, which runs in 0.012 s.
The segmentation fault is caused by the sorting. It works fine in emulation, and the segmentation fault only appears after the FPGA run. Aren't the host and kernel code separate even in an FPGA run?
3- I have a question regarding parallel_for, single_task, and work groups:
Doesn't parallel_for mean that all the elements (from 0 to num_size) do the same job at the same time, i.e. run in parallel? I have searched for the difference between parallel_for, single_task, and work groups, and didn't find a satisfying explanation of each of them. Does parallel_for cause all elements to be processed in one operation, in almost one clock cycle?
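To make the question concrete, here is a plain C++ sketch of how I currently picture the two launch styles. It is only a sequential host-side model, not SYCL code: run_single_task, run_parallel_for, square_single_task and square_parallel_for are hypothetical helpers made up for illustration, and none of them is part of the DPC++ API. The only thing the sketch is meant to show is the shape of the kernel body: one invocation that contains its own loop versus one invocation per index.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model of single_task: the kernel body runs exactly once.
// Any iteration over the data has to be a loop written inside the body.
inline void run_single_task(const std::function<void()>& body) {
    body();
}

// Hypothetical model of parallel_for: the kernel body runs once per index
// in [0, n). This model runs the invocations one after another; it says
// nothing about how a real device would schedule or overlap them.
inline void run_parallel_for(std::size_t n,
                             const std::function<void(std::size_t)>& body) {
    for (std::size_t i = 0; i < n; ++i) {
        body(i);
    }
}

// Squares every element in the single_task style.
// `launches` counts how many times the kernel body was invoked.
inline std::vector<int> square_single_task(const std::vector<int>& in,
                                           int& launches) {
    std::vector<int> out(in.size());
    launches = 0;
    run_single_task([&] {
        ++launches;
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = in[i] * in[i];
        }
    });
    return out;
}

// Squares every element in the parallel_for style: no loop in the body,
// each invocation handles only the index it was given.
inline std::vector<int> square_parallel_for(const std::vector<int>& in,
                                            int& launches) {
    std::vector<int> out(in.size());
    launches = 0;
    run_parallel_for(in.size(), [&](std::size_t i) {
        ++launches;
        out[i] = in[i] * in[i];
    });
    return out;
}
```

Calling both helpers on {1, 2, 3, 4} gives the same squares, but the single_task version invokes its body once and the parallel_for version invokes it four times. What the model cannot show is how the device actually schedules those invocations, which is exactly the part I am asking about.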
Thank you!