Hello @BoonBengT_Altera
I still have doubts about the FPGA hardware run. I don't understand why a small parallel operation takes milliseconds when the FPGA runs at more than 500 MHz. Even after I specify the clock rate flag:
-Xsclock=500MHz
nothing changes: the kernel still takes milliseconds, while the host code runs far faster. Isn't the FPGA meant for acceleration? The FPGA code above is very basic. Also, is there a way to know how many clock cycles the kernel took, and is that equivalent to the latency in the report? The report gives a latency of 343 with no units. What does 343 mean? Is it the number of clock cycles, for example?
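To make the question concrete: if 343 really is a count of clock cycles (this is only my assumption, the report does not say so), then a rough conversion to time at the 500 MHz target would look like the sketch below. The helper name cycles_to_ns is made up for illustration and is not part of any library.

```cpp
#include <cstdint>

// Hypothetical helper: convert a cycle count to nanoseconds at a given
// clock frequency in MHz. One cycle at f MHz lasts 1000 / f nanoseconds.
// ASSUMPTION: the report's latency figure is in clock cycles (unconfirmed).
constexpr double cycles_to_ns(std::uint64_t cycles, double clock_mhz) {
    return static_cast<double>(cycles) * 1000.0 / clock_mhz;
}

// Under that assumption: 343 cycles at 500 MHz is 343 * 2 ns = 686 ns.
constexpr double kReportLatencyNs = cycles_to_ns(343, 500.0);
```

Under that assumption the latency would be under a microsecond, far below the milliseconds I measure, which is why I suspect I am misreading either the report or my timing.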
2- The segmentation fault problem is solved. Nothing is wrong with the hardware run except how long it takes compared to normal C++ code.
3- In my experiment, single_task with a loop inside took almost the same time as parallel_for without a loop inside, which means the iterative code takes about as long as the parallel one on the hardware. Is this normal?
Thank you!