--- Quote Start ---
If you are shipping really small amounts of work to the accelerator then yes, software will be faster because of the communication overhead. The only way this can be efficient is if you perform the same operation across a large block of data in memory and use DMA to stream the data into the accelerator.
--- Quote End ---
Yes, my project is to compare the performance of the system with and without DMA. As a baseline, I am first comparing hardware against software without involving DMA; the next step will be to add DMA. By the way, is there a tutorial on how to transmit and receive data over DMA in C?
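
To make the overhead point from the quote concrete, here is a rough sketch of the cost model in C. It assumes software time grows as n * sw_per_item and accelerator time as overhead + n * hw_per_item. The function names and the linear model are only my assumptions for illustration; the real per-item costs and the overhead have to be measured on the actual system.

```c
#include <stddef.h>

/* Hypothetical linear cost model. All costs are in the same time unit
 * (for example clock cycles) and must come from measurement. */

/* Time for the software path to process n items. */
double sw_time(size_t n, double sw_per_item)
{
    return (double)n * sw_per_item;
}

/* Time for the accelerator path: a fixed communication overhead
 * plus a per-item cost. */
double hw_time(size_t n, double overhead, double hw_per_item)
{
    return overhead + (double)n * hw_per_item;
}

/* Smallest item count at which the accelerator is strictly faster than
 * software. Returns 0 if the accelerator never wins, i.e. its per-item
 * cost is not lower than the software per-item cost.
 * Assumes overhead is non-negative. */
size_t break_even_items(double overhead, double sw_per_item,
                        double hw_per_item)
{
    if (sw_per_item <= hw_per_item) {
        return 0;
    }
    /* Both paths take equal time at n = overhead / (sw - hw). */
    double n = overhead / (sw_per_item - hw_per_item);
    /* Round down, then step to the next whole item so that the
     * accelerator is strictly faster, not merely tied. */
    return (size_t)n + 1;
}
```

With made-up numbers (overhead 1000, software 10 per item, hardware 2 per item), break_even_items gives the block size below which software still wins. Blocks smaller than that are not worth sending to the accelerator, with or without DMA.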