Altera_Forum
Honored Contributor
9 years ago
First of all, for the specific example you give, it's quite likely that a multi-core CPU with vector instructions, or a GPU, can do the matrix multiplies faster than an FPGA could.
One way is to attach dedicated memory to the FPGA. The FPGA is addressable by the CPU, and the software arranges for the incoming data to be placed in the FPGA's dedicated memory. Once the data are in FPGA memory, the CPU writes to registers in the FPGA telling it what to do. When the FPGA finishes, it can interrupt the CPU, or the CPU can periodically check (poll) registers in the FPGA to see whether it is done.
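To make the polling handshake concrete, here is a minimal sketch of the CPU side of that flow. Everything in it is hypothetical: the register map (`REG_CTRL`, `REG_STATUS`, `REG_LEN`, `REG_RESULT`), the "sum the words" job, and the helper names `run_job`, `read_reg`, `write_reg` and `SimulatedFpga` are made up for illustration, and a real design defines its own registers. The hardware is replaced by a small simulated device so the example runs anywhere. On real hardware `regs` and `mem` would typically be memory-mapped windows onto the FPGA (for example an `mmap` of a Linux UIO device), which `struct.pack_into`/`unpack_from` accept just like a `bytearray`.

```python
import struct
import time

# Hypothetical register map (byte offsets into the FPGA's register window).
REG_CTRL = 0x00    # CPU writes CTRL_START here to begin a job
REG_STATUS = 0x04  # bit 0 is set by the FPGA when the job is done
REG_LEN = 0x08     # number of 32-bit words staged in FPGA memory
REG_RESULT = 0x0C  # result written back by the FPGA
CTRL_START = 0x1
STATUS_DONE = 0x1


def write_reg(regs, offset, value):
    """Store a little-endian 32-bit value into a register window."""
    struct.pack_into("<I", regs, offset, value)


def read_reg(regs, offset):
    """Load a little-endian 32-bit value from a register window."""
    return struct.unpack_from("<I", regs, offset)[0]


def run_job(regs, mem, words, idle=lambda: time.sleep(0.001), max_polls=10_000):
    """CPU side: stage data in FPGA memory, start the job, poll until done."""
    # 1. Place the input data in the FPGA's dedicated memory.
    struct.pack_into(f"<{len(words)}I", mem, 0, *words)
    # 2. Tell the FPGA what to do (in this made-up map the CPU clears STATUS first).
    write_reg(regs, REG_LEN, len(words))
    write_reg(regs, REG_STATUS, 0)
    write_reg(regs, REG_CTRL, CTRL_START)
    # 3. Poll the status register until the FPGA reports completion.
    for _ in range(max_polls):
        if read_reg(regs, REG_STATUS) & STATUS_DONE:
            return read_reg(regs, REG_RESULT)
        idle()
    raise TimeoutError("FPGA never set STATUS_DONE")


class SimulatedFpga:
    """Stand-in for the hardware: sums the staged words a few ticks after START."""

    def __init__(self, mem_size=4096, latency=3):
        self.regs = bytearray(16)        # register window
        self.mem = bytearray(mem_size)   # "dedicated" FPGA memory
        self.latency = latency
        self._countdown = None

    def clock(self):
        """Advance the simulated device by one tick."""
        if self._countdown is None:
            if read_reg(self.regs, REG_CTRL) & CTRL_START:
                write_reg(self.regs, REG_CTRL, 0)  # accept the command
                self._countdown = self.latency
            return
        self._countdown -= 1
        if self._countdown == 0:
            n = read_reg(self.regs, REG_LEN)
            total = sum(struct.unpack_from(f"<{n}I", self.mem, 0))
            write_reg(self.regs, REG_RESULT, total & 0xFFFFFFFF)
            write_reg(self.regs, REG_STATUS, STATUS_DONE)
            self._countdown = None


fpga = SimulatedFpga()
print(run_job(fpga.regs, fpga.mem, [1, 2, 3, 4], idle=fpga.clock))  # 10
```

In the simulation, `idle` advances the fake device between polls; against real hardware you would leave it as a short sleep. The interrupt alternative replaces the polling loop with a blocking wait: with Linux UIO, for instance, a `read()` on `/dev/uioN` blocks until the device's interrupt fires, after which the CPU reads `REG_RESULT`.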