Forum Discussion
Altera_Forum
Honored Contributor
16 years ago
One row: 56 mults, 55 adds.
All 56 rows: 3136 mults, 3080 adds. All 56 rows every 6 us: 523 Mmult/s and 513 Madd/s. With a 5 ns clock, that becomes about 2.6 mults/clock and 2.6 adds/clock. Quite feasible.

Victor, it looks like you're mixing up throughput with latency. An altfp_mul has a latency of 5 cycles but a throughput of one result every cycle. That means it takes 5 cycles to multiply a pair of numbers, but you can feed it a new pair of numbers every cycle. The same goes for altfp_add_sub. Since matrix multiplication has lots of independent operations, this can be exploited.

The architecture I suggested performs 8 mults/clock and more than 7 adds/clock and manages one complete result every 448 cycles (2.2 us), with a latency of 495 cycles (about 2.5 us).
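To make the throughput vs. latency point concrete, here is a rough Python sketch of the arithmetic above. It assumes the job is a 56x56 matrix times a 56-element vector (which matches 56 mults and 55 adds per row), and it models a multiplier simply as "accepts one new operand pair per cycle, result appears 5 cycles later". The function names are made up for illustration; this is not a model of the actual altfp_mul core or of the architecture I suggested.

```python
import math

N = 56             # rows and columns (assumed from "56 mults, 55 adds" per row)
PERIOD_S = 6e-6    # one complete result required every 6 us
CLOCK_S = 5e-9     # 5 ns clock
MUL_LATENCY = 5    # multiplier latency in cycles, as quoted in the post

mults = N * N          # 56 mults per row, 56 rows
adds = N * (N - 1)     # 55 adds per row, 56 rows

# Required sustained rate, expressed in operations per clock cycle.
mults_per_clock = mults / PERIOD_S * CLOCK_S
adds_per_clock = adds / PERIOD_S * CLOCK_S

# Number of clock cycles available per complete result.
budget_cycles = round(PERIOD_S / CLOCK_S)


def pipelined_cycles(n_ops, latency, units=1):
    """Cycles until the last result appears when each unit accepts
    one new independent operation every cycle (fully pipelined)."""
    issue_cycles = math.ceil(n_ops / units)
    return issue_cycles - 1 + latency


def unpipelined_cycles(n_ops, latency, units=1):
    """Cycles if each unit had to finish one operation before
    starting the next (the latency-only view)."""
    return math.ceil(n_ops / units) * latency


if __name__ == "__main__":
    print(f"mults: {mults}, adds: {adds}")
    print(f"needed: {mults_per_clock:.2f} mults/clock, "
          f"{adds_per_clock:.2f} adds/clock")
    print(f"budget: {budget_cycles} cycles per result")
    for units in (1, 3):
        print(f"{units} multiplier(s): "
              f"pipelined {pipelined_cycles(mults, MUL_LATENCY, units)} cycles, "
              f"unpipelined {unpipelined_cycles(mults, MUL_LATENCY, units)} cycles")
```

Under these assumptions, three pipelined multipliers fit inside the 1200-cycle budget, while treating the 5-cycle latency as the cost of every multiply makes the same three units look far too slow. The latency only shows up once, at the end of the stream.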