Floating Point Matrix Multiplication

Altera_Forum

Honored Contributor

16 years ago

One row: 56 mults, 55 adds.

All 56 rows: 3136 mults, 3080 adds.

All 56 rows every 6 us: 523 Mmult/s and 513 Madd/s.

With a 5 ns clock, that becomes 2.6 mults/clock and 2.6 adds/clock.

Quite feasible.
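For anyone who wants to rerun the numbers with a different size or clock, here is a minimal sketch of the same budget. It assumes the job is a 56x56 matrix times a 56-element vector, which is what the per-row counts above imply; the function name and parameters are made up for illustration.

```python
def op_budget(n=56, period_s=6e-6, clock_s=5e-9):
    """Operation budget for an n x n matrix times an n-element vector.

    Assumption: one full product must be finished every period_s seconds.
    """
    mults = n * n                 # n multiplies per row, n rows
    adds = n * (n - 1)            # n - 1 additions per row, n rows
    cycles = period_s / clock_s   # clock cycles available per product
    return {
        "mults": mults,
        "adds": adds,
        "mmult_per_s": mults / period_s / 1e6,
        "madd_per_s": adds / period_s / 1e6,
        "mults_per_clock": mults / cycles,
        "adds_per_clock": adds / cycles,
    }


if __name__ == "__main__":
    for name, value in op_budget().items():
        print(f"{name}: {value:.2f}")
```

Both per-clock figures come out just under 3, so three multipliers and three adders kept busy every cycle would meet the rate on paper.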

Victor,

looks like you're mixing throughput with latency.

An altfp_mul has a latency of 5 cycles but a throughput of one result every cycle. That means it takes 5 cycles to multiply a pair of numbers, but you can feed it a new pair of numbers every cycle. The same goes for altfp_add_sub.

Since matrix multiplication has lots of independent operations, this can be exploited.
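To make the difference concrete, here is a small behavioural sketch of a fully pipelined multiplier. It is not the altfp_mul interface, just a shift-register model written for illustration: the function name is hypothetical, and the only figure taken from the megafunction is the 5-cycle latency mentioned above.

```python
from collections import deque


def run_pipelined_multiplier(pairs, latency=5):
    """Model a multiplier that accepts one operand pair per clock.

    Returns (results, cycles). A result becomes visible `latency` clocks
    after its operands were accepted.
    """
    # One slot per pipeline stage; None marks an empty stage (a bubble).
    pipe = deque([None] * latency, maxlen=latency)
    inputs = iter(pairs)
    results = []
    cycles = 0
    while len(results) < len(pairs):
        nxt = next(inputs, None)
        # Clock edge: a new product enters stage 0 and everything shifts
        # one stage; maxlen drops the value that was output last cycle.
        pipe.appendleft(None if nxt is None else nxt[0] * nxt[1])
        cycles += 1
        out = pipe[-1]  # value sitting in the last stage this cycle
        if out is not None:
            results.append(out)
    return results, cycles


if __name__ == "__main__":
    row = [(float(i), 2.0) for i in range(56)]  # 56 independent products
    products, cycles = run_pipelined_multiplier(row)
    print(cycles)            # pipelined: 56 + 5 - 1 cycles
    print(len(row) * 5)      # one-at-a-time: 56 * 5 cycles
```

In this model the 56 independent products of one row take 56 + 5 - 1 = 60 cycles, against 280 if each multiply had to finish before the next one started. The latency is paid once per burst, not once per operation.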

The architecture I suggested performs 8 mults/clock and more than 7 adds/clock and manages one complete result every 448 cycles (2.2 us), with a latency of 495 cycles (2.5 us).
