I want to multiply a 56x56 matrix by a 56x1 matrix in floating point. There is an altfp_matrix_mult megafunction, but it does not compute this quickly enough. I am looking for ideas on how to implement this in floating point. Please let me know if you have any. Thanks.

Is latency really a factor? What's the application for this? FPGAs really don't like doing floating point. I would recommend trying to convert it to fixed point, as that hugely reduces the latency and logic requirements. If you really have to do it in floating point, you're stuck with long latency and large resource requirements.

The altfp_matrix_mult handbook gives an overview of the required FPGA resources versus GFLOP/s throughput. You can hardly expect a better result from a different FP design, so it can basically answer the question of whether the intended design is feasible at all. If the achievable GFLOP/s figure isn't an issue but altfp_matrix_mult doesn't fit the design structure, then it can make sense to think about a different FP design.

Are you resource starved? What happens if you change the calculation to 56 dot products ... each row of the 56x56 matrix by the same 56 element vector?
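To make the dot-product view concrete, here is a minimal software sketch (not FPGA code). The function name `matvec_by_rows` and the test data are made up for illustration; the point is only that the 56x56 by 56x1 product is 56 independent dot products against the same vector.

```python
def matvec_by_rows(matrix, vector):
    """Compute matrix * vector as one independent dot product per row."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


N = 56
# Illustrative data only: entry (r, c) of the matrix is r + c.
matrix = [[float(r + c) for c in range(N)] for r in range(N)]
vector = [1.0] * N

result = matvec_by_rows(matrix, vector)
```

Because no row depends on any other row, the rows can be computed one after another on a single shared multiplier/adder chain or spread across several chains, depending on how many resources are available.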

--- Quote Start --- Is latency really a factor? What's the application for this? FPGAs really don't like doing floating point. I would recommend trying to convert it to fixed point, as that hugely reduces the latency and logic requirements. If you really have to do it in floating point, you're stuck with long latency and large resource requirements. --- Quote End ---
I want to first see if this can be done in floating point before looking at fixed point. The result needs to be available every 6 us. This includes the time to load the matrices, or at least the smaller one. The bigger one does not change frequently. Yes, I want to do it with the least amount of resources. I am targeting a Stratix 3, so if I do 56 multiplications in parallel, that will use up 224/288 (~80%) of the multipliers just for this.

Since 56 in parallel uses too many resources, split the large matrix into groups of rows, with just enough groups to meet the latency requirement. For example, use 7 matrix multipliers, each processing 8 rows, then collect the 7 8x1 results.
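The row-group split can be sketched in software as follows. This is only a behavioural model under the 7 x 8 split mentioned above; the helper names `dot` and `matvec_in_row_groups` and the test data are hypothetical, and in hardware each group would map to its own multiplier instance.

```python
def dot(row, vector):
    """Dot product of one matrix row with the vector."""
    return sum(a * x for a, x in zip(row, vector))


def matvec_in_row_groups(matrix, vector, rows_per_group=8):
    """Split the matrix into row groups, process each group, then reassemble."""
    groups = [matrix[i:i + rows_per_group]
              for i in range(0, len(matrix), rows_per_group)]
    # Each group yields a short partial result (8x1 for the 56x56 case).
    partials = [[dot(row, vector) for row in group] for group in groups]
    # Concatenate the partial results in group order to get the full 56x1 result.
    result = [y for partial in partials for y in partial]
    return result, len(groups)


N = 56
# Illustrative data only.
matrix = [[float((r * c) % 7) for c in range(N)] for r in range(N)]
vector = [float(c % 3) for c in range(N)]

grouped_result, group_count = matvec_in_row_groups(matrix, vector)
reference = [dot(row, vector) for row in matrix]
```

With 56 rows and 8 rows per group this produces 7 groups, and the reassembled result matches the row-by-row reference.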

Floating Point Matrix Multiplication | Altera Community

18 Replies

Altera_Forum
Honored Contributor
16 years ago
They have a minimum pipeline delay of 7 and 5 cycles respectively, and more if the highest clock frequency is required.
Altera_Forum
Honored Contributor
16 years ago
But they're fully pipelined, right?
Altera_Forum
Honored Contributor
16 years ago
Yes, of course.
Altera_Forum
Honored Contributor
16 years ago
That will take 5 + 7*3 (for the first 8 multiplications and 7 additions) + 5 + 7 + 7 + 7 = 52 cycles for 1 row.
For 56 rows: 56 x 52 = 2912 cycles = 14560 ns.
That is slower than I need.
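For reference, the arithmetic in this estimate can be checked with a few lines of Python. The 5 ns clock period is an assumption inferred from 2912 cycles = 14560 ns, and the 6 us budget is the requirement stated earlier in the thread. Note that this estimate treats every row as starting only after the previous row has finished.

```python
CYCLES_PER_ROW = 52   # figure quoted in the post above
ROWS = 56
CLOCK_NS = 5          # assumption: inferred from 2912 cycles = 14560 ns
BUDGET_NS = 6000      # one result is needed every 6 us

total_cycles = ROWS * CYCLES_PER_ROW   # rows processed strictly one after another
total_ns = total_cycles * CLOCK_NS
meets_budget = total_ns <= BUDGET_NS
```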
Altera_Forum
Honored Contributor
16 years ago
Project feasibility can be determined from the required GFLOP/s and the available (or allotted) multiplier resources, before thinking about structures.
Altera_Forum
Honored Contributor
16 years ago
One row: 56 mults, 55 adds.
All 56 rows: 3136 mults, 3080 adds.
All 56 rows every 6 us: 523 Mmult/s and 513 Madd/s.
With a 5 ns clock, that becomes 2.6 mults/clock and 2.6 adds/clock.
Quite feasible.
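The operation counts and per-clock rates above can be reproduced with a short calculation. This is only a sanity check of the arithmetic; the variable names are made up, and the 6 us period and 5 ns clock are the figures used in this thread.

```python
N = 56
PERIOD_US = 6.0   # a new result is required every 6 us
CLOCK_NS = 5.0    # clock period used in the estimate above

mults = N * N         # 56 mults per row, 56 rows
adds = N * (N - 1)    # 55 adds per row, 56 rows

mults_per_sec = mults / (PERIOD_US * 1e-6)
adds_per_sec = adds / (PERIOD_US * 1e-6)

clocks_per_period = PERIOD_US * 1000.0 / CLOCK_NS
mults_per_clock = mults / clocks_per_period
adds_per_clock = adds / clocks_per_period
```

Both per-clock figures come out below 3, so on paper three fully pipelined multipliers and three adders would cover the required throughput, provided they can be kept busy every cycle.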

Victor,
looks like you're mixing throughput with latency.
An altfp_mul has a latency of 5 cycles but a throughput of one result every cycle. This means it takes 5 cycles to multiply a pair of numbers, but you can feed it a new pair of numbers every cycle. The same goes for altfp_add_sub.
Since matrix multiplication has lots of independent operations, this can be exploited.

The architecture I suggested performs 8 mults/clock and more than 7 adds/clock and manages one complete result every 448 cycles (2.2 us), with a latency of 495 cycles (2.4 us).
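The difference between latency and throughput can be illustrated with a toy cycle-by-cycle model. This is a hedged software sketch, not a model of the actual megafunction: the class `PipelinedUnit` and the function `run` are hypothetical names, and the only property taken from the thread is a 5-cycle latency with one new input accepted per clock.

```python
from collections import deque


class PipelinedUnit:
    """Toy model of a fully pipelined operator: accepts one operand pair per
    clock and delivers each result `latency` clocks after it was fed."""

    def __init__(self, op, latency):
        self.op = op
        self.stages = deque([None] * latency, maxlen=latency)

    def clock(self, operands=None):
        # The value sitting in the last stage leaves the pipeline this cycle.
        out = self.stages[-1]
        # Shift everything one stage along and load the new input (or a bubble).
        self.stages.appendleft(None if operands is None else self.op(*operands))
        return out


def run(pairs, latency=5):
    """Feed one pair per clock, then keep clocking until all results are out."""
    unit = PipelinedUnit(lambda a, b: a * b, latency)
    feed = iter(pairs)
    results, cycles = [], 0
    while len(results) < len(pairs):
        out = unit.clock(next(feed, None))
        cycles += 1
        if out is not None:
            results.append(out)
    return results, cycles


# Illustrative data: 56 products, as for one row of the matrix.
pairs = [(float(i), 2.0) for i in range(56)]
results, cycles = run(pairs)
unpipelined_cycles = len(pairs) * 5   # if each multiply waited for the previous one
```

In this model the 56 products take 56 + 5 clocks in total, compared with 56 x 5 clocks if every multiplication had to wait for the previous one to finish.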
Altera_Forum
Honored Contributor
16 years ago
I did not realize that you can send the next data input without waiting for the prior operation to finish. Thanks for pointing it out.
Altera_Forum
Honored Contributor
15 years ago
Hello there,

I would like to ask whether you got the same error as me. Whenever I try to compile a single matrix_multiplier, this error appears:

library "work" does not contain primary unit "hcc_package"

Do I have to import this package? I've been looking for it but I can't find it. Where is it?

Regards,
Juan
