Hello,
Thank you for the reply!
--- Quote Start ---
Correct. It's kind of like a cache, and the "cache miss" penalty is quite large. It only makes sense to use the memory and keep the operands around if you think you will be using them more than once.
--- Quote End ---
Akhil>> Okay, that makes sense. I have to think about the design (sometimes the operands might need to be passed to the IP module just once, with the IP doing the rest after that). I will take care of it once I have a clear idea of what to do with the core.
--- Quote Start ---
My suggestion is to implement a single new IP component which has two interfaces: an Avalon-MM master, and a custom instruction interface. The Avalon-MM is for the data path (1024-bit operands), and the custom instruction is for the control path (opcodes).
--- Quote End ---
Akhil>> What do you mean by a custom instruction interface? Is it an interface for fetching the corresponding operation (+, -, ...) from the NIOS II? If so, how do I do that from my IP module? (Please note that passing those 1024 bits over it may not be a good idea if it takes 32 clock cycles.)
--- Quote Start ---
It's your component and you can do whatever you like to meet your needs, but in my diagram I had been thinking that the output would have been written by BIGNUM back to the memory, and not traverse the instruction interface. In other words, the NIOS tells the BIGNUM where to put the result.
--- Quote End ---
Akhil>> Here you mean that, just as the BIGNUM module accepts the 1024-bit operands from the on-chip memory, there should be a way to write the 1024 result bits back to the on-chip memory, with the NIOS II deciding the address.
--- Quote Start ---
It's theoretically only going to go as fast as the interfaces it is connected to. If you're DMA'ing from a 32-bit SDRAM, the 256-bit DMA will only emit a word on 1/8th of the clocks, since it has to buffer them up in 32-bit increments.
--- Quote End ---
Akhil>> I think I understand the above concept. All my experimentation will be on a DE1 board, which has a 16-bit SDRAM. Hence a 256-bit DMA transfer will still have to wait 16 clock cycles (since it has to buffer those 256 bits in 16-bit increments), so the whole 1024 bits will take 64 clocks. That gives me no real edge over a normal MM interface transfer. The best case is a 64-bit SDRAM (the widest SDRAM data width in SOPC Builder) used with a 64-bit DMA transfer. In that case I guess it will push 64 bits in a single clock, so it should take only 16 clocks to clock in the entire 1024 bits. I will specify it as a future enhancement for the time being.