I have a design that performs single-precision computations using Altera floating-point IPs. However, since these IPs don't seem to have a 'valid' or 'done' output bit, I can't see how to connect one module to another. My concern is how a successive module will know when to take the output from the previous module. Could someone help with this?

How to interconnect modules without 'valid' or 'done' output signal

12 Replies

Altera_Forum
Honored Contributor
15 years ago
A valid or done bit is only meaningful for sequentially operating units. The FP MegaFunctions in question are fully pipelined and emit a new result every clock cycle. The input data only has to be valid for one clock cycle; you do have to know the pipeline delay, however.
Altera_Forum
Honored Contributor
15 years ago
It's quite easy to carry a valid bit in parallel with the floating-point units if you need it for other modules.
Altera_Forum
Honored Contributor
15 years ago
It would have been smart thinking from the Altera guys if they had provisioned the 'valid' pipeline inside the building block. It would make for a much cleaner design, as we wouldn't have to add the glue logic mentioned by Tricky.
While we're at it: can we have a separate clock enable for every stage too? I have a dataflow-based development environment, but because all of Altera's building blocks use a global clock enable, I'll be stuck when I need more advanced functionality (or with a pipeline greater than 1).
Altera_Forum
Honored Contributor
15 years ago
Could anyone give a small code example showing the glue logic for the parallel valid bit that is being discussed?
Altera_Forum
Honored Contributor
15 years ago
The "parallel" valid bit chain is simply a shift register (that is, a number of cascaded D flip-flops) whose delay (number of stages) equals the pipeline delay of the respective IP block.

I agree that Altera could have added it as an option, but as mentioned above, it is of no use in the standard application, where a continuous data stream is fed to the IP.
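
To make the shift register idea concrete, here is a minimal cycle-level sketch. It is a behavioural model in Python, not HDL, and the names `ValidDelayLine`, `clock` and `delay_valid` are made up for illustration. The latency of 3 used in the note below is only an example; in a real design it has to match the pipeline delay you configured for the IP block.

```python
from collections import deque


class ValidDelayLine:
    """Behavioural model of an N-stage shift register carrying a valid bit."""

    def __init__(self, latency):
        if latency < 1:
            raise ValueError("latency must be at least 1 clock cycle")
        # One entry per D flip-flop; all stages are cleared after reset.
        self.stages = deque([0] * latency, maxlen=latency)

    def clock(self, valid_in):
        """Model one clock edge.

        Returns the bit visible at the last stage during this cycle, then
        shifts valid_in into the first stage.
        """
        valid_out = self.stages[-1]
        # maxlen makes appendleft() drop the oldest bit from the far end.
        self.stages.appendleft(1 if valid_in else 0)
        return valid_out


def delay_valid(bits, latency):
    """Run a sequence of valid_in bits through the delay line, one per clock."""
    line = ValidDelayLine(latency)
    return [line.clock(b) for b in bits]
```

With a latency of 3, `delay_valid([1, 0, 1, 1, 0, 0, 0, 0], 3)` returns the same pattern delayed by three cycles, so the output valid bit lines up with the result leaving a 3-stage pipeline. In hardware this would be a register chain clocked (and clock-enabled) together with the IP.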
Altera_Forum
Honored Contributor
15 years ago
So what I understand:

1-Instantiate the IP in a module
2-Also add a shift register that implements the latency delay of the IP
3-The shift register takes a '1' as the 'valid' bit, which is shifted along stage by stage and finally given as the output.

Right?

Also, what is the buffer capacity of the IPs? If I keep feeding new data every clock cycle, how long can I go before I have to stall the input to keep the IP from giving wrong outputs?
Altera_Forum
Honored Contributor
15 years ago
If the design is fully pipelined, there's no buffer.
It can process a new input every clock cycle.

After an initial delay, the IP provides an output for every clock cycle.
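
A small behavioural model may help to picture this. It is only a sketch in Python under stated assumptions, not a description of the actual IP internals: `PipelinedUnit` and `run` are hypothetical names, the latency of 4 in the note below is illustrative, and the result is computed on entry and merely carried through the stages, which is a modelling shortcut.

```python
class PipelinedUnit:
    """Behavioural model of a fully pipelined block with a fixed latency."""

    def __init__(self, func, latency):
        if latency < 1:
            raise ValueError("latency must be at least 1 clock cycle")
        self.func = func
        # Each stage holds a (valid, value) pair; empty after reset.
        self.stages = [(False, None)] * latency

    def clock(self, valid_in, data_in=None):
        """Model one clock edge; return the (valid, value) pair at the output."""
        out = self.stages[-1]
        entering = (True, self.func(data_in)) if valid_in else (False, None)
        # Every stage hands its contents to the next one; nothing is queued.
        self.stages = [entering] + self.stages[:-1]
        return out


def run(unit, inputs, extra_cycles):
    """Feed one input per clock, then idle; collect (cycle, value) for valid outputs."""
    results = []
    for cycle in range(len(inputs) + extra_cycles):
        valid_in = cycle < len(inputs)
        data = inputs[cycle] if valid_in else None
        valid_out, value = unit.clock(valid_in, data)
        if valid_out:
            results.append((cycle, value))
    return results
```

Feeding three inputs back to back into a unit with a latency of 4, for example `run(PipelinedUnit(lambda x: x * 2, 4), [1, 2, 3], 4)`, yields results at cycles 4, 5 and 6: one initial delay, then one output per clock. Each stage simply passes its contents on, so in this model there is nothing that could fill up.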
Altera_Forum
Honored Contributor
15 years ago
But there are two different delays for an IP:
the delay between the first input and its output, and the delay between subsequent outputs, assuming inputs are supplied every clock cycle.

For example, for the exponential core,
there is a latency of 17 clock cycles between the first input and its output, but subsequent outputs appear at intervals of 6 clock cycles (not every clock cycle), assuming new input data is supplied every clock cycle. Hence I was thinking that at some point the buffer, or whatever mechanism is inside the IP, will be overflowed by the input data. Am I correct in my understanding? Thanks.
Altera_Forum
Honored Contributor
15 years ago
Can you tell me where you read about the 6-cycle delay between subsequent results?

I'm reading the "Floating-Point Megafunctions User Guide" and have found nothing so far.

Thanks
Altera_Forum
Honored Contributor
15 years ago
I assumed the same myself after reading the IP documentation (page 35 of the Floating-Point Megafunctions User Guide gives the details for the floating-point exponential IP). But after instantiating the IP and running a testbench on the code, I found that it initially takes 17 clock cycles to produce the output and thereafter only 6 clock cycles. Try it: just instantiate the IP and run a testbench that constantly supplies input data. Let me know what you find.
