Forum Discussion

Altera_Forum
Honored Contributor
17 years ago

Tco Constraints

Hi,

I am working on an old Flex10k design which has quite a few timing problems that I am in the process of fixing.

At the moment I have quite a few Tco warnings due to constraints put in place by the original designer. For all but one of the warnings I can explain why there is an issue: it seems the constraint was set too aggressively, and given the way the code is written (in some cases with a lot of combinational logic), Quartus cannot meet the requirement.

However, one warning remains which I cannot explain, and I am hoping somebody can give me a clue. The constraint as it appears in the Assignment Editor has the 'From' field left blank (does this mean the same as * ?), the 'To' field contains an internal reset signal which is activated on a clock edge, and the time requirement is 14 ns.

Quartus then reports an error 'from' this signal 'to' an output pin with a slack of -1.1 ns.
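
As a quick check on those numbers, assuming the usual convention that slack is the required time minus the reported clock-to-output time, a slack of -1.1 ns against a 14 ns requirement implies an actual Tco of about 15.1 ns. A minimal Python sketch (the function name is illustrative only, not anything from Quartus):

```python
def actual_tco_ns(required_ns: float, slack_ns: float) -> float:
    """Recover the reported clock-to-output delay from requirement and slack.

    Assumes slack = required - actual, so actual = required - slack.
    """
    return required_ns - slack_ns


tco = actual_tco_ns(required_ns=14.0, slack_ns=-1.1)
print(f"actual tco = {tco:.1f} ns")
```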

I have a feeling that the constraint is not being correctly applied, but then again Quartus does not ignore the constraint or produce a warning.

What I understand a Tco constraint does is that it specifies the maximum acceptable time for a signal at the input to a register on a clock edge to appear at the output pin. Is this what it actually specifies?

Therefore, if someone specifies an internal signal in the 'To' field of the constraint (as opposed to an output pin), I don't understand why Quartus does not produce an error, or why this causes the warning mentioned above, which doesn't seem to relate directly to the constraint (the output in question was not constrained).

I'd be grateful if someone could give me some ideas as to why I see this.

Also, as I mentioned at the start of the post, it seems that some of the Tco constraints have been set too aggressively. Are there any guidelines for placing Tco constraints that I should be aware of if I want to verify or change existing constraints?

Many thanks for your help

17 Replies

  • Altera_Forum
    Honored Contributor

    Batfink, good idea. I am hoping to get around to looking at the outputs on a scope and comparing them with the 2 different programs. If no significant difference exists, I am hoping to disable the slow slew rate constraints.

    Rysc, thanks for the advice, I hadn't been expanding the paths like that before. Doing so provides a wealth of information and it's much simpler to find exactly where in the design the problems exist.

    I can understand and follow what Quartus is telling me, but there are some terms that I do not fully understand.

    For example, I am looking at the worst-case setup warning, and under the "- longest register to register delay" section Quartus details 5 different sections that make up the path between the 2 registers: the 2 Reg nodes and 3 combinational nodes. For each one it gives the delay in the signal getting there, but it calculates the delay using the following formula:

    Info: 2: + IC(4.500 ns) + CELL(1.400 ns) = 5.900 ns; Loc. = LC1_D30; Fanout = 54; COMB Node = 'vic068:vic068_inst_1|local_if:local_if_inst_1|lm_mux:lm_mux_inst_1|i_m_blt~53'

    I don't understand what the IC delay refers to and what the Cell delay refers to, although they both add up to 5.9 ns, which is the important part. I'm just curious as to what the Classic Timing Analyzer is actually doing.

    Again many thanks.
  • Altera_Forum
    Honored Contributor

    Digging through the Quartus help it would appear that IC is the interconnection delay and CELL is the cell (i.e. logic propagation) delay. Apparently the routing forms the majority of the delay in modern chips so this would seem to tie up.
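
    To make the split concrete, here is a small Python sketch that pulls the IC and CELL figures out of the report line quoted above and shows how much of that step is routing. The regular expression is an assumption written for this one line only; other report lines may be formatted differently.

```python
import re

# One line copied from the Classic Timing Analyzer report quoted in this thread.
REPORT_LINE = (
    "Info: 2: + IC(4.500 ns) + CELL(1.400 ns) = 5.900 ns; Loc. = LC1_D30; "
    "Fanout = 54; COMB Node = 'vic068:vic068_inst_1|local_if:local_if_inst_1|"
    "lm_mux:lm_mux_inst_1|i_m_blt~53'"
)

# Pattern written for this one line format (an assumption, not a Quartus spec).
PATTERN = re.compile(
    r"IC\((?P<ic>[\d.]+) ns\) \+ CELL\((?P<cell>[\d.]+) ns\) = (?P<total>[\d.]+) ns"
    r".*?Fanout = (?P<fanout>\d+)"
)


def split_delay(line: str) -> dict:
    """Return interconnect, cell and total delay (ns) plus fanout for one line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like an IC/CELL report line")
    return {
        "ic_ns": float(match["ic"]),
        "cell_ns": float(match["cell"]),
        "total_ns": float(match["total"]),
        "fanout": int(match["fanout"]),
    }


d = split_delay(REPORT_LINE)
routing_share = d["ic_ns"] / d["total_ns"]
print(f"routing {d['ic_ns']} ns + logic {d['cell_ns']} ns = {d['total_ns']} ns")
print(f"routing share of this step: {routing_share:.0%}")
```

    For this node, interconnect accounts for roughly three quarters of the 5.9 ns, which fits the point about routing dominating the delay.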

    Understanding the routing delay and how Quartus has laid the chip out can help you squeeze a bit more out of the chip - e.g. where you can see a critical delay between two cells which are placed wide apart, you can use cliques to put them in the same row or LAB. You won't get that much gain by going in and looking at the routing in this way (a few ns in certain bits) but if you're right on the edge of meeting your timing constraints it can be a help.

    Looking at the fan-out can help you make changes to your design - e.g. adding pipelining where you have multiple levels of combinatorial logic. If you have, say, an adder and a mux followed by another adder between two registers, and this appears to be the critical path, then stick a layer of registers between one of the adders and the mux. It uses more resources, but there is less combinatorial logic between any two registers and so the overall clock speed will increase.
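
    A rough way to see the effect of that extra register layer, as a Python sketch. Every delay value here is hypothetical and chosen only for illustration; real figures would come from the timing report for your own design.

```python
# All delays below are hypothetical, chosen only to illustrate the idea.
ADDER_NS = 6.0
MUX_NS = 3.0
REG_OVERHEAD_NS = 2.0  # assumed clock-to-Q plus setup time per register stage


def fmax_mhz(stages):
    """Estimate fmax from a list of stages, each a list of combinational delays.

    The clock period is limited by the slowest stage plus register overhead.
    """
    worst_ns = max(sum(stage) for stage in stages) + REG_OVERHEAD_NS
    return 1000.0 / worst_ns


# Before: adder -> mux -> adder all between one pair of registers.
before = fmax_mhz([[ADDER_NS, MUX_NS, ADDER_NS]])
# After: an extra register layer between the mux and the second adder.
after = fmax_mhz([[ADDER_NS, MUX_NS], [ADDER_NS]])

print(f"before: {before:.1f} MHz, after: {after:.1f} MHz")
```

    The pipelined version costs one extra clock cycle of latency, so the surrounding logic has to tolerate that.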

    Also there are some settings in Quartus to automatically add cells (combinatorial logic) and registers to improve timing - from memory I think these are in the fitter settings. Basically, by duplicating registers or cells, you can reduce the fanout and ease the routing delay.
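
    The arithmetic behind duplication is simple. This Python sketch assumes the loads can be shared evenly between the copies, which the fitter may or may not manage in practice; the helper name is made up for the example.

```python
import math


def fanout_per_copy(total_fanout: int, copies: int) -> int:
    """Worst-case loads per driver when the loads are shared as evenly as possible."""
    if copies < 1:
        raise ValueError("need at least one copy of the driving node")
    return math.ceil(total_fanout / copies)


# Fanout = 54 is the figure reported for the node quoted earlier in this thread.
for copies in (1, 2, 3):
    print(copies, "copy/copies ->", fanout_per_copy(54, copies), "loads each")
```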

    Just a note of caution when you're looking at fast edges - use a fast scope with fast probes and don't have a long earth clip - take the earth clip off and wrap a short length of stiff bare-metal wire around the earth case of the probe - this will give you a very short earth which you can touch onto your circuit at a point (earth of course) very close to the point you are actually probing. This will give you a very low inductance earth connection and will give you a much better picture of what you're looking at. Check out oscillator outputs like this - with a standard earth clip you can get a nasty looking sine wave, with a low inductance earth you can get a nice square wave - the difference is quite surprising.

    This sort of work is quite frustrating when you're doing it, but I do think that experience of bad designs and poor documentation from other people can make you a better engineer in the long run.

    Good luck
  • Altera_Forum
    Honored Contributor

    Thanks for the reply Batfink.

    I do not fully understand what you mean by clique, when you say "you can use cliques to put them in the same row or LAB".

    It sounds like something I could try as I'm on the edge in quite a few cases.

    Thanks
  • Altera_Forum
    Honored Contributor

    Hi Ardni

    Sorry for the late reply.

    Having checked the help, cliques only work on ACEX 1K, FLEX 10K, FLEX 6000, or Mercury devices. You're using a Flex10k right?

    Basically you can assign nodes of a design to a defined "clique" - from memory you have to give the clique a name and then assign nodes to it. Members of that clique are then placed in the same LAB or row (depending on what you set). E.g. define a new clique called "My_clique" with all its members being placed in the same LAB. Then where you have large delays between two nodes that aren't quite meeting your timing, add those nodes to the clique and Quartus will place them in the same LAB which will reduce their timing delay. If they are already in the same LAB before you start then don't bother - the clique won't make any difference.

    Sorry it's a bit vague - I haven't done it in years.

    I'm sure you should also just be able to make a location assignment and assign nodes to a certain row (or possibly even a particular LAB). Try fixing your critical logic elements that don't meet timing to the row right next to the device pin.

    For anything you do you'll need to look at the floorplan to see where the delays are and also only change one thing at a time - it will have other knock on effects. From memory when I've done this I haven't had more than about five such assignments. If you change your design (source code) then you'll probably have to delete these assignments and start again.

    It's not an easy solution and will only really be effective if you're pretty close on just a couple of paths. If you've got a screen full of timing failures then this approach probably won't help - you'll just waste a few days chasing your tail.

    Good luck
  • Altera_Forum
    Honored Contributor

    Hi Batfink,

    Thanks for the info. Yes, I'm using a Flex10K and I tried your suggestion. The results were positive. I tried placing certain nodes where I saw excess delay in the same LAB and I did see some savings, enough to make timing on certain paths.

    I tried doing this at 50MHz, where the design is right on the edge of making timing, but the main goal is to have this design working at 64MHz. At the moment there are too many paths which do not make timing and, as you said, the knock-on effect is noticeable.
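
    To put the two targets side by side, a short Python sketch converting each clock frequency to its period (plain arithmetic, with an illustrative helper name):

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


current = period_ns(50.0)  # 20.0 ns
target = period_ns(64.0)   # 15.625 ns
print(f"50 MHz -> {current} ns, 64 MHz -> {target} ns")
print(f"register-to-register budget shrinks by {current - target:.3f} ns")
```

    So any path that only just fits in 20 ns at 50MHz has to come down by more than 4 ns to fit at 64MHz.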

    For this design to run at 64MHz, certain parts of the code will have to be re-done. Once this is achieved, if the design is much closer, I think using cliques could be very useful.

    Just one question. You mentioned that cliques are only available on ACEX 1K, FLEX 10K, FLEX 6000, or Mercury devices, so I was just wondering how can this trick be implemented on the newer devices?

    Also I was wondering why Quartus would not implement something like this automatically, to guide the fitter when it sees certain paths not making timing?

    I'm sure that Quartus is doing something along those lines, and that there is a good explanation as to why some paths still fail.

    Would this be an advantage of using a 3rd party synthesis tool? Perhaps it would synthesize the design differently and obtain better results?

    Anyway, although this particular project has been a real head-wrecker, I've certainly learned a lot of new stuff. I never would have known about cliques for sure, and probably wouldn't have learned as much about timing issues, so once again thanks for all the help.
  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    Just one question. You mentioned that cliques are only available on ACEX 1K, FLEX 10K, FLEX 6000, or Mercury devices, so I was just wondering how can this trick be implemented on the newer devices?

    --- Quote End ---

    To be honest I don't know. But if you were to try this on say a Cyclone (1,2 or 3) then you'd probably find that Quartus had no problem meeting the timing just because the device is faster.

    --- Quote Start ---

    Also I was wondering why Quartus would not implement something like this automatically, to guide the fitter when it sees certain paths not making timing?

    --- Quote End ---

    Quartus does try to place logic to meet timing but I think what you're doing here is possibly a bit more of a human kind of thinking and possibly not that easy to turn into an algorithm. Or of course it may be that Quartus does do this on newer devices which is why you don't have the option of using cliques.

    --- Quote Start ---

    Would this be an advantage of using a 3rd party synthesis tool? Perhaps it would synthesize the design differently and obtain better results?

    --- Quote End ---

    Quite possibly. I used to use Leonardo (Mentor Graphics) in my previous job. I found that for the Cyclone designs that we were doing at the time, using Leonardo for synthesis gave better results than using the Quartus synthesis capability and, weirdly, better than Precision (Mentor's newer tool). Leonardo also seemed to give better results than Synplify.

    Interestingly at that time, Quartus was better at the microscopic synthesis - i.e. if you carefully coded up exotic registers (clock enables and synchronous sets/clears asynchronous sets/clears/loads) then Quartus was better than Leonardo. However take a huge design and Leonardo was just better at optimising huge lumps of logic.

    A word of caution here though - don't take any of this as a recommendation to buy that particular tool. There were a few designs where Synplify was better than Leonardo. Some Lattice designs that I worked on were better in Precision than either Synplify or Leonardo. It's been a couple of years since I did any serious comparison and this may all have changed and may not have been valid for the sort of designs that you're doing. Also I think Mentor have been winding Leonardo down and trying to replace it with Precision.

    Try your design(s) and see. If you get in touch with the vendors of the various tools then they usually give you a trial licence.

    --- Quote Start ---

    Anyway, although this particular project has been a real head-wrecker, I've certainly learned a lot of new stuff. I never would have known about cliques for sure, and probably wouldn't have learned as much about timing issues

    --- Quote End ---

    You've also learned how vitally important it is to document your designs properly! From what you've said I seriously doubt your man would be able to explain all of his timing constraints himself. Although these sorts of jobs are a pain, I do believe they can make you a better engineer in the long run if you decide you don't want to leave a similar mess as part of your legacy.

    Cheers

    batfink
  • Altera_Forum
    Honored Contributor

    For the history, cliques were around because they worked very nicely into how the fitter worked. Around the time of Apex, a completely new fitting algorithm was implemented that cliques did not tie into so nicely, and so they were disabled. (The algorithm was much, much better, consistently giving better results in shorter compile times, so the trade-off was well worth it.)

    In fact, you might want to look at your fitter settings and see if there's some way to enable a different fitter (it's been many years, so I don't remember this).

    Since then, LogicLock regions have been introduced, including auto-sized/floating regions, which are essentially a clique with more granularity as to how big it is. That being said, most of the time if a designer takes their critical paths or hierarchy and throws them into a floating LLR, the results are equal or worse. The reason is that the fitter is already aware of what's critical and doing a very good job at optimizing it, so just drawing a rectangle around it actually limits the fitter's effectiveness and the choices it can make, but doesn't really provide it any info.

    The times I do see LogicLocking help performance are when the user does it more like a floorplanning tool, i.e. they put an LLR on one edge for the PCI core, which connects to another LLR which is the ingress hierarchy, etc. If the LogicLocking provides high-level layout information to the fitter, then it can help performance. (Not all the time, and I don't see huge gains, but it can help.)