Forum Discussion

Altera_Forum
Honored Contributor
16 years ago

Synplicity run time issue

Hi,

I am battling a Synplicity run time issue here.

We're emulating an ASIC across multiple FPGAs. The ASIC is partitioned so that each FPGA design has two parts: a CORE part and a PERIPHERY part. The CORE part has the "real" design content, while the PERIPHERY part is the "glue": it contains the muxing and serialization to send data out on LVDS channels (and the deserialization and demuxing for the receive path). The CORE operates at a slow clock (4 MHz), while the PERIPHERY has higher-speed clocks for the muxing and serialization needed to support 750 MHz LVDS transfers. As you can see, the CORE part is unique to each FPGA, while the PERIPHERY part is reused across FPGAs.

When I compile a single FPGA standalone in Synplicity, synthesis run time is around 2.5 hours, and Quartus run times vary from 2 to 8 hours. To save on run times, I want to use the Quartus incremental flow: synthesize the PERIPHERY part only once and reuse it for all FPGAs, and synthesize the CORE part individually for each FPGA.

But when I do this, I find that the run times for building just the CORE of an FPGA are 2x to 4x longer than building the complete FPGA. This seems counter-intuitive: we're compiling a smaller design, so we expect shorter run times. When compiling the CORE, we had to define virtual ports for the connections between CORE and PERIPHERY; there are ~5k such virtual ports. One theory is that the tool spends a lot of time mapping these virtual ports to low-level Altera primitives. These virtual I/Os are unconstrained. We even set a false path on them to see if that would help run times, but it did not.
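For reference, the boundary constraints we tried look roughly like this (the bus names below are placeholders, not our actual netlist names):

```tcl
# Quartus .qsf: mark the CORE/PERIPHERY boundary nets as virtual pins
# (instance names and wildcards are illustrative only)
set_instance_assignment -name VIRTUAL_PIN ON -to "core_to_periph_bus[*]"
set_instance_assignment -name VIRTUAL_PIN ON -to "periph_to_core_bus[*]"

# .sdc: cut timing on the virtual I/O (this did not improve run time for us)
set_false_path -to [get_ports {core_to_periph_bus[*]}]
set_false_path -from [get_ports {periph_to_core_bus[*]}]
```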

Thanks for your help.

-Dinesh

2 Replies

  • Altera_Forum
    Honored Contributor

    I think I missed something in your post. Is it Synplify run times or Quartus run times? Your post starts off saying Synplicity, but the virtual pins are all done in Quartus after the Synplify runs. Are you running PERIPHERY once as a separate Synplify project and writing out a .vqm, then running CORE as another project, and then combining them in Quartus? If this is causing Synplify run times to go up, I really have no idea why, as Synplify has a pretty straightforward task that tends to scale linearly with design size.

    In Quartus, I'm not sure why you're using virtual pins. They tend to be used when fitting just a sub-section of a design by itself, when it has more I/O than the device actually has available, so for the most part they're not used when targeting real hardware. That said, you may be using them if you're doing a bottom-up incremental flow (most people using incremental compilation aren't doing the bottom-up flow, so I'm just checking). In that case you're actually placing and routing PERIPHERY and CORE in separate projects (or maybe just PERIPHERY, and then importing that place-and-route information into the CORE project). If so, are you using LogicLock regions? Also, what device is it, and approximately how full is it? And which stage causes the 2x to 4x increase? If it is the fitter, there are messages that say how much time was spent placing, how much routing, etc., so it might be worthwhile to check which part is increasing.
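    If you are on the bottom-up flow, one thing worth checking is the LogicLock setup. A minimal sketch of what that looks like in the .qsf (the region and instance names here are invented, and exact assignment names depend on your Quartus version):

```tcl
# Quartus .qsf sketch: floorplan the PERIPHERY partition with a
# LogicLock region so its placement can be exported and reused
set_global_assignment -name LL_ENABLED ON -section_id periphery_region
set_global_assignment -name LL_AUTO_SIZE ON -section_id periphery_region
set_instance_assignment -name LL_MEMBER_OF periphery_region -to "periphery_inst" -section_id periphery_region
```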
  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    Hi,

    I am battling a Synplicity run time issue here.

    We're emulating an ASIC across multiple FPGAs. The ASIC is partitioned so that each FPGA design has two parts: a CORE part and a PERIPHERY part. The CORE part has the "real" design content, while the PERIPHERY part is the "glue": it contains the muxing and serialization to send data out on LVDS channels (and the deserialization and demuxing for the receive path). The CORE operates at a slow clock (4 MHz), while the PERIPHERY has higher-speed clocks for the muxing and serialization needed to support 750 MHz LVDS transfers. As you can see, the CORE part is unique to each FPGA, while the PERIPHERY part is reused across FPGAs.

    When I compile a single FPGA standalone in Synplicity, synthesis run time is around 2.5 hours, and Quartus run times vary from 2 to 8 hours. To save on run times, I want to use the Quartus incremental flow: synthesize the PERIPHERY part only once and reuse it for all FPGAs, and synthesize the CORE part individually for each FPGA.

    But when I do this, I find that the run times for building just the CORE of an FPGA are 2x to 4x longer than building the complete FPGA. This seems counter-intuitive: we're compiling a smaller design, so we expect shorter run times. When compiling the CORE, we had to define virtual ports for the connections between CORE and PERIPHERY; there are ~5k such virtual ports. One theory is that the tool spends a lot of time mapping these virtual ports to low-level Altera primitives. These virtual I/Os are unconstrained. We even set a false path on them to see if that would help run times, but it did not.

    Thanks for your help.

    -Dinesh

    --- Quote End ---

    Hi,

    How did you split your design into the CORE and PERIPHERY parts? By hand, with the "compile point" feature, or with Certify? Is the sum of the FPGA resources used by the separate parts much larger than for the run of the complete design? Keep in mind that when you use compile points in Synplify Pro or partitions in Quartus, the boundaries are fixed, and this can increase the design size.
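    If you went the compile-point route, the Synplify Pro constraint is typically something like this (the instance path is a placeholder):

```tcl
# Synplify Pro constraint file: lock the PERIPHERY boundary as a compile
# point so it is synthesized once and not re-optimized across the boundary
define_compile_point {v:work.periphery} -type {locked}
```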

    Kind regards

    GPK