Forum Discussion
Hmmm. This one's looking slightly devious. I took all the hierarchies and put them in a partition (preventing synthesis across boundaries and keeping every port intact). The three hierarchies came to approximately 330 LEs and exactly 672 registers. Now, in your compiles they could get reduced down to as little as ~80 LEs, but always 672 registers. Two things can occur here: reductions during synthesis, and merging of logic. For example, they all decode the paddr, so it's possible that some of that decode logic gets merged together, in which case it has to be put into one of the hierarchies, making that one seem artificially larger than the other two. What is really occurring is that the other two are artificially smaller, since their logic is represented in another hierarchy.
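To make the merging point concrete, here's a minimal sketch of the kind of repeated decode logic I mean. The module name, the address map, and the register are all made up for illustration; only the APB signal names (pclk, psel, penable, pwrite, paddr, pwdata) follow the standard bus convention.

```verilog
// Hypothetical APB-style slave: each of the three hierarchies contains
// decode logic of roughly this shape.  Because the expression
// (paddr[7:4] == BASE) is structurally similar across the instances,
// synthesis may merge the shared part of the decode and report it
// inside just one hierarchy, skewing the per-hierarchy LE counts.
module apb_slave_sketch #(parameter [3:0] BASE = 4'h2) (
  input             pclk,
  input             psel,
  input             penable,
  input             pwrite,
  input       [7:0] paddr,
  input      [31:0] pwdata,
  output reg [31:0] reg0
);
  // Write strobe derived from the paddr decode.
  wire wr_sel = psel & penable & pwrite & (paddr[7:4] == BASE);

  always @(posedge pclk)
    if (wr_sel)
      reg0 <= pwdata;
endmodule
```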
But in your largest compiles, the APBS2 went up to 530 ALUTs, so it grew considerably. Looking around a little, I went into the report -> Fitter -> Resource Section -> Control Signals. (By the way, the Resource Section is a gold mine of info, so it's worthwhile to look around and get accustomed to what it shows you.) Anyway, each of the 3 repeated hierarchies uses a lot of clock enables. What I'm sure is occurring is that the decode for a 32-bit bus is being done in a single LUT and then sent to the clock enable of a 32-bit register. So those 32 registers look like "lone registers", in that they don't have an accompanying LUT, but their clock enable is used, and the LUT usage goes down. In the compile with 532 LUTs being used, APBS2 has hardly any clock enables; in that case the LUT in front of each register is being used instead.

As a note, there are only a few (usually two) clock enables per LAB. What that means is that if lots of clock enables are synthesized, packing them into LABs becomes difficult. With third-party synthesis, and in the early days of Quartus synthesis, I've seen far too many clock enables used, so the device ran out of LABs when it didn't seem full (this shouldn't be occurring any more). So there is some balancing of when to use the LUT and when to use the clock enable. The algorithm that does this uses heuristics, which is slang for "I don't know what's going to happen." More exactly, it is optimized over a suite of designs to give the best results overall, but when running test cases, or on particular designs, it may not always make the right choice.

There's an assignment to force usage of the clock enable, but I don't think it will help your case, as the clock enable can't be easily isolated to a single node (that assignment works best when you have a register that is used as a clock enable for a large cloud of logic). You might want to file an SR to have it looked at (and maybe include this thread). Hopefully that's a point in the right direction.
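The two implementations the heuristic is choosing between can be sketched side by side. This is only an illustration of the trade-off, not Quartus's actual output; all names here are hypothetical, and which style the tool picks per register is its own decision.

```verilog
// Two functionally equivalent ways to realize the same 32-bit
// register write -- the trade-off discussed above.
module ce_vs_lut (
  input             clk,
  input             wr_sel,    // single-LUT decode of the bus address
  input      [31:0] wdata,
  output reg [31:0] q_ce,      // clock-enable style
  output reg [31:0] q_lut      // LUT-feedback style
);
  // Style 1: the decode drives the clock enable.  The registers have
  // no accompanying LUT ("lone registers"), so the ALUT count drops,
  // but each register consumes one of the LAB's few clock enables.
  always @(posedge clk)
    if (wr_sel)
      q_ce <= wdata;

  // Style 2: the enable is folded into a 2:1 mux in the LUT ahead of
  // each register.  The register clocks every cycle and uses no clock
  // enable, at the cost of roughly one extra ALUT per bit.
  always @(posedge clk)
    q_lut <= wr_sel ? wdata : q_lut;
endmodule
```

Both describe identical behavior; the difference only shows up in the Fitter's resource report (clock-enable usage vs. ALUT count), which is why the Control Signals section makes the choice visible.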