Yes, my design is "spread". Basically it is composed of 180 parallel functional units (FUs). Each one takes its data from M9Ks (simple dual-port), and the outputs are shuffled to the other FUs' memories by a large barrel shifter (180 × 8 bits). Memory addressing is shared and controlled by a global controller.
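To show why this structure is spread by nature, here is a rough behavioral sketch of the shuffle in Python. It is only a model, not the actual RTL: the function names, the rotation direction and the idea that the shift amount can take any value from 0 to 179 are assumptions made for illustration.

```python
NUM_FU = 180     # number of parallel functional units (from the description above)
WORD_BITS = 8    # width of each FU output word

def barrel_shift(words, shift):
    """Rotate the FU output words by `shift` lanes.

    Assumption: FU j's output lands in the memory of FU (j + shift) mod NUM_FU.
    """
    if len(words) != NUM_FU:
        raise ValueError("expected one word per functional unit")
    if any(not 0 <= w < (1 << WORD_BITS) for w in words):
        raise ValueError("each word must fit in WORD_BITS bits")
    s = shift % NUM_FU
    # Destination lane i receives the word produced by lane (i - s).
    return [words[(i - s) % NUM_FU] for i in range(NUM_FU)]

def sources_of_lane(dest):
    """Set of FU outputs that can reach memory `dest` over all shift amounts."""
    return {(dest - s) % NUM_FU for s in range(NUM_FU)}

if __name__ == "__main__":
    words = list(range(NUM_FU))           # dummy data: word value equals lane index
    shifted = barrel_shift(words, 1)
    print(shifted[0], shifted[1])         # lane 0 gets FU 179's word, lane 1 gets FU 0's
    print(len(sources_of_lane(0)))        # how many FUs can feed memory 0
```

Under these assumptions every memory input can be driven by any of the 180 FU outputs, so no placement can keep all sources close to all destinations.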
--- Quote Start ---
Routing is very, very seldom the problem. That doesn't mean it's not a large part, but the placement is usually the culprit. For obvious reasons, a spread out placement will cause long routes. (And to be honest, a spread placement is usually caused by a spread design, i.e. something like a mux that might feed multiple components in a device).
Can you list the path details of placement and routing (from TimeQuest, do report_timing with the -file "file.txt" option), or make your own if using TAN? Also, right-click on the path in TimeQuest, Locate -> Chip Planner, and then click the Expand button to see the actual routing. I'm curious if it's pretty much the Manhattan Distance, or pretty close. (Again, routing is almost always good, which is why this is strange). That's at least a good starting point...
--- Quote End ---
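For the Manhattan distance check suggested in the quote, a small Python sketch can compare the routed length with the direct distance once the hop locations have been read off the Chip Planner by hand. The (x, y) tuples below are made-up grid coordinates, and the helper names are hypothetical; nothing here parses a real report file.

```python
def manhattan_distance(src, dst):
    """Manhattan (grid) distance between two (x, y) locations."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def detour_ratio(hops):
    """Routed length divided by the direct Manhattan distance.

    `hops` is the list of (x, y) locations along the route, source first.
    A ratio close to 1.0 means the router took an essentially direct path,
    so the delay comes from placement rather than from routing detours.
    Returns None when source and destination share the same location.
    """
    routed = sum(manhattan_distance(a, b) for a, b in zip(hops, hops[1:]))
    direct = manhattan_distance(hops[0], hops[-1])
    if direct == 0:
        return None
    return routed / direct

if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    direct_route = [(10, 5), (10, 20), (42, 20)]
    detour_route = [(10, 5), (3, 5), (3, 20), (42, 20)]
    print(detour_ratio(direct_route))
    print(detour_ratio(detour_route))
```

If the ratio stays near 1.0 on the failing paths, the routing is about as good as it can be and the distance between source and destination is the real issue.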