Interesting. Turning off hold timing optimization eases the routing pressure somewhat: setup failures in this particular pipeline go away (at least at 180 MHz), but I now see a bunch of hold failures elsewhere in the design.
I also noticed that the Chip Planner confirms the existence of a cap on inputs. The tooltip for the "Local Interconnect", the primary routing channel that feeds all ALMs, shows a maximum routing capacity of 46. In addition, there is a "Local Line" that connects ALM outputs to ALM inputs within the same LAB, with a capacity of 20.
Direct links between adjacent LABs are definitely there, but they are not visible as such in the Chip Planner, and it is unclear how wide they are. If Figure 1-1 in the Device Handbook is at all to scale, they might be very narrow, fewer than 10 wires per LAB. Either way, two things stand out: direct links seem to count toward the cap of 46 total inputs, and the fitter does not seem at all interested in using them.
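
For anyone who wants to sanity-check a cluster by hand, here is a rough Python sketch of the bookkeeping these observations suggest. It is only a model under stated assumptions, not documented fitter behavior: the names `LabUsage` and `lab_usage` are made up for illustration, the two caps are the tooltip values quoted above, and the counting rule (each distinct signal arriving from outside the LAB, direct links included, takes one Local Interconnect slot; each distinct signal fed back from inside the LAB takes one Local Line) is my reading of what the Chip Planner shows.

```python
from dataclasses import dataclass

# Capacities as reported by the Chip Planner tooltips (see above).
LOCAL_INTERCONNECT_CAP = 46
LOCAL_LINE_CAP = 20


@dataclass(frozen=True)
class LabUsage:
    """Hypothetical summary of one LAB's input-side routing demand."""
    external_inputs: int  # distinct signals entering from outside the LAB
    local_feedback: int   # distinct signals fed back from ALMs in the same LAB

    @property
    def fits(self) -> bool:
        # Assumption: both caps must hold independently.
        return (self.external_inputs <= LOCAL_INTERCONNECT_CAP
                and self.local_feedback <= LOCAL_LINE_CAP)


def lab_usage(alm_inputs, lab_outputs):
    """Count distinct input signals for one LAB.

    alm_inputs:  iterable of sets of signal names, one set per ALM.
    lab_outputs: set of signal names driven by ALMs inside this LAB.
    """
    distinct = set()
    for signals in alm_inputs:
        distinct.update(signals)
    local = distinct & set(lab_outputs)
    external = distinct - local
    return LabUsage(len(external), len(local))


# Two ALMs sharing input "b", each also reading a register from the same LAB.
usage = lab_usage([{"a", "b", "q0"}, {"b", "c", "q1"}], {"q0", "q1"})
print(usage, usage.fits)
```

Shared signals are counted once, so a wide pipeline stage only stays under the cap if its ALMs reuse a lot of the same inputs. If direct links really do count toward the 46, moving a source into the neighboring LAB does not reduce `external_inputs` in this model.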