A corollary. The handbook states that Cyclone V ALMs in shared arithmetic mode can do 3-input adds, with 2 output bits per ALM. However, that requires 6 inputs per ALM (3 operands x 2 bits), or 60 inputs for a full 10-ALM LAB. Since there aren't enough feed lines to supply all 60 inputs at once, the carry chain has to be spread over multiple LABs. If you're adding three 32-bit numbers, you might naively expect to need 2 LABs (16 ALMs: one full LAB plus 6 of the 10 ALMs in the LAB immediately below). In practice, the fitter puts bits 0..9 into the bottom half of LAB#1 (say, X78_Y6), bits 10..19 into the top half of LAB#2 (X78_Y5), and bits 20..31 into LAB#3 (X78_Y4). (Confirmed with Chip Planner. Conveniently, the carry chain is allowed to span only half of a LAB.)
The remainder of those LABs won't be totally wasted, though: you still have 16 feed lines left in each of the top two LABs and 10 feed lines left in the bottom LAB, so the fitter can squeeze some extra logic in there.
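To double-check the arithmetic, here is a small Python sketch that reproduces the counts above. The function names are made up for illustration. The per-LAB feed budget of 46 is an assumption: it is not quoted from the handbook, only inferred from the leftover figures (30 used + 16 spare, 36 used + 10 spare).

```python
ALMS_PER_LAB = 10
BITS_PER_ALM = 2      # shared arithmetic mode: 2 sum bits per ALM
OPERANDS = 3          # 3-input add
FEEDS_PER_LAB = 46    # ASSUMPTION: inferred from 30 + 16 and 36 + 10, not from the handbook


def alms_needed(bits):
    """ALMs required for a `bits`-wide 3-input adder (rounded up)."""
    return -(-bits // BITS_PER_ALM)


def inputs_needed(bits):
    """Data inputs consumed by `bits` adder bits (one input per operand per bit)."""
    return bits * OPERANDS


def naive_packing(bits):
    """(full LABs, ALMs in the next LAB) if feed lines were unlimited."""
    return divmod(alms_needed(bits), ALMS_PER_LAB)


def spare_feeds(placement):
    """Feed lines left in each LAB after the adder bits placed there are wired up."""
    return {lab: FEEDS_PER_LAB - inputs_needed(len(bits))
            for lab, bits in placement.items()}


# Placement observed in Chip Planner, as described above.
observed = {
    "X78_Y6": range(0, 10),   # bits 0..9
    "X78_Y5": range(10, 20),  # bits 10..19
    "X78_Y4": range(20, 32),  # bits 20..31
}

full_lab_inputs = inputs_needed(ALMS_PER_LAB * BITS_PER_ALM)  # 60, more than the feed budget
naive = naive_packing(32)                                     # (1, 6)
spare = spare_feeds(observed)                                 # 16, 16 and 10 spare feeds
```

With the assumed budget, a fully packed LAB would need 60 inputs, which is why the chain gets split, and the observed split leaves exactly the 16/16/10 spare feed lines mentioned above.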