Thanks, I looked at the threads you linked, but I didn't find the answer I needed.
I was finally able to create a carry chain, but I wasn't able to route the couts to the flip-flop inputs. The compiler creates a new cell and connects the cout of the previous cell to the cin of this new cell; the signal then passes through the LUT before it reaches the register. This is not what I want, since the LUT propagation delay is considerably longer than the cin-to-cout delay. Since I have seen implementations in the literature of tapped delay lines that exploit the carry chain of Cyclone II devices, I wonder how they were able to connect the cout directly to the flip-flop. Even looking at the datasheet, there doesn't seem to be a route between the cout and the flip-flop of the same cell. I saw that Xilinx architectures include a route through a mux that connects the cout to the cell register. Moreover, the guidelines for the carry_sum primitive indicate that it's not possible to connect a cout to an output pin, so I deduce that the carry chain is dedicated to internal use only, with no possibility of reading its values through a flip-flop.
Could someone confirm whether or not this is the case?
Thanks for your patience, best regards!