OK, I see now that the problem is apparently caused by insufficient setup time for read_en. That explains why you get different results with different chip families and speed grades. Changing the read_en assertion to the rising edge of clk makes the address counter operate correctly.
read_en isn't related to any clock in your design, so it's not checked in timing analysis. You should either specify timing constraints for it (which most likely requires TimeQuest), register it in your design, or give it more suitable timing at the source. I guess changing all input signals at the rising edge of clk would be better.
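
For the constraint option, a minimal SDC sketch might look like the following. The port names clk and read_en are taken from your design as described. The 20 ns period and the 5 ns / 1 ns input delays are placeholder values I made up for illustration, so substitute the real clock period and the actual arrival times of read_en relative to clk.

```tcl
# Assumed 50 MHz clock on port clk; replace the period with the real value.
create_clock -name clk -period 20.000 [get_ports clk]

# Tell the timing analyzer when read_en arrives at the pin relative to the
# clk rising edge. Both delays are hypothetical figures, not measured ones.
set_input_delay -clock clk -max 5.000 [get_ports read_en]
set_input_delay -clock clk -min 1.000 [get_ports read_en]
```

With constraints like these, the setup and hold paths from read_en should show up in the timing report, so a violation is flagged at compile time instead of appearing as a family-dependent or speed-grade-dependent counter error.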