For a simple counter, the synthesis tool will infer roughly the same implementation from an RTL description as you would get by instantiating a megafunction.
The tsu and tco at the I/O pins matter only if you care about system frequency: the clock rate of a system in which data passes between devices that use the same clock signal or related clock signals. The minimum period for system fmax is the tco of the driving device plus board delay effects plus the tsu of the receiving device. If you only care how fast the counter runs inside the FPGA, the I/O timing at the pins is irrelevant.
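To make the system fmax arithmetic concrete, here is a small Python sketch. The timing numbers are made-up placeholders, not values for any real device, and the board term simply lumps trace delay and any clock skew between the two devices into a single number.

```python
def system_fmax_mhz(tco_ns: float, board_delay_ns: float, tsu_ns: float) -> float:
    """Return the system fmax in MHz for one device-to-device path.

    The minimum clock period is the driving device's tco plus the board
    delay plus the receiving device's tsu; fmax is the reciprocal of that.
    """
    period_ns = tco_ns + board_delay_ns + tsu_ns
    return 1000.0 / period_ns  # 1/ns is GHz, so scale by 1000 for MHz


if __name__ == "__main__":
    # Hypothetical numbers: 4.0 ns tco, 1.0 ns board delay, 2.5 ns tsu,
    # giving a 7.5 ns minimum period.
    print(f"{system_fmax_mhz(4.0, 1.0, 2.5):.1f} MHz")
```

Note that none of these terms involves the counter's internal register-to-register paths, which is why the internal fmax and the system fmax are separate questions.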
Fully constrain your timing. Quartus needs to know how fast you want the clocks to run and what tsu and tco you need at the device pins. I/O timing is best constrained with input delays and output delays. Since you are apparently new to FPGA design, use TimeQuest so that you learn the best timing analysis tool from the start. The TimeQuest documentation explains how to use set_input_delay and set_output_delay. I suspect you are using the Classic Timing Analyzer with device-wide tsu and tco constraints. That is fine for getting started, but it is not the best way to learn FPGA design. Besides using TimeQuest, the preferred timing analyzer, constrain the timing on each pin instead of using device-wide constraints. If you constrain each pin individually (even if all pins need the same constraint value), it will be more obvious to someone maintaining the design later that you really did intend that requirement for each of those pins.
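As a rough illustration of per-pin constraints, the Python sketch below generates SDC lines with one set_input_delay or set_output_delay per pin. The clock name, pin names, and delay numbers are hypothetical. It assumes the common convention that the input delay is the external driver's tco plus board delay, and the output delay is board delay plus the external receiver's tsu. It only emits -max values; a complete constraint set would normally also specify -min values for hold analysis.

```python
def input_delay_ns(ext_tco_ns: float, board_ns: float) -> float:
    # Assumed: data arrives at the FPGA pin after the external device's tco
    # plus the board trace delay.
    return ext_tco_ns + board_ns


def output_delay_ns(board_ns: float, ext_tsu_ns: float) -> float:
    # Assumed: data must cross the board and still meet the external tsu.
    return board_ns + ext_tsu_ns


def sdc_constraints(clock_port, period_ns, inputs, outputs, in_delay, out_delay):
    """Return SDC lines: one clock, then one delay constraint per pin."""
    # Braces around port names keep Tcl from treating [0] as a command.
    lines = [f"create_clock -name {clock_port} -period {period_ns:.3f} "
             f"[get_ports {{{clock_port}}}]"]
    for pin in inputs:
        lines.append(f"set_input_delay -clock {clock_port} -max {in_delay:.3f} "
                     f"[get_ports {{{pin}}}]")
    for pin in outputs:
        lines.append(f"set_output_delay -clock {clock_port} -max {out_delay:.3f} "
                     f"[get_ports {{{pin}}}]")
    return lines


if __name__ == "__main__":
    # Hypothetical design: 10 ns clock, two input pins, one output pin.
    for line in sdc_constraints("clk", 10.0,
                                ["data_in[0]", "data_in[1]"], ["data_out[0]"],
                                input_delay_ns(3.0, 0.5),
                                output_delay_ns(0.5, 2.0)):
        print(line)
```

Writing the constraints out pin by pin, even when every pin gets the same number, keeps the intent visible to whoever maintains the .sdc file later.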
After you have compiled with all timing paths fully constrained, use "Tools --> Advisors --> Timing Optimization Advisor". Check the recommendations in the "Maximum Frequency (fmax)" category. If you really do care about system fmax including I/O timing, then also check the "I/O Timing (tsu, tco, tpd)" category.
The Advisor's I/O category includes the recommendation to use I/O cell registers, which was made to you in your other recent thread entitled "Latency and Famx calcualtion [sic]". To place a register in the I/O cell for the fastest I/O timing, the register must connect directly to the device pin with no logic between the register and the pin.
The best possible timing depends on the device family, to some extent on the specific device within the family, and on the speed grade. If you have not yet constrained your design properly, the Fitter might not be giving you the best possible timing for the currently selected device. (Later edit: I first wrote this paragraph before I noticed you said that you had constrained the design with fmax and I/O timing requirements. The Fitter was probably already trying to meet those requirements, but still check the Advisor to see which setting changes you should try.)