A skew of 3-4ns is not normal at all. It should be ~200ps or less (depending on many factors).
SDC looks good. I noticed you're not using a virtual clock for the I/O interfaces. Not too big a deal, but I would recommend it.
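For reference, a virtual clock for the I/O constraints could look roughly like the following. This is only a sketch: virt_clk and the dout port name are names I made up, and I'm assuming the 4ns period and 2.664ns external output delay from your report.

```tcl
# Virtual clock: create_clock with no target, so it exists only as a
# timing reference for the external device (virt_clk is a made-up name).
create_clock -name virt_clk -period 4.000

# Reference the external delay to the virtual clock instead of the
# FPGA's own clock. dout[*] is a hypothetical port name.
set_output_delay -clock virt_clk -max 2.664 [get_ports {dout[*]}]
```

The point of the separate clock is that the external device's clock can then be given its own uncertainty and latency, independent of the clock inside the FPGA.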
Do you have virtual pin assignments in the design? What about virtual clock assignments? I'm guessing that's what's causing it.
Can you change your SDC so that set_input_delay and set_output_delay are applied to specific ports (ones you know really drive I/O), rather than to [all_inputs] and [all_outputs], which I'm guessing are catching the virtual I/O?
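Something along these lines is what I mean. Treat it as a sketch: the port names, the io_clk clock name, and the 1.000ns input delay are all placeholders I invented, and only the 2.664ns output delay comes from your report.

```tcl
# Hypothetical port names; substitute the pins that really go off chip.
set real_inputs  [get_ports {din[*] din_valid}]
set real_outputs [get_ports {dout[*] dout_valid}]

# io_clk is a placeholder for whichever clock your I/O constraints
# reference. The input delay value is a placeholder as well.
set_input_delay  -clock io_clk -max 1.000 $real_inputs
set_output_delay -clock io_clk -max 2.664 $real_outputs
```

With explicit port lists, anything you've marked as a virtual pin is simply never constrained as I/O, so it can't show up in the I/O timing paths.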
Finally, the I/O won't meet timing. Just looking at the outputs, the setup relationship is 4ns and the external delay is 2.664ns, which leaves 4 - 2.664 = 1.336ns for the FPGA to get data off chip. Right now your clock doesn't drive a PLL, so your delay is going to be the entire global clock tree delay plus the output buffer, which will be significantly larger than 1.336ns. At a minimum you're going to need a PLL to run I/O at a 4ns period, and you may need some other tricks. (That's where users often start doing source-synchronous interfaces.)