--- Quote Start ---
I noticed a rather high register-to-pin delay. I've set optimization to 'speed' but I'm kinda shocked to see ~23ns being blown on getting from a to b.
--- Quote End ---
You need to use the Classic Timing Analyzer together with timing constraints. For example, here are the constraints for the local bus interface of a PLX PCI9054 PCI-to-Local Bus bridge:
# PLX A/D bus input/output
set_instance_assignment -name TSU_REQUIREMENT "6.0 ns" -from clk -to plx_ad
set_instance_assignment -name TH_REQUIREMENT "1.0 ns" -from clk -to plx_ad
set_instance_assignment -name TCO_REQUIREMENT "9.0 ns" -from clk -to plx_ad
set_instance_assignment -name MIN_TCO_REQUIREMENT "0.0 ns" -from clk -to plx_ad
# PLX A/D bus control signal inputs
set_instance_assignment -name TSU_REQUIREMENT "6.0 ns" -from clk -to plx_adsN
set_instance_assignment -name TH_REQUIREMENT "1.0 ns" -from clk -to plx_adsN
set_instance_assignment -name TSU_REQUIREMENT "6.0 ns" -from clk -to plx_wr_rdN
set_instance_assignment -name TH_REQUIREMENT "1.0 ns" -from clk -to plx_wr_rdN
set_instance_assignment -name TSU_REQUIREMENT "6.0 ns" -from clk -to plx_lastN
set_instance_assignment -name TH_REQUIREMENT "1.0 ns" -from clk -to plx_lastN
set_instance_assignment -name TSU_REQUIREMENT "6.0 ns" -from clk -to plx_waitN
set_instance_assignment -name TH_REQUIREMENT "1.0 ns" -from clk -to plx_waitN
set_instance_assignment -name TSU_REQUIREMENT "6.0 ns" -from clk -to plx_hold
set_instance_assignment -name TH_REQUIREMENT "1.0 ns" -from clk -to plx_hold
set_instance_assignment -name TSU_REQUIREMENT "6.0 ns" -from clk -to plx_beN
set_instance_assignment -name TH_REQUIREMENT "1.0 ns" -from clk -to plx_beN
# PLX A/D bus control signal outputs
set_instance_assignment -name TCO_REQUIREMENT "9.0 ns" -from clk -to plx_rdyN
set_instance_assignment -name MIN_TCO_REQUIREMENT "0.0 ns" -from clk -to plx_rdyN
set_instance_assignment -name TCO_REQUIREMENT "9.0 ns" -from clk -to plx_termN
set_instance_assignment -name MIN_TCO_REQUIREMENT "0.0 ns" -from clk -to plx_termN
set_instance_assignment -name TCO_REQUIREMENT "9.0 ns" -from clk -to plx_dp
set_instance_assignment -name MIN_TCO_REQUIREMENT "0.0 ns" -from clk -to plx_dp
set_instance_assignment -name TCO_REQUIREMENT "9.0 ns" -from clk -to plx_hold_ack
set_instance_assignment -name MIN_TCO_REQUIREMENT "0.0 ns" -from clk -to plx_hold_ack
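Since the same four requirements repeat for every pin, the list can also be generated rather than typed by hand. The following is only a sketch in plain Tcl that prints the assignment lines so they can be pasted into the .qsf file; the helper name emit_req is made up for illustration, and the pin names and values are simply the ones listed above.

```tcl
# Sketch: print the repeated I/O timing assignments from lists of pin names.
# emit_req is a hypothetical helper; pins and values come from the list above.
proc emit_req {name value pin} {
    puts "set_instance_assignment -name $name \"$value\" -from clk -to $pin"
}

set inputs  {plx_adsN plx_wr_rdN plx_lastN plx_waitN plx_hold plx_beN}
set outputs {plx_rdyN plx_termN plx_dp plx_hold_ack}

# plx_ad is an input/output bus, so it gets both sets of requirements.
foreach pin [concat plx_ad $inputs] {
    emit_req TSU_REQUIREMENT "6.0 ns" $pin
    emit_req TH_REQUIREMENT  "1.0 ns" $pin
}
foreach pin [concat plx_ad $outputs] {
    emit_req TCO_REQUIREMENT     "9.0 ns" $pin
    emit_req MIN_TCO_REQUIREMENT "0.0 ns" $pin
}
```

Keeping the values in one place makes it easier to tweak a requirement and regenerate the whole set.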
These parameters were tweaked until timing passed. If you can use I/O element (IOE) registers in your design, then you can also try setting FAST_INPUT_REGISTER and FAST_OUTPUT_REGISTER to ON (there might be a fast output-enable register option too).
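As a rough illustration, the fast register assignments could look like the lines below in the .qsf file. This is a sketch, not taken from the original project: the choice of pins is an assumption (one input, one output, and the bidirectional bus from the list above), and FAST_OUTPUT_ENABLE_REGISTER is, as far as I know, the name of the output-enable option, so check it against your Quartus version.

```tcl
# Assumed example pins; apply to whichever I/O pins have a register
# directly driving or driven by the pin.
set_instance_assignment -name FAST_INPUT_REGISTER ON -to plx_adsN
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to plx_rdyN
# Bidirectional bus: input, output, and output-enable registers in the IOE
set_instance_assignment -name FAST_INPUT_REGISTER ON -to plx_ad
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to plx_ad
set_instance_assignment -name FAST_OUTPUT_ENABLE_REGISTER ON -to plx_ad
```

The fitter can only honor these if the register connects straight to the pin with no logic in between, which is one more reason to look at where the registers sit in the design.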
Rather than simply writing code, draw a block diagram and see where you can push the registers to. For example, if you included input registers and delayed the inputs by one clock, would it matter?
Read the plx_interface.pdf document here:
http://www.alteraforum.com/forum/showthread.php?t=34523&p=142651#post142651
Perhaps you can do something similar.
Cheers,
Dave