Well, you should have mentioned the temperature dependence earlier.
Have you scoped the data and clock signals while freezing / heating the part? I'm guessing one of the signals will cease to toggle (or the amplitude will diminish). Judging by your comments, I'm guessing this is the first RoHS-compliant board this board-house has assembled for you. I have seen numerous cases of board-houses overheating / underheating RoHS boards, resulting in these same problems.
I've seen the following issues:
1 - Cracked vias (from overheating). The via makes contact until the board reaches a certain temperature, then expansion opens the crack and it no longer makes contact. Look for loss of signal when you vary the temperature. I've seen the x-rays of this phenomenon.
2 - Incomplete solder joint on a BGA package (from underheating). The ball makes contact with the solder pad inconsistently or never. Again, look for loss of signal when you vary the temperature.
3 - Shorts under a ball grid (these are not normally due to temperature). A sloppy solder job can cause intermittent shorts that are heat dependent. Also, solder flux left under a part can cause this to occur after fabrication. Look for diminished amplitude on the signal and check signals for shorts (or very low resistance) with each other or with VCC or GND. If your scope shows a signal not quite reaching VCC or GND, you've probably got a short somewhere.
This is probably a long shot but a good JTAG boundary-scan tool will do open-short tests. I don't suppose you've got any of those at your disposal?
Have you tried varying the temperature on the boards that are working? If they don't start failing, then throw the idea of timing out the window. A borderline timing violation would manifest itself on your other boards under the right conditions (temperature).
Now, to readdress the timing question. Almost all digital circuits both latch and drive on the same clock edge. In this case, we have a clock frequency of 30MHz (a period of 33.33ns). Let's make an incorrect assumption just for the sake of argument.
Suppose the falling clock edge arrives at both the input latch (within the FPGA) and the data output latch (on the serial flash device) at the exact same instant.
The new data out signal has 33.33ns to travel through the latch, through the serial flash die, out the pad on the die, along the internal wire bond to the pin, along the pin, along the board trace, along the pin on the FPGA, along the internal wire bond from the pin to the die (assuming non-BGA), through the pad on the die, to the input of the flip-flop within the FPGA. If it arrives within this 33.33ns, the next falling clock edge will successfully latch the new data signal into the flip-flop.
Now, suppose we change the scheme and drive out of the serial-flash device on the falling clock edge but latch the input into the FPGA on the rising clock edge. Assuming the clock duty cycle is 50%, we've now cut the time available for the signal to reach its destination from 33.33ns down to 16.67ns (the equivalent of running at a 60MHz clock frequency). Now of course there are many other numbers in this calculation. This is just assuming an ideal situation.
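To put rough numbers on the two schemes, here's a quick Python sketch of the ideal-case arithmetic. The function names are made up for illustration, and the delay figures in the slack example (clock-to-output, trace flight time, FPGA setup) are hypothetical placeholders, not values from any datasheet. Substitute the real numbers for your parts.

```python
def available_time_ns(clock_hz, duty_cycle=0.5, same_edge=True):
    """Ideal time between the driving edge and the latching edge, in ns.

    same_edge=True : drive and latch on the falling edge (one full period).
    same_edge=False: drive on the falling edge, latch on the next rising
                     edge (only the low phase of the clock is available).
    duty_cycle is the fraction of the period the clock spends high.
    """
    period_ns = 1e9 / clock_hz
    if same_edge:
        return period_ns
    return period_ns * (1.0 - duty_cycle)


def slack_ns(available_ns, clk_to_out_ns, flight_ns, setup_ns):
    """Margin left after the path delays; negative means a setup violation."""
    return available_ns - (clk_to_out_ns + flight_ns + setup_ns)


if __name__ == "__main__":
    clock_hz = 30e6  # the 30MHz clock from the discussion

    full = available_time_ns(clock_hz, same_edge=True)
    half = available_time_ns(clock_hz, same_edge=False)
    print(f"same edge:     {full:.2f} ns")  # 33.33 ns
    print(f"opposite edge: {half:.2f} ns")  # 16.67 ns

    # Hypothetical path delays, for illustration only.
    delays = dict(clk_to_out_ns=8.0, flight_ns=1.0, setup_ns=3.0)
    print(f"slack, same edge:     {slack_ns(full, **delays):.2f} ns")
    print(f"slack, opposite edge: {slack_ns(half, **delays):.2f} ns")
```

The point of the second function is only that every real delay comes out of the same budget, so halving the available time eats directly into whatever margin you had.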
Anyway, driving and latching on separate clock edges is not the proper way to guarantee setup / hold times in a digital circuit.
Jake