First of all, clock domain crossing is a well-known problem and is no longer difficult. There are canned solutions and numerous proven designs. I myself have done many async FIFO designs in ASICs and full-custom microprocessors and never had any issue. The reason I didn't use my own FIFO design here is that dcfifo appears to use fewer resources (probably because it is better optimized using some internal "secret sauce"). I also want to avoid the extra work with set_false_path, etc. in TimeQuest. Last but not least, I hope that through this discussion Altera can improve dcfifo to benefit more designers. I have been doing ASIC and full-custom IC design for 15 years, but I am very new to FPGA design.
As for robustness, it is understood that flags may be "conservative". For example, rdempty may go high when the FIFO is not empty. However, the flags should never be "aggressive" (e.g. rdempty going to 0 while the FIFO is empty), since that can cause FIFO underflow or overflow. As long as all flags are conservative, the design will be robust.
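To make the conservative/aggressive distinction concrete, here is a minimal Python sketch. It is not dcfifo's logic. The function names, the `true_count` argument (the real occupancy, which hardware on one side of the crossing cannot see directly) and `depth` are assumptions made purely for illustration.

```python
def rdempty_is_safe(true_count: int, rdempty: bool) -> bool:
    """Hypothetical check: rdempty must never be low while the FIFO is truly empty.

    A high rdempty with items present is merely conservative (a lost cycle),
    so it is accepted.
    """
    return rdempty or true_count > 0


def wrfull_is_safe(true_count: int, depth: int, wrfull: bool) -> bool:
    """Hypothetical check: wrfull must never be low while the FIFO is truly full."""
    return wrfull or true_count < depth


# Conservative: flag says "empty" although one item is present. Safe.
print(rdempty_is_safe(true_count=1, rdempty=True))
# Aggressive: flag says "not empty" although nothing is present. Unsafe.
print(rdempty_is_safe(true_count=0, rdempty=False))
```

The same asymmetry applies on the write side: a pessimistic wrfull only costs throughput, while an optimistic one allows an overflow.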
However, some flags in dcfifo are NOT conservative. For example, rdusedw has a 2-rdclk latency from rdreq, which means that in the first cycle after rdreq, rdusedw will not decrease (although it can increase due to a previous write). The result is that rdusedw falsely indicates that the FIFO holds more items than it actually does, and this can cause FIFO underflow. In fact, since rdusedw is not flopped, it would take an extra cycle to flop it if timing is critical, which makes the problem even worse.
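The rdusedw concern can be illustrated with a toy cycle-by-cycle model in Python. This is only a sketch under stated assumptions, not a model of dcfifo internals: there are no writes, the reader asserts rdreq whenever the reported count is nonzero, and `stale_cycles` is a hypothetical parameter giving how many rdclk cycles the reported count lags behind the true occupancy.

```python
def count_underflows(initial_items: int, cycles: int, stale_cycles: int) -> int:
    """Count reads issued while the (toy) FIFO is truly empty.

    stale_cycles is a hypothetical lag, in rdclk cycles, between a read
    and the cycle in which the reported word count reflects it.
    """
    history = [initial_items]  # history[t] = true occupancy at start of cycle t
    underflows = 0
    for t in range(cycles):
        true_count = history[-1]
        # The reader sees the occupancy as it was stale_cycles cycles ago.
        reported = history[max(0, t - stale_cycles)]
        if reported > 0:  # reader trusts the count and asserts rdreq
            if true_count > 0:
                true_count -= 1
            else:
                underflows += 1  # read request while truly empty
        history.append(true_count)
    return underflows


print(count_underflows(initial_items=3, cycles=6, stale_cycles=0))  # 0
print(count_underflows(initial_items=3, cycles=6, stale_cycles=1))  # 1
print(count_underflows(initial_items=3, cycles=6, stale_cycles=2))  # 2
```

In this model each additional stale cycle, such as the one added by registering the count for timing, produces one more read request against an empty FIFO.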
Can some Altera guru suggest any way to tweak dcfifo to address the issues mentioned above?
P.S. For readers who are interested in async FIFO issues and their solutions, Clifford Cummings' SNUG02 paper "Simulation and Synthesis Techniques for Asynchronous FIFO Design" provides a good summary.