My "write_ptr_synced_in_rdclk" is actually the synced Gray code converted back into binary. I've used this logic in many other designs, so I'm confident it works. The key point is that no matter how long it takes to sync the write pointer into the rdclk domain, rdusedw has only one rdclk cycle of latency from rdreq. More importantly, rdusedw is conservative: the synced write pointer can only lag the real one, so rdusedw never reports more words than the FIFO actually holds, and the FIFO can never underflow if read requests are gated by rdusedw. For details on my read logic, see my reply #6 above.
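To illustrate the "conservative" claim, here is a minimal Python sketch of the idea. It is a behavioral model only, not the RTL from reply #6: the names `ADDR_BITS`, `bin_to_gray`, `gray_to_bin` and `rdusedw` are made up for this example, the pointers are assumed to carry one extra wrap bit, and the `write_ptr_last` / `rdreq` handling is left out.

```python
ADDR_BITS = 4                            # hypothetical FIFO depth of 16 words
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1    # pointers carry one extra wrap bit


def bin_to_gray(b: int) -> int:
    """Binary to Gray code: adjacent values differ in exactly one bit."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Gray code back to binary by XOR-ing in every right shift of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def rdusedw(write_ptr_synced: int, read_ptr: int) -> int:
    """Words visible to the read side, modulo the pointer width."""
    return (write_ptr_synced - read_ptr) & PTR_MASK


# The write side has advanced to 10, but the read side still sees the
# older value 8 because of synchronizer latency.
true_write_ptr = 10
stale_gray = bin_to_gray(8)              # what crossed the clock domains
write_ptr_synced_in_rdclk = gray_to_bin(stale_gray)
read_ptr = 5

reported = rdusedw(write_ptr_synced_in_rdclk, read_ptr)
actual = rdusedw(true_write_ptr, read_ptr)
print(reported, actual)                  # reported never exceeds actual
```

Because the synced pointer can only be older than the real write pointer, the reported count can only be lower than or equal to the true count, which is why reading based on it cannot underflow.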
Why would you think rdusedw would not meet timing? write_ptr_synced_in_rdclk and write_ptr_last are both flopped, and rdreq is a fresh input. Unless rdreq arrives late, the logic should have no problem meeting timing.
I tried the RTL viewer, but it wasn't helpful at all. I complained about this in a separate thread. Basically, TimeQuest generated meaningless names for all the intermediate signals between two flops, so the timing path showed "sig~242|dataa", "sig~242|combout", "n|Mux0~1|datac", "n|Mux0~1|combout", "m|fifo|auto_generated|rdptr_g1p|countera9|combout" ... which is very difficult to trace. Also, when I tried to locate these nodes in the RTL viewer, I got the error "Can't find instance name xxx in current RTL schematic".
Is there a simulation model or source code for dcfifo? That would help me understand the timing. Also, is it possible to pull out some internal signals (e.g. the write pointer synced into the rdclk domain) and use them in my RTL?
FYI, my FIFO has wrclk = 133 MHz and rdclk = 200 MHz.