I know there is a tradeoff between latency and cycle time. All I want is to bring the latency from rdreq to rdusedw down to one clock cycle instead of 3 (since rdusedw is not flopped, you need an extra cycle on top of the documented latency). This is quite feasible, because you just need to do something like this:
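// PTR_W is a placeholder for the pointer width; the pointer is assumed to be
// binary (not Gray) and already synchronized into the rdclk domain.
// Words newly visible on the read side since the last rdclk edge:
wire [PTR_W-1:0] write_delta = write_ptr_synced_in_rdclk - write_ptr_last;
always @(posedge rdclk)
begin
    write_ptr_last <= write_ptr_synced_in_rdclk;
    // Add newly visible words and subtract the word being read this cycle.
    rdusedw <= rdusedw + write_delta - rdreq;
end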
I omitted the reset logic, but that's simple.
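For completeness, here is a slightly fuller sketch of the same idea as a standalone module with the reset included. The module name, the PTR_W parameter, the rd_aclr port, and the reset polarity are all my own placeholders, not anything taken from dcfifo. It also assumes the write pointer arriving in the rdclk domain has already been converted back to binary (pointers normally cross clock domains as Gray code) and that rdreq is only asserted when rdusedw is nonzero.

```verilog
// Hypothetical sketch only: names, widths, and reset style are assumptions.
module rdusedw_counter #(
    parameter PTR_W = 8  // assumed pointer width
) (
    input  wire             rdclk,
    input  wire             rd_aclr,                    // assumed async, active-high reset
    input  wire             rdreq,                      // assumed to be asserted only when rdusedw != 0
    input  wire [PTR_W-1:0] write_ptr_synced_in_rdclk,  // binary write pointer, already synchronized
    output reg  [PTR_W-1:0] rdusedw
);
    // Write pointer value seen on the previous rdclk edge.
    reg [PTR_W-1:0] write_ptr_last;

    // Words that became visible in the rdclk domain since the last edge.
    // The subtraction is modulo 2**PTR_W, so pointer wrap-around is handled.
    wire [PTR_W-1:0] write_delta = write_ptr_synced_in_rdclk - write_ptr_last;

    always @(posedge rdclk or posedge rd_aclr) begin
        if (rd_aclr) begin
            // Assumes the synchronized write pointer is also reset to zero.
            write_ptr_last <= {PTR_W{1'b0}};
            rdusedw        <= {PTR_W{1'b0}};
        end else begin
            write_ptr_last <= write_ptr_synced_in_rdclk;
            // Registered update: rdreq is reflected in rdusedw one cycle later.
            rdusedw        <= rdusedw + write_delta - rdreq;
        end
    end
endmodule
```

Note that with PTR_W-bit arithmetic a completely full FIFO of 2**PTR_W words would read back as 0, so in practice the pointers and rdusedw would need one extra bit if the full condition has to be distinguishable.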
I have read the user guide in your link (as pointed out in my original post), but the document doesn't mention anything about which ports are flopped in or flopped out. In order to meet timing in a high-speed design, you need to add one more cycle to the latencies listed in Table 3.
Your suggestion is exactly what I had to do to work around the long latency issue, but the actual design is much more complicated, because the long latency makes it impossible to pipeline the reads (so that the FIFO can be read every cycle when possible). Therefore I had to implement another small FIFO to do the look-ahead logic. Had dcfifo implemented rdusedw with 1-rdclk latency as I suggested, it would have made my life, as well as many other engineers' lives, much easier. I hope Altera will consider improving dcfifo in future releases.