Forum Discussion

Altera_Forum
Honored Contributor
13 years ago

Coding style to minimize combinational path delay?

I'm tuning my logic to meet timing. I have a long continuous assignment statement that outputs a signal based on the current state of my FSM. The implementation below results in a long mux chain. Is there an alternative coding style that results in a smaller path delay?

	assign tx_tlp_dword_offset = 
		(curstate == HANDLE_BAR1_READ_1_STATE) ? 7'h4 : 
		(curstate == HANDLE_BAR1_READ_2_STATE) ? 7'h0 : 
		(curstate == HANDLE_BAR1_READ_3_STATE) ? 7'h1 : 
		(curstate == HANDLE_BAR1_READ_4_STATE) ? 7'h2 : 
		(curstate == HANDLE_BAR1_READ_5_STATE) ? 7'h5 : 
		(curstate == HANDLE_BAR1_READ_6_STATE) ? 7'h5 : 
		(curstate == H2D_DMA_INIT_MEMRD_DW0_STATE) ? 7'h0 : 
		(curstate == H2D_DMA_INIT_MEMRD_DW1_STATE) ? 7'h1 : 
		(curstate == H2D_DMA_INIT_MEMRD_DW2_STATE) ? 7'h2 : 
		(curstate == H2D_DMA_INIT_MEMRD_DW3_STATE) ? 7'h3 : 
		(curstate == H2D_DMA_SEND_MEMRD_TLP_STATE) ? 7'h3 : 
		(curstate == H2D_DMA_SEND_MEMRD_TLP2_STATE) ? 7'h3 : 
		(curstate == D2H_DMA_INIT_MEMWR_DW0_STATE) ? 7'h0 : 
		(curstate == D2H_DMA_INIT_MEMWR_DW1_STATE) ? 7'h1 : 
		(curstate == D2H_DMA_INIT_MEMWR_DW2_STATE) ? 7'h2 : 
		(curstate == D2H_DMA_INIT_MEMWR_PL_STATE) ? reg_tx_tlp_dword_offset : 
		(curstate == D2H_DMA_SEND_MEMWR_TLP_STATE) ? reg_tx_tlp_dword_offset : 
		(curstate == D2H_DMA_SEND_MEMWR_TLP2_STATE) ? reg_tx_tlp_dword_offset : 
		7'h0;

I have attached the TimeQuest path information as well as the output from the RTL viewer.

20 Replies

  • Altera_Forum

    --- Quote Start ---

    ...would get into a habit of just using case statements when you don't need priority...

    --- Quote End ---

    I just want to point out, in contrast to multiple comments in this thread, that case statements are inherently supposed to have priority. They are not supposed to execute all branches in parallel. Don't take my word for it, though; take it from someone who trains Verilog professionally:

    http://sutherland-hdl.com/online_verilog_ref_guide/vlog_ref_top.html

    "Compares the net, register or literal value to each case and executes the statement or statement group associated with the first matching case."

    I have found this to be a point of contention across different tools. If you want to make sure that the branches of a case statement execute in parallel, look into the "unique" keyword from SystemVerilog.
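
    For illustration, here is a minimal sketch of what a "unique" case could look like for a signal like the one in the question. The module name, the 2-bit state width, and the state encodings are placeholders I made up for the example; only the state and signal names come from the original post.

    ```systemverilog
    // Sketch only: the 2-bit encoding below is a placeholder, not the original FSM's.
    module offset_mux_unique (
        input  logic [1:0] curstate,
        input  logic [6:0] reg_tx_tlp_dword_offset,
        output logic [6:0] tx_tlp_dword_offset
    );
        // Hypothetical encodings for three of the states named in the question.
        localparam logic [1:0] HANDLE_BAR1_READ_1_STATE    = 2'd1;
        localparam logic [1:0] HANDLE_BAR1_READ_3_STATE    = 2'd2;
        localparam logic [1:0] D2H_DMA_INIT_MEMWR_PL_STATE = 2'd3;

        always_comb begin
            // "unique" asserts that no two case items can match at once, so the
            // tool does not have to preserve first-match priority between them.
            unique case (curstate)
                HANDLE_BAR1_READ_1_STATE:    tx_tlp_dword_offset = 7'h4;
                HANDLE_BAR1_READ_3_STATE:    tx_tlp_dword_offset = 7'h1;
                D2H_DMA_INIT_MEMWR_PL_STATE: tx_tlp_dword_offset = reg_tx_tlp_dword_offset;
                default:                     tx_tlp_dword_offset = 7'h0; // all other states
            endcase
        end
    endmodule
    ```

    Because the items here are distinct constants, they cannot overlap anyway, so "unique" mainly documents the intent and lets the simulator flag a violation if that ever changes.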
  • Altera_Forum

    --- Quote Start ---

    I just want to point out, in contrast to multiple comments in this thread, that case statements are inherently supposed to have priority. They are not supposed to execute all branches in parallel. Don't take my word for it, though; take it from someone who trains Verilog professionally:

    http://sutherland-hdl.com/online_verilog_ref_guide/vlog_ref_top.html

    "Compares the net, register or literal value to each case and executes the statement or statement group associated with the first matching case."

    I have found this to be a point of contention across different tools. If you want to make sure that the branches of a case statement execute in parallel, look into the "unique" keyword from SystemVerilog.

    --- Quote End ---

    You are absolutely correct; that was a bit of a blanket statement on my part. I have never written a case statement that could potentially have priority in the way you mentioned, so it slipped my mind.
  • Altera_Forum

    --- Quote Start ---

    I just want to point out, in contrast to multiple comments in this thread, that case statements are inherently supposed to have priority. They are not supposed to execute all branches in parallel.

    --- Quote End ---

    Yes, there are incorrect assumptions about how case constructs are evaluated. My point was that there's no room for priority in the present code, so an "if..then..else" chain and a case construct should be expected to end up as the same gate-level logic anyway.

    I should add that "if then else" and a regular case construct (no parallel case) evaluate the code in the same way.
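
    To make that concrete, here is a reduced sketch with both forms side by side. The module name, output names, and state encodings are hypothetical; the state names are borrowed from the question. Since every condition compares curstate against a different constant, at most one branch can ever be true, and both blocks describe the same truth table.

    ```systemverilog
    // Sketch only: placeholder encodings, reduced to three states plus a default.
    module offset_mux_compare (
        input  logic [1:0] curstate,
        input  logic [6:0] reg_tx_tlp_dword_offset,
        output logic [6:0] offset_from_if,
        output logic [6:0] offset_from_case
    );
        localparam logic [1:0] HANDLE_BAR1_READ_1_STATE    = 2'd1;
        localparam logic [1:0] HANDLE_BAR1_READ_3_STATE    = 2'd2;
        localparam logic [1:0] D2H_DMA_INIT_MEMWR_PL_STATE = 2'd3;

        // Priority-style chain, same structure as the nested ?: in the question.
        always_comb begin
            if      (curstate == HANDLE_BAR1_READ_1_STATE)    offset_from_if = 7'h4;
            else if (curstate == HANDLE_BAR1_READ_3_STATE)    offset_from_if = 7'h1;
            else if (curstate == D2H_DMA_INIT_MEMWR_PL_STATE) offset_from_if = reg_tx_tlp_dword_offset;
            else                                              offset_from_if = 7'h0;
        end

        // Regular case on the same selector: items are tested in order, but
        // because they are mutually exclusive the order cannot change the result.
        always_comb begin
            case (curstate)
                HANDLE_BAR1_READ_1_STATE:    offset_from_case = 7'h4;
                HANDLE_BAR1_READ_3_STATE:    offset_from_case = 7'h1;
                D2H_DMA_INIT_MEMWR_PL_STATE: offset_from_case = reg_tx_tlp_dword_offset;
                default:                     offset_from_case = 7'h0;
            endcase
        end
    endmodule
    ```

    The two outputs are equal for every input, so after logic minimization I would expect the same gates for both, though that is worth confirming in the RTL viewer for the actual design.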
  • Altera_Forum

    I wonder if grouping the select lines helps since it reduces the number of inputs...

    (I omitted the specific conditions that select 7'h0, since the default condition takes care of them.)

    assign tx_tlp_dword_offset = 
    		(curstate == HANDLE_BAR1_READ_1_STATE) ? 7'h4 : 
    		(curstate == HANDLE_BAR1_READ_3_STATE || 
                     curstate == H2D_DMA_INIT_MEMRD_DW1_STATE || 
                     curstate == D2H_DMA_INIT_MEMWR_DW1_STATE) ? 7'h1 : 
    		(curstate == HANDLE_BAR1_READ_4_STATE || 
                     curstate == H2D_DMA_INIT_MEMRD_DW2_STATE ||
                     curstate == D2H_DMA_INIT_MEMWR_DW2_STATE) ? 7'h2 : 
    		(curstate == HANDLE_BAR1_READ_5_STATE || 
    		 curstate == HANDLE_BAR1_READ_6_STATE) ? 7'h5 :
    		(curstate == H2D_DMA_INIT_MEMRD_DW3_STATE ||
    		 curstate == H2D_DMA_SEND_MEMRD_TLP_STATE ||
    		 curstate == H2D_DMA_SEND_MEMRD_TLP2_STATE) ? 7'h3 : 
    		(curstate == D2H_DMA_INIT_MEMWR_PL_STATE) ? reg_tx_tlp_dword_offset : 
    		(curstate == D2H_DMA_SEND_MEMWR_TLP_STATE) ? reg_tx_tlp_dword_offset : 
    		(curstate == D2H_DMA_SEND_MEMWR_TLP2_STATE) ? reg_tx_tlp_dword_offset : 
    		7'h0;
  • Altera_Forum

    --- Quote Start ---

    I wonder if grouping the select lines helps since it reduces the number of inputs...

    --- Quote End ---

    The expression for each bit of tx_tlp_dword_offset will undergo logic minimization during synthesis, so I wouldn't expect reordering or grouping to have any effect on logic element usage.

    The choice of state encoding, in contrast, does matter.
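
    As one example of how encoding changes the picture, here is a sketch assuming a one-hot state register, where each state test is a single bit instead of a multi-bit comparison. The module name and bit positions are hypothetical, and whether this actually shortens the path depends on the device and on how the synthesis tool re-encodes the state machine.

    ```systemverilog
    // Sketch only: assumes curstate is one-hot (exactly one bit set at a time).
    module offset_mux_onehot (
        input  logic [4:0] curstate,
        input  logic [6:0] reg_tx_tlp_dword_offset,
        output logic [6:0] tx_tlp_dword_offset
    );
        // Hypothetical bit positions; bit 0 would be a state that outputs 7'h0.
        localparam int READ_1_BIT   = 1;
        localparam int READ_3_BIT   = 2;
        localparam int READ_5_BIT   = 3;
        localparam int MEMWR_PL_BIT = 4;

        // AND-OR mux: each term is gated by one state bit. When none of the
        // listed bits is set, every term is zero and the output is 7'h0.
        assign tx_tlp_dword_offset =
              ({7{curstate[READ_1_BIT]}}   & 7'h4)
            | ({7{curstate[READ_3_BIT]}}   & 7'h1)
            | ({7{curstate[READ_5_BIT]}}   & 7'h5)
            | ({7{curstate[MEMWR_PL_BIT]}} & reg_tx_tlp_dword_offset);
    endmodule
    ```

    This form is only correct while the one-hot assumption holds; if two state bits were ever set together, the terms would be ORed instead of one taking priority.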
  • Altera_Forum

    Thanks all for your suggestions. I ended up cutting my clock frequency in half, giving me much more headroom. The Cyclone IV is simply too slow (i.e. too much combinational path delay). If I can't meet my data throughput target (borderline now), then I will have to double the width of the critical data path in my design.

  • Altera_Forum

    --- Quote Start ---

    Thanks all for your suggestions. I ended up cutting my clock frequency in half, giving me much more headroom. The Cyclone IV is simply too slow (i.e. too much combinational path delay). If I can't meet my data throughput target (borderline now), then I will have to double the width of the critical data path in my design.

    --- Quote End ---

    The Cyclone IV isn't that slow. I have a design with clock frequencies of 200 MHz and 150 MHz (among others) in an EP4CE40F23C7N device. I have similar muxes in the 150 MHz domain (switching constants for multipliers). Proper pipelining is key - divide and conquer!
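
    One way to pipeline a mux like the one in the question is to decode from the FSM's next-state value and register the result, so the output comes straight from a flip-flop. This is only a sketch: the module name, the nextstate port, and the encodings are assumptions, since the original post does not show how the FSM is written.

    ```systemverilog
    // Sketch only: assumes the FSM exposes its combinational next-state value.
    module offset_registered (
        input  logic       clk,
        input  logic [1:0] nextstate,               // hypothetical next-state signal
        input  logic [6:0] reg_tx_tlp_dword_offset,
        output logic [6:0] tx_tlp_dword_offset
    );
        // Placeholder encodings for three of the states named in the question.
        localparam logic [1:0] HANDLE_BAR1_READ_1_STATE    = 2'd1;
        localparam logic [1:0] HANDLE_BAR1_READ_3_STATE    = 2'd2;
        localparam logic [1:0] D2H_DMA_INIT_MEMWR_PL_STATE = 2'd3;

        // The decode happens one cycle early and is captured on the same edge
        // that loads curstate, so the registered output lines up with curstate.
        always_ff @(posedge clk) begin
            case (nextstate)
                HANDLE_BAR1_READ_1_STATE:    tx_tlp_dword_offset <= 7'h4;
                HANDLE_BAR1_READ_3_STATE:    tx_tlp_dword_offset <= 7'h1;
                D2H_DMA_INIT_MEMWR_PL_STATE: tx_tlp_dword_offset <= reg_tx_tlp_dword_offset;
                default:                     tx_tlp_dword_offset <= 7'h0;
            endcase
        end
    endmodule
    ```

    Two caveats: reg_tx_tlp_dword_offset is sampled a cycle earlier than in the combinational version, which matters if it changes on the same clock edge as the state, and the decode now adds to the next-state path, so it only helps if that path has slack to spare.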
  • Altera_Forum

    Unfortunately, it is too slow for running my design at 125 MHz. Perhaps that is not Cyclone IV specific. Anything can be pipelined, but it will fragment an otherwise straightforward design into a jumble of flip-flops that is impossible to understand. Pipelining is suitable for designs where it makes architectural sense but, unfortunately, it did not make sense in mine. Lowering the clock frequency to 62.5 MHz and possibly widening certain data buses from 8 to 16 (or 32) bits made more sense in my case.

    By the way, how does a Cyclone IV compare to, say, an Arria II when it comes to combinational path delays across the device? Is the Arria II 'faster' and, if so, why?
  • Altera_Forum

    Arria II is built on the 'High Speed' process, just like the Stratix devices. Arria V, however, uses the 'Low Power' process, like Cyclone. So yes, Arria II will run faster than a Cyclone IV.