Forum Discussion

Altera_Forum
Honored Contributor
19 years ago

Need ideas on solving a weird LAN91C111 problem

I have a dual processor design with two /f cores. One processor, cpu_fifo, pulls data out of a fifo that is filled by a VHDL block receiving data from some proto pins. It copies the data into a shared dual-port memory built from M4Ks. This shared memory has 1500-byte prebuilt UDP packets -- cpu_fifo copies data into the data area of these packets. Both CPUs are connected to this memory via a tightly-coupled memory port.

The other processor, cpu_ethernet, assigns pbuf payloads to addresses in this shared memory and sends the packets out the ethernet port. Everything about this design works perfectly, except I'm running into weird performance issues.

At 50 MHz, it takes about 6,400 clocks to send a packet. At 100 MHz, it alternates between about 9,000 clocks and 21,000 clocks to do the same task.

The net effect is that throughput is slightly lower at 100 MHz than at 50 MHz. I don't even know where to begin analyzing what's causing this.

4 Replies

  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    Originally posted by bkucera@Jun 19 2006, 01:17 PM

    I have a dual processor design with two /f cores. One processor, cpu_fifo, pulls data out of a FIFO that is filled by a VHDL block receiving data from some proto pins. It copies the data into a shared dual-port memory built from M4Ks. This shared memory has 1500-byte prebuilt UDP packets -- cpu_fifo copies data into the data area of these packets. Both CPUs are connected to this memory via a tightly-coupled memory port.

    The other processor, cpu_ethernet, assigns pbuf payloads to addresses in this shared memory and sends the packets out the ethernet port. Everything about this design works perfectly, except I'm running into weird performance issues.

    At 50 MHz, it takes about 6,400 clocks to send a packet. At 100 MHz, it alternates between about 9,000 clocks and 21,000 clocks to do the same task.

    The net effect is that throughput is slightly lower at 100 MHz than at 50 MHz. I don't even know where to begin analyzing what's causing this.

    --- Quote End ---

    I can't explain the alternating behavior you are seeing; however, the clock-cycle count being proportionally higher is to be expected. That is to say, increasing the clock speed should help you with software/packet construction tasks, but the interface to the ethernet chip will not improve.

    Why? The 91C111 is a pretty slow chip; a while ago we had to dial down the performance to it using Avalon wait-states (can be seen in the components/altera_avalon_lan91c111/class.ptf file) to ensure that proper timing to the chip was met in all circumstances, including DMA transfers. Since this time delay is fixed, the generated SOPC Builder logic will insert additional delay cycles to ensure that timing to the chip is met regardless of input clock frequency.

    I'd suggest some profiling - gprof to start - between 50/100 systems to see where the discrepancy is with these occasional long-latency packet transfers.
  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    Originally posted by jesse@Jun 19 2006, 03:32 PM

    I can't explain the alternating behavior you are seeing; however, the clock-cycle count being proportionally higher is to be expected. That is to say, increasing the clock speed should help you with software/packet construction tasks, but the interface to the ethernet chip will not improve.

    Why? The 91C111 is a pretty slow chip; a while ago we had to dial down the performance to it using Avalon wait-states (can be seen in the components/altera_avalon_lan91c111/class.ptf file) to ensure that proper timing to the chip was met in all circumstances, including DMA transfers. Since this time delay is fixed, the generated SOPC Builder logic will insert additional delay cycles to ensure that timing to the chip is met regardless of input clock frequency.

    I'd suggest some profiling - gprof to start - between 50/100 systems to see where the discrepancy is with these occasional long-latency packet transfers.

    --- Quote End ---

    I've only used the performance counter so far; I'll look into gprof to see what I can figure out with that.

    This is what my LAN91C111 class.ptf file says:

    Read_Wait_States = "20ns";

    Write_Wait_States = "20ns";

    Setup_Time = "20ns";

    Hold_Time = "20ns";

    Is that what it should be? 20 ns doesn't seem that extreme.
  • Altera_Forum
    Honored Contributor

    Are you crossing clock domains? Running one component at 50 MHz and another at 100 MHz? If so, this is going to kill your performance because SOPC Builder has to instantiate clock synchronization logic. It takes several clock cycles to pass data through the clock arbitrators.

  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    Originally posted by jakobjones@Jun 20 2006, 10:08 PM

    Are you crossing clock domains? Running one component at 50 MHz and another at 100 MHz? If so, this is going to kill your performance because SOPC Builder has to instantiate clock synchronization logic. It takes several clock cycles to pass data through the clock arbitrators.

    --- Quote End ---

    No, I'm running everything off the same clock. There is a 33 MHz clock coming from off-board, but that's only used in some VHDL blocks; SOPC Builder and the Nios processor never see that signal.