Hello,

As I described in a previous post (http://www.alteraforum.com/forum/showthread.php?t=52808&p=217155&highlight=#post217155), I want to implement a DAQ system using NIOS and a W5300. The communication problem between the NIOS and the W5300 has been solved, and I am now able to transfer data over TCP/IP.

Now I am dealing with a different problem: transferring data between hardware modules (implemented in Verilog) and the NIOS. In short, my system consists of 36 hardware modules, each of which produces 64 bits of data every 462 µs. The goal is to grab this data and send it over Ethernet to a PC, so my required data rate is about 5 Mbit/s. The Ethernet link itself doesn't seem to be a problem, because it can achieve much higher speeds. The problem is how fast the CPU grabs the data from the modules.

In the beginning I thought using plain PIOs wouldn't be a problem: the NIOS collects the data from the modules and then writes it to the W5300 FIFO. But using counters I found that my maximum data rate is only about 4 Mbit/s. Is there any alternative way to go? I thought about using DMA, but it sounds a bit complicated for me at the moment, and I don't know whether it would be enough in the end. Any suggestions?
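As a quick sanity check of the "about 5 Mbit/s" figure, the requirement can be computed from the numbers in the post. The helper name and macros below are illustrative only:

```c
/* Required throughput for the setup described above:
   36 modules, each delivering 64 bits every 462 us. */
#define NUM_MODULES      36
#define BITS_PER_MODULE  64
#define PERIOD_US        462.0

/* Aggregate data rate in Mbit/s (bits per microsecond equals Mbit/s). */
static double required_rate_mbps(void)
{
    double bits_per_period = (double)NUM_MODULES * BITS_PER_MODULE; /* 2304 bits */
    return bits_per_period / PERIOD_US;
}
```

2304 bits every 462 µs works out to roughly 4.99 Mbit/s, so the measured 4 Mbit/s falls about 20% short.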

4 Mbit/s sounds rather low, even for a simplistic implementation. Have you already isolated where the bottleneck is? That is, how fast can you run the W5300 in isolation, and how fast can you grab your data in isolation? Assuming data acquisition is the bottleneck: how does that portion work? Is the NIOS writing to a PIO to force a trigger and then following that with a read for the data? Anywhere you perform multiple PIO accesses to execute a single logical operation, it should be very straightforward to improve performance just by creating your own Avalon-MM slave component.
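One simple way to measure each stage in isolation is to time it alone over many iterations and divide. The sketch below uses standard C `clock()` so it can run on a PC; `step_under_test()` is a hypothetical placeholder for whatever is being measured (the PIO reads, or one `test_tcps()` call). On the NIOS target, the HAL timestamp driver (`alt_timestamp_start()`, `alt_timestamp()`, `alt_timestamp_freq()` from `sys/alt_timestamp.h`) would likely be a better time source than `clock()`.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical placeholder for the stage being measured; replace with the
   real call (e.g. the 16 IORD()s, or test_tcps()) on the target. */
static volatile uint32_t sink;
static void step_under_test(void)
{
    for (uint32_t i = 0; i < 16; i++)
        sink += i;  /* stand-in work so the loop is not optimized away */
}

/* Average time per call in microseconds over `iterations` calls. */
static double avg_us_per_call(void (*fn)(void), uint32_t iterations)
{
    clock_t start = clock();
    for (uint32_t i = 0; i < iterations; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC / iterations;
}
```

Running it once with the acquisition code and once with the W5300 send code shows which stage dominates before any optimization work starts.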

I haven't performed any isolation measurements yet, but I am pretty sure the problem is in the data grabbing. The W5300 data transfer rate is in the range of 100 Mbit/s, so it's more than enough for my application. The code itself is quite straightforward and looks like this:

--- Quote Start ---
while (1) {
	data_buf2[0]  = IORD(0x214f0, 0);
	data_buf2[1]  = IORD(0x214e0, 0);
	data_buf2[2]  = IORD(0x214d0, 0);
	data_buf2[3]  = IORD(0x214c0, 0);
	data_buf2[4]  = IORD(0x214b0, 0);
	data_buf2[5]  = IORD(0x214a0, 0);
	data_buf2[6]  = IORD(0x21490, 0);
	data_buf2[7]  = IORD(0x21480, 0);
	data_buf2[8]  = IORD(0x21470, 0);
	data_buf2[9]  = IORD(0x21460, 0);
	data_buf2[10] = IORD(0x21450, 0);
	data_buf2[11] = IORD(0x21440, 0);
	data_buf2[12] = IORD(0x21430, 0);
	data_buf2[13] = IORD(0x21420, 0);
	data_buf2[14] = IORD(0x21410, 0);
	data_buf2[15] = IORD(0x21400, 0);
	test_tcps(0, 5000, data_buf2, 0);
}
--- Quote End ---

At the moment I am using just 16 PIOs, so 256 bits; the input is just a counter running at 18 kHz. I've seen some improvement using the NIOS II/f, and now my data rate is 4.6 Mbit/s. As I wrote before, the modules run independently, so the CPU only has to read a register value connected to each PIO, that's all. I don't mind receiving the same data twice, but this requires reading faster than intended.
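Since the 16 PIO base addresses in the posted loop step down by 0x10 from 0x214f0 to 0x21400, the unrolled reads can be expressed as a loop. This is a sketch only: `read_all_pios()` and the macros are hypothetical, and `fake_read()` stands in for the HAL's `IORD(BASE, REGNUM)` so the code runs off-target.

```c
#include <stdint.h>

#define NUM_PIOS      16
#define PIO_TOP_BASE  0x214f0u  /* address read into data_buf2[0] in the posted code */
#define PIO_STRIDE    0x10u     /* consecutive PIOs are 0x10 apart, counting down */

/* Off-target stand-in for IORD(base, regnum): returns the address it was given.
   On the NIOS, remove this and #include <io.h> instead. */
static uint32_t fake_read(uint32_t base, uint32_t regnum) { return base + regnum; }
#ifndef IORD
#define IORD(base, regnum) fake_read((base), (regnum))
#endif

/* Read the data register (offset 0) of every PIO into buf. */
static void read_all_pios(uint32_t buf[NUM_PIOS])
{
    for (uint32_t i = 0; i < NUM_PIOS; i++)
        buf[i] = IORD(PIO_TOP_BASE - i * PIO_STRIDE, 0);
}
```

The loop does not make the reads faster; it just makes the address pattern explicit. On the target, the `*_BASE` macros that the tools generate in `system.h` would be safer than hard-coded addresses.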

The W5300 may be capable of 100 Mbit/s, but is your system (NIOS + RAM + software + W5300) capable of that performance level? From your code snippet, I'm guessing the answer is no: each IORD() should take only a handful of clock cycles (assuming the PIOs are in the same [fast] clock domain as the NIOS), so I suspect nearly all your time is being consumed within test_tcps(). Because you only need a modest improvement over what you already have working, I would suggest briefly reviewing https://www.altera.com/en_us/pdfs/literature/an/an391.pdf, picking one method you are comfortable with to find the biggest bottleneck in test_tcps(), and then making whatever easy change improves that bottleneck. To answer your original post: DMA really wouldn't buy you much in the above code snippet, because I think your time is spent inside test_tcps(), where DMA wouldn't help one bit.

Thank you very much, you are actually right. I used the timestamp timer, and the total time for the PIO reads was about 4 µs. So the problem is in the CPU-to-W5300 communication. I used the drivers provided by WIZnet, and the code isn't that complicated. On the other hand, I am using the generic tri-state controller in Qsys to connect the W5300 to the NIOS, so maybe there is a problem with the configuration of the tri-state controller. Attached is the timing I used (I am running with a 140 MHz clock). Do you have any experience with this? https://www.alteraforum.com/forum/attachment.php?attachmentid=12555
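A rough time budget from the figures reported so far supports this conclusion. The sketch assumes the 4.6 Mbit/s was measured over the same loop and that the 4 µs covers all 16 PIO reads in one pass; the names are illustrative:

```c
/* Rough per-iteration budget from the figures reported in this thread. */
#define BITS_PER_ITERATION  256.0  /* 16 PIOs read per pass of the while loop */
#define MEASURED_RATE_MBPS  4.6    /* throughput reported with the NIOS II/f */
#define PIO_READ_TIME_US    4.0    /* timestamp-timer measurement of the PIO reads */

/* Time one pass of the loop takes at the measured rate (bits / (Mbit/s) = us). */
static double iteration_time_us(void)
{
    return BITS_PER_ITERATION / MEASURED_RATE_MBPS;
}

/* Fraction of each pass spent on the PIO reads; the remainder is
   test_tcps() plus loop overhead. */
static double pio_read_fraction(void)
{
    return PIO_READ_TIME_US / iteration_time_us();
}
```

With these inputs one pass takes about 55.7 µs, of which the PIO reads are roughly 7%, so over 90% of the time goes to the W5300 side.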

The forum shrank your picture, so it's not very readable. I did notice that you're specifying your timing in terms of "cycles", meaning your 140 MHz clock, and that your "Read wait time" was double digits [14? 24?]. The W5300 datasheet gives tRD = 65 ns, which is 65 ns × 140 MHz = 9.1 cycles [round up to 10]. And your turnaround time is even larger? (It shouldn't be?) This is just one example, but I personally find it more intuitive to use "nanoseconds", plug in the values from the datasheet table exactly as you read them, and let Qsys handle the rounding. After you've done all that, will you see a dramatic increase? It's hard to say without knowing what percentage of your time is actually spent reading from and writing to the external device. Other bottlenecks could be basic things, such as running with no cache from a slow memory. It could also be that you took source code written for another environment and missed some porting detail, such as how time is managed if the ported code has any internal delays. Anyway, I would drill down one more level into the W5300 code and profile the time used by the register read/write primitives to see whether your problem is there. If a big percentage of your time is spent in that I/O, then yes, focus on the tri-state bridge configuration.
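The nanoseconds-to-cycles conversion in the reply (65 ns at 140 MHz = 9.1 cycles, rounded up to 10) can be written as a small integer helper. This is only a sketch of the arithmetic; it is not a description of how Qsys rounds internally, and `ns_to_cycles()` is a hypothetical name:

```c
#include <stdint.h>

/* Convert a datasheet time in nanoseconds to whole clock cycles, rounding up
   so the programmed time is never shorter than the datasheet minimum.
   cycles = ceil(time_ns * clk_mhz / 1000) */
static uint32_t ns_to_cycles(uint32_t time_ns, uint32_t clk_mhz)
{
    return (time_ns * clk_mhz + 999u) / 1000u;
}
```

For example, `ns_to_cycles(65, 140)` gives 10, well below the double-digit read wait time visible in the posted settings.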

Data transfer from hardware Modules to Nios

15 Replies

Altera_Forum

Honored Contributor

9 years ago

In your project settings, change the optimization level from "None" to -O3. Your software might break, or it might be fine.

Your code boils down to:


/* Write one 16-bit value to a memory-mapped W5300 register. */
void IINCHIP_WRITE(uint32 addr, uint16 data) {
	(*((vuint16*) addr)) = data;
}
/* Copy len bytes from buf into socket s's TX FIFO, 16 bits per write. */
uint32 wiz_write_buf(SOCKET s, uint8* buf, uint32 len) {
	uint32 idx = 0;
	// M_08082008
	IINCHIP_CRITICAL_SECTION_ENTER();
	for (idx = 0; idx < len; idx += 2)
		IINCHIP_WRITE(Sn_TX_FIFOR(s), *((uint16*) (buf + idx)));
	// M_08082008
	IINCHIP_CRITICAL_SECTION_EXIT();
	return len;
}

The compiler can make some improvements by itself once the optimizer is enabled, but as an example, changing IINCHIP_WRITE to a macro, computing the FIFO address once outside the loop, and using the 'register' keyword will probably bring some improvement:


/* Macro version: no function call per write; arguments are parenthesized. */
#define IINCHIP_WRITE(addr, data)   (*((vuint16*) (addr)) = (data))
uint32 wiz_write_buf(SOCKET s, uint8* buf, uint32 len) {
	register uint32 idx = 0;
	register uint32 fifo_addr = Sn_TX_FIFOR(s);	/* computed once, outside the loop */
	// M_08082008
	IINCHIP_CRITICAL_SECTION_ENTER();
	for (idx = 0; idx < len; idx += 2)
		IINCHIP_WRITE(fifo_addr, *((uint16*) (buf + idx)));
	// M_08082008
	IINCHIP_CRITICAL_SECTION_EXIT();
	return len;
}

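The simple edits above eliminate 2 x 16 = 32 function-call overheads from your loop (two calls per iteration, if Sn_TX_FIFOR() is also a function, times 16 iterations for your 32-byte, 256-bit buffer). The pointer arithmetic and cast in the loop also aren't great, but deal with the function calls first.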
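One way to tidy "the math in the loop" is to walk the buffer with a 16-bit pointer instead of recomputing `(buf + idx)` and casting on every iteration. The sketch below is a hypothetical helper, not part of the WIZnet driver: it takes the FIFO register as a `volatile uint16_t *`, assumes `buf` is 2-byte aligned and `len` is even, and leaves out the critical-section calls.

```c
#include <stdint.h>

/* Hypothetical variant of wiz_write_buf(): one 16-bit FIFO write per
   2 bytes, advancing a pointer instead of indexing and casting. */
static uint32_t write_fifo16(volatile uint16_t *fifo, const uint16_t *buf, uint32_t len)
{
    const uint16_t *p = buf;
    const uint16_t *end = buf + len / 2;  /* len is assumed to be even */
    while (p != end)
        *fifo = *p++;                     /* volatile store: every write reaches the FIFO */
    return len;
}
```

On the NIOS, `fifo` would be the address from `Sn_TX_FIFOR(s)` cast to `volatile uint16_t *`. Whether this beats what `-O3` already generates is something to measure, not assume.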

Altera_Forum
Honored Contributor
9 years ago
I've enabled compiler optimization, but it didn't bring any improvement. The code changes above didn't help either; on the contrary, they caused more trouble: after changing the code I started to lose some packets (I tested each change separately).
Altera_Forum
Honored Contributor
9 years ago
--- Quote Start ---
I've enabled compiler optimization, but it didn't bring any improvement. The code changes above didn't help either; on the contrary, they caused more trouble: after changing the code I started to lose some packets (I tested each change separately).
--- Quote End ---

"Losing packets" could simply be a side effect of your code not waiting for the transmit to complete before issuing the next request.
After you've sped things up as much as you can, add a simple wait loop to delay between packet transmits and see whether that affects your lost packets at all.
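The reply suggests a fixed delay (on the NIOS HAL, `usleep()` between sends would do). A closely related alternative is to poll for completion with an upper bound, so a stuck transmit cannot hang the loop. In the sketch below, the status register and `done_mask` are placeholders: which W5300 socket register and bit signal send completion should be taken from the datasheet or the WIZnet driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll a (hypothetical) memory-mapped status register until any bit in
   done_mask is set, giving up after max_polls reads.
   Returns true if completion was seen, false on timeout. */
static bool wait_tx_complete(volatile const uint32_t *status_reg,
                             uint32_t done_mask, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (*status_reg & done_mask)
            return true;  /* previous send finished; safe to queue the next one */
    }
    return false;         /* timed out: report it instead of sending blindly */
}
```

If packets still go missing with a generous `max_polls`, waiting for transmit completion is probably not the cause.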
Altera_Forum
Honored Contributor
9 years ago
No, it's just that the code runs slower and doesn't return fast enough to catch the next data. I added a delay and the behavior is the same.
Altera_Forum
Honored Contributor
9 years ago
--- Quote Start ---
No, it's just that the code runs slower and doesn't return fast enough to catch the next data. I added a delay and the behavior is the same.
--- Quote End ---

Sorry, I don't understand the problem: each of those code changes should have resulted in a faster execution time for wiz_write_buf().

If they somehow made it slower, I'd suggest applying the changes one at a time and figuring out which one made it slower, and why.
