I am trying to do a small calculation in hardware, but I'm stuck on how to get started. I'm a complete newbie, but after reading a lot of manuals and demos, I figured out that I could use a NIOS soft processor that accepts C code. Using the basic computer configuration of the DE2 (it comes with the University Program), I created a NIOS BSP project. After many attempts, I managed to get everything compiled and the code below running on the target hardware.

int main(void)
{
    int line1[] = {1, 2, 3, 4, 5, 6, 7};
    int line2[] = {1, 2, 3, 4, 5, 6, 7};

    //--- Do this in verilog!
    int i;
    int store_val = 0;
    for (i = 0; i < 7; i++)
    {
        store_val = store_val + (line1[i] * line2[i]);
    }
    //--- verilog_end

    return 0;
}

The next step is that I want to do the calculation in the for-loop section in hardware, using Verilog rather than C. I have three main queries:

1) How do I write that for-loop function in Verilog?
2) Where do I save the Verilog module?
3) When I have the Verilog equivalent, how do I move the values contained in the two C arrays to the Verilog module to do the calculation, and then send 'store_val' back to the BSP project? I read about DMA in other posts, but I don't know how to do that process. This is confusing me.

Any help is greatly appreciated. Thank you very much.

I haven't used a for loop in Verilog before, but this link shows you how: http://www.asic-world.com/verilog/verilog_one_day2.html

You have to save your Verilog module in the same project in which you created your processor. After that, you can right-click on that file in the Files tab and select 'create symbol for this file'. You can then use that symbol in the BDF editor just like any other module (double-click in the BDF editor and your module should be in the project folder).

Sending data from your processor to your Verilog module is tricky. What you can do is add a 16-bit PIO device to your processor and send the data through it integer by integer, because it is very impractical to send the whole array at once. If I understood your program right, you can create two 16-bit PIO devices, send line1 and line2 separately, one element of each at a time, and write your Verilog to process the data as it comes in. This way you don't need a for loop in your Verilog.
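From the processor side, sending the arrays "integer by integer" comes down to a loop of writes to the two PIO data registers. The following is a minimal sketch, assuming both PIOs are configured as outputs and that their data registers are reachable through the pointers passed in. The helper name `pio_send_pairs` and its parameters are hypothetical; on a real BSP the register access would more typically go through the HAL macro `IOWR_ALTERA_AVALON_PIO_DATA` with base addresses taken from system.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes one element of each array per iteration to
   two memory-mapped PIO data registers. The register pointers are
   assumptions; on a real system they would be derived from the addresses
   that SOPC Builder assigned to the PIOs. */
static void pio_send_pairs(volatile uint32_t *pio_a, volatile uint32_t *pio_b,
                           const uint16_t *line1, const uint16_t *line2,
                           size_t count)
{
    assert(pio_a != NULL && pio_b != NULL);
    for (size_t i = 0; i < count; i++) {
        *pio_a = line1[i];  /* 16-bit value, zero-extended to the register width */
        *pio_b = line2[i];
    }
}
```

Note that the hardware would also need some way to tell when a new pair of values is valid (for example an extra strobe bit on another PIO); that handshake is not shown here.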

Hi vivasam,

1) For loops in Verilog aren't synthesizable; they only work in simulation. You have to write a process with a clock input. It is evaluated once on every clock cycle, so you can do the mathematical operations there. You may need a second process to count the cycles and end the loop.

2) The Verilog module has to be saved in the Quartus project folder or a subfolder. You can use the Quartus II software to create the Verilog module, and ModelSim to simulate it.

3) DMA is not the right way to communicate between NIOS and hardware. You can use FIFOs or dual-port RAMs to communicate; FIFOs are easier to use. You can implement these functions by using the MegaWizard in the Quartus II software; both are standard modules. If your NIOS and the Verilog module run on different clocks, you should use clock-crossing FIFOs. As normad said, you can also use the PIO (a module for SOPC Builder), but be careful with the clocking.

For loops are synthesizable, but they may give different results than what you expect. In this case the loop would be unrolled into 7 multipliers and 7 adders that all work in parallel to do the operation in one clock cycle. If you want to have only one multiplier and do the operation in 7 cycles instead, then yes, you need to get rid of the loop and write a clocked process.

Thank you everybody for the tips, but I am now more lost than before :)! OK, let's start somewhere. Please correct me where I am wrong.

1) Starting with normad's PIO suggestion, I inserted three 16-bit PIO ports (2 in and 1 out) in SOPC Builder. I called them line_1_in, line_2_in and result_out respectively. Each one has a base and end address assigned by SOPC Builder, e.g. 0x08200000-0x0820000f for line_1_in. Then I declared these ports in the top-level .v file as:

input [15:0] LINE_1_DATA;
input [15:0] LINE_2_DATA;
output [15:0] RESULT_BACK;

and also added these lines to the internal modules:

.in_port_to_the_line_1_in (LINE_1_DATA),
.in_port_to_the_line_2_in (LINE_2_DATA),
.out_port_from_the_result_back_out (RESULT_BACK),

My new queries:
- What clock do I assign to these PIOs? Right now they are at 50 MHz like all the other components.
- Going back to the C code, how do I tell NIOS II to send data to these PIOs? I am assuming that I need to use pointers to access the memory addresses, but then I don't know how to move to the next step. I am not too sure how to get rid of the for loop either. Any suggestions, please?

volatile int * line_1_ptr = (int *) 0x08200000;      // Port_in_1 address
volatile int * line_2_ptr = (int *) 0x08200010;      // Port_in_2 address
volatile int * result_back_ptr = (int *) 0x08200020; // Port_out_result

*line_1_ptr = line1; // But line1 is an array
*line_2_ptr = line2; // line2 is an array
store_val = *result_back_ptr;

2) How do I write a process with a clock input as suggested by nophutwern?

3) This is my first attempt at writing a Verilog module. I don't even know how to compile it :), but I just wanted to know if this is the type of code that will do the calculation as the data comes in:

module my_sum_prod (
    // Inputs
    clk,
    line_1_in,
    line_2_in,
    // Output
    result_out
);

// Port declarations
// Inputs
input clk;
input [15:0] line_1_in;
input [15:0] line_2_in;
// Output
output [15:0] result_out;

// Internal registers
reg [15:0] original_line_1;
reg [15:0] original_line_2;
reg [15:0] temp_sum;
reg [15:0] final_result_out;

// Start all registers from zero (assignments must sit inside a procedural block)
initial begin
    original_line_1 = 16'd0;
    original_line_2 = 16'd0;
    temp_sum = 16'd0;
    final_result_out = 16'd0;
end

always @(posedge clk) begin
    // Register the inputs, multiply, then accumulate
    original_line_1 <= line_1_in;
    original_line_2 <= line_2_in;
    temp_sum <= original_line_1 * original_line_2;
    final_result_out <= final_result_out + temp_sum;
end

assign result_out = final_result_out;

endmodule

4) When I manage to write a proper Verilog module, where and how do I call it in my main .c file so that it accepts the data from the NIOS processor through the PIOs and returns a value again?

Thank you again. I'm just entering this whole FPGA world and I need step-by-step help.

NB: To nophutwern: I will try the FIFO as soon as I get this methodology working. It's part of my learning process.

--- Quote Start ---
DMA is not the right way to communicate between NIOS and hardware.
--- Quote End ---

The reason I talked about DMA is that I saw in a Reference Design (StratixII_DSP_Kit-v1.0.0) that the author reads an image from the flash memory card into a DMA buffer, applies an edge detector operation to the data, then sends it back to the DMA to be displayed on the VGA. Suppose my data is not just 2 arrays of 7 integers but 2000 arrays of 1000 integers, or a 2-D array of [2000][1000] if it is an image: do I still use FIFOs or dual-port RAMs to communicate with the hardware? My idea was to make an SOPC component with Avalon-MM slave interfaces out of the Verilog code (another thing I don't know how to do!) and 'slot it' into the data flow from the CPU data master, in between two DMA buffers, as in the reference design mentioned above. But I guess I have to do it the PIO, FIFO, or dual-port way for now.

Move data from NIOS to hardware and then back to NIOS

14 Replies

Altera_Forum
Honored Contributor
14 years ago
--- Quote Start ---
You might have an easier time using this instead: http://www.alterawiki.com/wiki/modular_sgdma

--- Quote End ---

Yeah, the UP core has an ST port going into the core and another ST port coming out of it. Before I slot the ST component in between the read and write masters, and assuming that my original data is generated by the CPU and stored in the SDRAM and that the result from the UP core will also go into the SDRAM, do I need:
- modular dispatcher (MM to MM), Read master and Write master like in the example file
or
- modular dispatcher (MM to ST) & Read master, and modular dispatcher (ST to MM) & Write master

My feeling is that it is the first option but I just want a confirmation because it is my first time using a DMA type component.

Suppose instead I had used the standard SG-DMA component available in SOPC Builder: would I then need two SG-DMA components, i.e. MM to ST at the transmitting end and ST to MM at the receiving end?

OK, in the meantime I tried to adapt the example given on the website to my DE2-115 board, but the mSGDMA keeps spinning and my NIOS program gets stuck at the loop 'while (sgdma_interrupt_fired == 0) {}'. How do I find the cause of the error? Could this error be due to the fact that I have not fully copied the SOPC example design, since I have not used the Avalon-MM Pipeline Bridge and the 'DDR SDRAM Controller with ALTMEMPHY'? I just have a CPU, an SDRAM controller and the Modular SGDMA components (all running at the same clock rate) at the moment. I did not add those components because I don't understand their purpose, but if needed, I will do it.

Now, assuming I get that example working after adding all these extra components and clocks, what changes will I need to make to the parameter settings of my Read Master and Write Master components if the data to be transferred is now a 3 x 12 array of u8 integers, i.e. 36 bytes, defined as below in my C code?
alt_u8 my_2d_array[3][12] ; //Fill in the values
source_buffer = &(my_2d_array[0][0]);

The Data Width setting goes to 8. How about Length Width and FIFO Depth, and the other settings? What are the major changes that I need to make in the example 'main.c' file?

Thanks
Altera_Forum
Honored Contributor
14 years ago
That's correct, to do this with the SGDMA on the ACDS you would need to control a pair of SGDMAs for doing MM-->ST and ST-->MM. With the mSGDMA if you wedged your block between the read and write masters then your control would be just triggering a normal DMA transfer (assuming for every input there is one output from your block).

The pipeline bridge isn't necessary, it's just included to add some additional pipelining to the design. If you are running the included software from the mSGDMA design make sure you change the settings near the top of the main.c file to represent your own system memory address base and span. I did a minor update to that software to fix a bug which would cause the source and destination buffers to overlap so make sure you have the latest main.c file.

You probably don't need to modify many of the settings. The length width parameter just dictates the maximum number of bytes you can transfer in a single transfer. For example if you chose 20 bits that means you can transfer slightly less than 1MB of data in a single descriptor. The only reason why it's a parameter and not hardcoded to be 32-bits is that when a timing critical path shows up in one of the masters it can typically be solved by just reducing that length register width to something more sensible (being able to transfer ~4GB in one shot doesn't make sense in SOPC Builder which only has a 4GB space per master anyway).
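The relationship between the length width and the largest single transfer can be worked out directly. The sketch below assumes the length register simply holds a byte count, so a width of w bits allows at most 2^w - 1 bytes per descriptor; the function name `max_transfer_bytes` is hypothetical and is not part of the mSGDMA software.

```c
#include <assert.h>
#include <stdint.h>

/* Largest byte count that fits in a length register of the given width,
   assuming the register holds a plain byte count. */
static uint64_t max_transfer_bytes(unsigned length_width_bits)
{
    assert(length_width_bits >= 1 && length_width_bits <= 32);
    return ((uint64_t)1 << length_width_bits) - 1;
}
```

Under that assumption, a 20-bit length gives 1,048,575 bytes (just under 1MB) and a 32-bit length gives 4,294,967,295 bytes (just under 4GB), so even a small width comfortably covers a 36-byte transfer.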

Since you are dealing with MM to ST, the symbol size is 8 bits, so even if your component has an 8-bit input/output you can still set up the mSGDMA for a wider data path. This will help increase your memory throughput, since multiple symbols can be fetched every clock cycle. I would only use the C code as a guideline; most of it has nothing to do with the mSGDMA, so it may end up leading to confusion. If you want to see a simpler application check out this design example which is configured for MM --> ST which performs frame buffering to an LCD. The only difference is you would use MM --> MM and set up the descriptors slightly differently.
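As a rough illustration of why a wider data path helps, the number of memory beats needed for a transfer can be estimated from the data width. This is only a back-of-the-envelope sketch that assumes 8-bit symbols and ignores alignment, bursting and bus stalls; `beats_needed` is a hypothetical helper name.

```c
#include <assert.h>

/* Number of data beats needed to move total_bytes over a data path of
   data_width_bits, assuming 8-bit symbols packed into each beat. */
static unsigned beats_needed(unsigned total_bytes, unsigned data_width_bits)
{
    assert(data_width_bits >= 8 && data_width_bits % 8 == 0);
    unsigned symbols_per_beat = data_width_bits / 8;
    return (total_bytes + symbols_per_beat - 1) / symbols_per_beat;
}
```

For the 36-byte array discussed in this thread, an 8-bit data path would take 36 beats while a 32-bit data path would take 9.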

http://www.alterawiki.com/wiki/modular_sgdma
Altera_Forum
Honored Contributor
14 years ago
--- Quote Start ---

If you are running the included software from the mSGDMA design make sure you change the settings near the top of the main.c file to represent your own system memory address base and span.

--- Quote End ---

OK, I found my mistake. I had the CPU Reset and Exception vectors on the SDRAM. I have now changed the DATA_SOURCE_BASE address to start further into the memory, and I can see the source and destination memory being the same after I run the program in debug mode.

Just a quick question: Is the statement ' test_counter = 0;' not supposed to be before the start of the 'do' loop? Because '(test_counter < NUMBER_OF_TESTS)' in the 'while condition' will always be true otherwise.

The next step for me is to try to populate the source buffer with the 36 data values contained in my_2d_array, but I am struggling to do this. I've set MAXIMUM_BUFFER_SIZE to 36 and NUMBER_OF_BUFFERS to 2, but I am not confident enough with pointer notation to get the first 36 addresses after DATA_SOURCE_BASE to contain those values. Can anybody please help me with this?
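One possible way to do this is to treat the source buffer as a flat byte array and copy the 2-D array into it row by row. The sketch below is only an illustration: `fill_source_buffer`, `ROWS` and `COLS` are hypothetical names, and on the target the `source_buffer` argument would be something like DATA_SOURCE_BASE cast to a byte pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROWS 3
#define COLS 12

/* Copies a ROWS x COLS byte array into consecutive addresses starting at
   source_buffer and returns the number of bytes written. The buffer is
   accessed through a volatile pointer because on the target it would be
   memory that a DMA engine reads later. */
static size_t fill_source_buffer(volatile uint8_t *source_buffer,
                                 uint8_t array[ROWS][COLS])
{
    assert(source_buffer != NULL);
    for (size_t r = 0; r < ROWS; r++) {
        for (size_t c = 0; c < COLS; c++) {
            source_buffer[r * COLS + c] = array[r][c];
        }
    }
    return (size_t)ROWS * COLS;
}
```

If the CPU has a data cache, the writes would also need to bypass it or be flushed from it before the DMA reads the memory; otherwise the DMA may see stale data.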

--- Quote Start ---
If you want to see a simpler application check out this design example which is configured for MM --> ST which performs frame buffering to an LCD. The only difference is you would use MM --> MM and set up the descriptors slightly differently.

http://www.alterawiki.com/wiki/modular_sgdma
--- Quote End ---

Where is this design example? I can't find it on that website.

--- Quote Start ---
With the mSGDMA if you wedged your block between the read and write masters then your control would be just triggering a normal DMA transfer (assuming for every input there is one output from your block).

--- Quote End ---

Now that you mention the assumption that every input produces one output, I realize that this is not the case with that UP IP block. The block takes in one 8-bit value at a time and uses an altshift_taps shift register to get the data into the right format for its processing. Anyway, this is not a problem for now, as I just want to get those 36 values in my array moved around using Avalon-ST, and I will have to write my own core later.
Altera_Forum
Honored Contributor
14 years ago
Good catch, I never noticed that bug since I have always used the code in an infinite loop mode. I'll update the design some time this week to correct that.

I would probably write your own code to test your own hardware, since most of the mSGDMA code is set up to create random test buffers and all kinds of non-practical stuff. Your code should end up being a fraction of the length if you write it for your own application. You could run an uncached malloc (see the Nios II software handbook for more details) to allocate a pointer to some location in the heap and then dereference that pointer to populate the buffer.

Sorry I linked the wrong page, I meant this one: http://www.alterawiki.com/wiki/modular_sgdma_video_frame_buffer

If the read and write lengths are not the same then you'll need to use a pair of DMAs since a DMA typically performs the same number of reads as writes. It would be possible to hack the dispatcher HDL to support different read and write lengths but I wouldn't recommend attempting that just yet.
