Forum Discussion

Altera_Forum
15 years ago

How to call a tightly coupled module from C code for Nios II

Hi there. I am new to Nios II technology and would like to ask: how can we call a hardware module from C code on the Nios II? From my understanding, we can add modules that are coupled with the Nios II so that some functions are taken care of by hardware (correct me if I'm wrong :)). However, I do not really understand how to do it. Help please, and thanks.

17 Replies

  • Altera_Forum

    Here's an example, as promised.

    In my top level .v file, I have the following:

    // Instantiate Nios II/e CPU
    cpu_accel CPU0 (
        .clk_0(CLOCK_CPU),
        .in_port_to_the_data_in0(int_datain0),
        .out_port_from_the_data_out0(int_dataout0),
        .out_port_from_the_data_out1(int_dataout1),
        .reset_n(KEY)
    );
    // Instantiate custom accelerated Verilog component
    FIR FIR0 (
        .PIO_in0(int_dataout0),
        .PIO_in1(int_dataout1),
        .PIO_out0(int_datain0)
    );

    "cpu_accel.v" is my generated SOPC Builder system. "FIR" is my custom Verilog module. 'int_datain0', 'int_dataout0' and 'int_dataout1' are simply declared as wires.

    You can use the IORD and IOWR macros. In this case, I simply used pointers:

    
    <snip>
    int main(int argc, char* argv[]) {

        // PIO pointers (volatile so the compiler cannot optimize the accesses away)
        volatile int* data_out0 = (volatile int *) DATA_OUT0_BASE;
        volatile int* data_out1 = (volatile int *) DATA_OUT1_BASE;
        volatile int* data_in0 = (volatile int *) DATA_IN0_BASE;
        int res;
    <snip>
            *(data_out1) = 1;
            *(data_out0) = 2;
            // insert wait statement here, if necessary
            res = *data_in0;                        // copy result to local int
    <snip>
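
    For comparison, the same write/write/read sequence can go through the HAL's IORD/IOWR macros, which bypass the data cache on Nios II cores that have one. This is only a sketch: the USE_NIOS_HAL switch, the fake_pio array and the host-side macro stand-ins are assumptions made so the logic compiles off-target, and the stand-in pretends the hardware returns the sum of the two operands (the real FIR block will not).

```c
#include <stdint.h>

#ifdef USE_NIOS_HAL   /* hypothetical switch: define it when building for the target */
#include <io.h>       /* IORD / IOWR from the Nios II HAL */
#include <system.h>   /* DATA_OUT0_BASE etc., generated for the SOPC system */
#else
/* Host-side stand-in (assumption): three fake registers, and every write
 * refreshes the "result" register with the sum of the two operands. */
static uint32_t fake_pio[3];
#define DATA_OUT0_BASE 0
#define DATA_OUT1_BASE 1
#define DATA_IN0_BASE  2
#define IOWR(base, reg, data)                                   \
    (fake_pio[(base) + (reg)] = (uint32_t)(data),               \
     fake_pio[DATA_IN0_BASE] = fake_pio[DATA_OUT0_BASE] +       \
                               fake_pio[DATA_OUT1_BASE])
#define IORD(base, reg) (fake_pio[(base) + (reg)])
#endif

/* Write both operands to the output PIOs, then read the result PIO. */
uint32_t accel_run(uint32_t a, uint32_t b)
{
    IOWR(DATA_OUT0_BASE, 0, a);
    IOWR(DATA_OUT1_BASE, 0, b);
    /* insert a wait here if the hardware needs more than one cycle */
    return (uint32_t)IORD(DATA_IN0_BASE, 0);
}
```

    On the target, accel_run(2, 1) would return whatever the custom module drives onto the input PIO; on the host stand-in it returns 3.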
    
  • Altera_Forum

    Thank you gaudetteje for the sample code. Looking at your code, I realized I made a mistake with the direction of the input/output ports. But I am still facing problems compiling. So let's start fault finding.

    - I have 3 PIOs in SOPC Builder. 2 out (data_out_0 and data_out_1) and 1 in (result_in0)

    - Then in my main.v file port declaration, I put

    
    input result_data;
    output int_data_line0;
    output int_data_line1;
    

    - In the SOPC generated NIOS system, I put

    
    nios_system NiosII (
      // my pios
      .in_port_to_the_result_in0    (result_data),
      .out_port_from_the_data_out0  (int_data_line0),
      .out_port_from_the_data_out1  (int_data_line1)
    );
    

    - Finally, I instantiate my custom Verilog module

    
    add_two two_vals(
         .clk(system_clock),
         .line_1_in (int_data_line0),
         .line_2_in (int_data_line1),
         .result_back_out(result_data)
         );
    

    When I compile this, I get Error : Net "result_data",which fans out to "nios_system:NiosII|in_port_to_the_result_in0[0]", cannot be assigned to more than one value.

    Could this error be due to a badly written add_two.v module? I still have the module as before, except that I tried to add a clock to it:

    
    module add_two (
    // Inputs
    clk,
    line_1_in,
    line_2_in,
    // Output
    result_back_out
    );
     
    //Port Declarations
    // Inputs
    input clk;
    input  [7:0] line_1_in; // 8 bit value 
    input  [7:0] line_2_in;
    // Output
    output wire [7:0] result_back_out; // assume 8 bit
     
    reg [7:0] original_line_1;
    reg [7:0] original_line_2;
    reg [7:0] temp_sum;
     
    always @(posedge clk)
     begin 
      original_line_1 <= line_1_in;
      original_line_2 <= line_2_in;
      temp_sum <= original_line_1 + original_line_2;
     end
    assign result_back_out = temp_sum;
    endmodule
    

    I also saw a warning message in SOPC builder for that PIO In when I generated the system which said: 'PIO Inputs are not hardwired in test bench. Undefined values will be read from PIO inputs during simulation.'. Am I doing something wrong there too?
  • Altera_Forum

    Think of the Verilog you're writing as a bunch of wires. You don't need to give your main.v module ports for the PIO wires. There should be only one driver on a line, or the result is undefined. If you want to toggle your LEDs or use buttons on an eval board, you can create an assign statement, or you could control the LEDs directly through a separate PIO in the Nios. What you're trying to do is drive 'result_data' from two places: from your add_two module AND from an I/O pin on your board somewhere. Similarly, 'int_data_line0/1' is a wire and can't be declared as both a wire and an I/O port. In your port declaration of main.v, you need to come up with a different name (e.g. output [7:0] LED_RED; but double check my syntax). Use 'assign LED_RED = int_data_line0;' if that's what you're trying to do.

    For now, remove the port declarations and just program the Nios to printf the result to your debug console. That's the easiest way to see if the numbers are correct.

    Also, for a simple addition you don't need synchronous logic - get rid of the clock unless you want it pipelined. In that case, you should really be using a different clock phase (with a PLL) than your SOPC system to ensure data is valid.

    You can ignore the testbench warnings until you start simulating your Nios processor with Modelsim or another HDL simulator.
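
    The printf check suggested above could look roughly like the sketch below. The function name check_sum is made up for illustration, and the 8-bit mask assumes 8-bit PIOs feeding the add_two module; pass in whatever value was read back from the result PIO.

```c
#include <stdio.h>
#include <stdint.h>

/* Compare the value read back from the result PIO with a software sum.
 * Returns 1 on a match, 0 on a mismatch, and prints both values. */
int check_sum(uint8_t a, uint8_t b, uint32_t hw_result)
{
    uint8_t expected = (uint8_t)(a + b);          /* wraps like an 8-bit adder */
    uint8_t got = (uint8_t)(hw_result & 0xFFu);   /* keep only the 8 PIO bits */

    printf("a=%u b=%u hw=%u expected=%u %s\n",
           (unsigned)a, (unsigned)b, (unsigned)got, (unsigned)expected,
           got == expected ? "OK" : "MISMATCH");
    return got == expected;
}
```

    Calling it once per test pair from main makes a wiring mistake show up as a MISMATCH line on the debug console rather than as a silent wrong value.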
  • Altera_Forum

    Thank you so much gaudetteje! I don't know how to thank you enough.. if only all Altera tutorials were like the way you explained things.

    So yes, I've corrected my mistake concerning PIOs and declared internal wires instead, and now I can move two data values into the verilog module and get the output back.. finally!

    The next stage for me is to do this calculation on two arrays of values. As you mentioned before, the easy way is to have the iteration in the Nios and send the data one element at a time. This should not take too long.

    But as for using custom instruction or Avalon slave component to take advantage of parallelism, I will certainly come back to you for expert advice :) Any guidance is greatly appreciated.
  • Altera_Forum

    OK, so now I have 3 arrays of data, each containing 100 integers defined in NIOS C code as:

    
    alt_u8 line1[100] = {1, 2, 3, ..., 100};
    alt_u8 line2[100] = {10, 10, 10, ..., 10};
    alt_u8 line3[100] = {1, 2, 3, ..., 100};
    

    The next step is I want to apply a calculation to these arrays. Imagine these arrays are 3 rows of pixel values and I want to do a Sobel operation on them. I found the following code on the web to do this job:

    
     
    module sobel_mine( p0, p1, p2, p3, p5, p6, p7, p8, out);
    input  [7:0] p0,p1,p2,p3,p5,p6,p7,p8; // 8 bit pixels inputs 
    output [7:0] out;     // 8 bit output pixel 
     
    wire signed [10:0] gx,gy;         // 11 bits: magnitude up to 255*4, plus a sign bit
    wire signed [10:0] abs_gx,abs_gy; 
    wire [10:0] sum;                  // up to 255*8, no sign bit needed
     
    assign gx=((p2-p0)+((p5-p3)<<1)+(p8-p6));//sobel mask for gradient in horiz. direction 
    assign gy=((p0-p6)+((p1-p7)<<1)+(p2-p8));//sobel mask for gradient in vertical direction 
     
    assign abs_gx = (gx[10]? ~gx+1 : gx); // to find the absolute value of gx. 
    assign abs_gy = (gy[10]? ~gy+1 : gy); // to find the absolute value of gy. 
     
    assign sum = (abs_gx+abs_gy);    // finding the sum 
    assign out = (|sum[10:8])?8'hff : sum[7:0]; // to limit the max value to 255  
     
    endmodule
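
    A software reference for the same arithmetic is handy for checking what comes back from the hardware. This is a sketch only: sobel_ref is a made-up name, and it assumes the same pixel numbering as the Verilog ports above (p0..p2 top row, p3 and p5 middle row, p6..p8 bottom row, with the centre pixel p4 unused).

```c
#include <stdint.h>
#include <stdlib.h>   /* abs */

/* Software model of the Sobel module: |gx| + |gy|, saturated at 255. */
uint8_t sobel_ref(uint8_t p0, uint8_t p1, uint8_t p2,
                  uint8_t p3,             uint8_t p5,
                  uint8_t p6, uint8_t p7, uint8_t p8)
{
    /* uint8_t operands promote to int, so the differences can go negative */
    int gx = (p2 - p0) + 2 * (p5 - p3) + (p8 - p6);   /* horizontal gradient */
    int gy = (p0 - p6) + 2 * (p1 - p7) + (p2 - p8);   /* vertical gradient */
    int sum = abs(gx) + abs(gy);

    return (uint8_t)(sum > 255 ? 255 : sum);          /* clamp to 8 bits */
}
```

    Comparing sobel_ref(...) against the value read from the result PIO for each window gives a quick pass/fail check.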
     
    

    So to interface this Verilog code with the Nios II, I created 8 output PIOs of 8 bits each, and declared them in the C code as:

    
    volatile int* data_out_0_ptr = (int *) 0x08208010; // Data_out_0 address
    volatile int* data_out_1_ptr = (int *) 0x08208020;// Data_out_1 address
    .
    .
    .
    volatile int* data_out_8_ptr = (int *) 0x08208080;// Data_out_8 address
     
    

    Then I do a for-loop to access each element and send it to the verilog module for the calculation

    
    for( i=0; i<98; i++)
    {
     *(data_out_0_ptr)= line1[i];
     *(data_out_1_ptr)= line1[i+1];
     *(data_out_2_ptr)= line1[i+2];
     *(data_out_3_ptr)= line2[i];
     *(data_out_5_ptr)= line2[i+2];
     *(data_out_6_ptr)= line3[i];
     *(data_out_7_ptr)= line3[i+1];
     *(data_out_8_ptr)= line3[i+2];
     sum_val = *result_back_ptr ;
      //printf("sum_val =  %d\n", sum_val);
    }
    

    This code works fine as it is, but I am sure that I am not taking advantage of the FPGA. My new queries are:

    1) Instead of having 8 output PIOs of 8 bits each, can I have 2 output PIOs of 32 bits? If yes, how do I modify the C code to reference the right address? For example, suppose I have my 32-bit PIO data_out_32bit_ptr at address 0x08208090. In the C code for-loop, I am not sure how to reference the correct data. Is something like below correct?

    
    *(data_out_32bit_ptr)= line1[i];
    *(data_out_32bit_ptr + 0x8)= line1[i+1];
    *(data_out_32bit_ptr + 0x10)= line1[i+2];
    *(data_out_32bit_ptr + 0x18)= line2[i];
    

    But I see the End address as being 0x0820809f in SOPC Builder when I include such a 32-bit output PIO.

    2) I suppose I can make the most of FPGA parallelism by getting rid of this for-loop. But how do I send the data then?

    3) Also, the results contain 98 values which I can print on the console window. But I need to store the values. I tried the altera_hostfs and I managed to send the data to a text file on my computer (after 3 days of fighting ;)!) But this works only when I choose Debug As -> NIOS II Hardware and it seems to run slower than when I choose Run As -> NIOS II Hardware. What is the best way to get those values?
  • Altera_Forum

    --- Quote Start ---

    1) Instead of having 8 output PIOs of 8 bits each, can I have 2 output PIOs of 32 bits? If yes, how do I modify the C code to reference the right address? For example, suppose I have my 32-bit PIO data_out_32bit_ptr at address 0x08208090. In the C code for-loop, I am not sure how to reference the correct data. Is something like below correct?

    
    *(data_out_32bit_ptr)= line1[i];
    *(data_out_32bit_ptr + 0x8)= line1[i+1];
    *(data_out_32bit_ptr + 0x10)= line1[i+2];
    *(data_out_32bit_ptr + 0x18)= line2[i];
    
    But I see the End address as being 0x0820809f in SOPC Builder when I include such a 32-bit output PIO.

    --- Quote End ---

    Yes, you can have two 32-bit PIOs. I can't fully answer your C coding question, but it sounds like you'll need some type casting. If your data is stored as sequential bytes, then reading at line1[i] through a 32-bit pointer should return a 32-bit number. You can check this with a printf("%x").
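
    One way to do the packing without relying on pointer casts is to shift the four bytes into a word and write that word once to the PIO's data register. A minimal sketch, with pack4 as a made-up helper name: it produces the same value an aligned little-endian 32-bit load would (the Nios II is little-endian), but also works when &line1[i] is not 4-byte aligned. As far as I know, the 16-byte address span SOPC Builder reports covers the PIO's four registers (data, direction, interruptmask, edgecapture), so there is only one data word to write; also note that adding 0x8 to an int pointer moves it 8 words, not 8 bytes.

```c
#include <stdint.h>

/* Pack four consecutive 8-bit pixels into one 32-bit word,
 * with p[0] in the least significant byte. */
uint32_t pack4(const uint8_t *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

    With that, something like *(data_out_32bit_ptr) = pack4(&line1[i]); sends four pixels in a single write, and the Verilog side slices the 32-bit bus back into bytes.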

    --- Quote Start ---

    2) I suppose I can make the most of FPGA parallelism by getting rid of this for-loop. But how do I send the data then?

    --- Quote End ---

    Before getting into this, you should ask the question "do you NEED to?" If your system is operating in real time with enough headroom for anything else required, then your job is done. Honestly, though, if using PIOs on a 50-300MHz processor is sufficient, then you don't need an FPGA. It could probably be done on a PC-104 stack or other single-board uC. You could also take advantage of floating point ops on a PowerPC or PDSP without much difficulty.

    If the answer is yes, then you have options. Refer to my original response. Since you have 2 32-bit inputs and 1 8-bit output to your module, this would be a good candidate for a custom instruction, and you'd save 2 of the 3 cycles required for PIOs. But to utilize the FPGA resources and gain serious speedup, you would add a wrapper module that replicates the Sobel module. The Nios in this case would probably be doing some DMA transfer or providing the Avalon-MM address to a memory location (if you give your module an Avalon-MM master & slave port and connect it in SOPC Builder). The wrapper module would retrieve a large block of pixels and perform the Sobel operation N times.

    For a simple example, the wrapper module gathers a 4x4 pixel grid. With this data, you could instantiate 4 Sobel submodules and return 4 resulting pixels. The wrapper simply maps the correct pixels to the corresponding submodule(s). It would also be responsible for handshaking with other Avalon components and the Avalon-MM clock interface. Note that your submodules need not be clocked, only that you latch the data when it's guaranteed to be available. There are several examples of creating Avalon-MM components on Altera's website.
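
    To make the pixel mapping concrete, here is a software model of that wrapper (C, not HDL). The names sobel3x3 and sobel_4x4 are made up for illustration, and the arithmetic mirrors the Sobel module posted earlier in the thread. In hardware the four calls would be four submodule instances working at the same time; the loop here only shows which pixels feed which instance.

```c
#include <stdint.h>
#include <stdlib.h>   /* abs */

/* One Sobel window whose top-left corner sits at row r, column c. */
static uint8_t sobel3x3(uint8_t g[4][4], int r, int c)
{
    int gx = (g[r][c + 2] - g[r][c])
           + 2 * (g[r + 1][c + 2] - g[r + 1][c])
           + (g[r + 2][c + 2] - g[r + 2][c]);
    int gy = (g[r][c] - g[r + 2][c])
           + 2 * (g[r][c + 1] - g[r + 2][c + 1])
           + (g[r][c + 2] - g[r + 2][c + 2]);
    int sum = abs(gx) + abs(gy);

    return (uint8_t)(sum > 255 ? 255 : sum);   /* clamp to 8 bits */
}

/* A 4x4 grid holds four overlapping 3x3 windows, so it yields a 2x2 result. */
void sobel_4x4(uint8_t grid[4][4], uint8_t out[2][2])
{
    for (int r = 0; r < 2; r++)
        for (int c = 0; c < 2; c++)
            out[r][c] = sobel3x3(grid, r, c);
}
```

    Each of the 16 input pixels is shared by up to four windows, which is why the wrapper only has to fetch the block once.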

    --- Quote Start ---

    3) Also, the results contain 98 values which I can print on the console window. But I need to store the values. I tried the altera_hostfs and I managed to send the data to a text file on my computer (after 3 days of fighting ;)!) But this works only when I choose Debug As -> NIOS II Hardware and it seems to run slower than when I choose Run As -> NIOS II Hardware. What is the best way to get those values?

    --- Quote End ---

    I haven't used this method, so I can't comment on why Debug is slower than Run. Debug allows you to step through the Nios instructions with breakpoints. If Run mode isn't working then there's probably a timing issue.

    Storing data is a problem in and of itself. Why not just copy/paste from the console after your image is complete? Does it need to operate untethered from your PC?

    I frequently copy from the console and paste into MATLAB/Octave to verify the results. For something more automated, or if you're using this system iteratively, you'll need to store results in non-volatile memory like onboard flash or an SD card. Sending over USB to a hard drive works too, but is more complicated. What components are available on your eval board? Better yet, what eval board are you using?
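
    For the copy/paste route, it can save some editing to print the results already in MATLAB/Octave syntax. A small sketch, assuming 8-bit results and a C library that provides snprintf; format_octave is a made-up helper name, not part of the HAL.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Format n results as "name = [ v0 v1 ... ];" into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_octave(char *buf, size_t cap, const char *name,
                  const uint8_t *v, size_t n)
{
    size_t used;
    int k = snprintf(buf, cap, "%s = [", name);
    if (k < 0 || (size_t)k >= cap) return -1;
    used = (size_t)k;

    for (size_t i = 0; i < n; i++) {
        k = snprintf(buf + used, cap - used, " %u", (unsigned)v[i]);
        if (k < 0 || (size_t)k >= cap - used) return -1;
        used += (size_t)k;
    }

    k = snprintf(buf + used, cap - used, " ];");
    if (k < 0 || (size_t)k >= cap - used) return -1;
    return (int)(used + (size_t)k);
}
```

    Printing the buffer with printf("%s\n", buf) gives a single line that can be pasted straight into the MATLAB/Octave prompt as a row vector.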
  • Altera_Forum

    Thanks for the time invested in explaining all this to me. I understand overall what you have explained but this has also created new queries because I could not succeed in implementing your suggestions. So to start, let's break it down again:

    --- Quote Start ---

    Yes, you can have 2 32-bit PIOs.

    --- Quote End ---

    OK, I managed to do that based on another forum post.

    --- Quote Start ---

    Before getting into this, you should ask the question "do you NEED to?"

    --- Quote End ---

    Yes, I need to make it work on FPGA using its parallelism resources. I understand your reasons for not using it this way, but my task is more educational than real-world. I am just stepping into FPGA world and this is my way to learn.

    --- Quote Start ---

    But to utilize the FPGA resources and gain serious speedup, you would add a wrapper module that replicates the Sobel module. The Nios in this case would probably be doing some DMA transfer or providing the Avalon-MM address to a memory location

    --- Quote End ---

    My other attempt was to discard the PIOs and instead create an SOPC component with Avalon-MM. But I don't know how to create a wrapper module to instantiate more Sobel submodules. I've downloaded the Avalon Memory-Mapped Slave Template from the Altera website, inserted it as a component in SOPC Builder and renamed it my_slave_component. However, I don't know what to put as the Register File properties (Word Size and Synchronization) and capabilities (I have enabled only two registers - one input and one output. Is that good?).

    When I looked at the my_slave_component.v to modify it and add my custom logic to it, I saw many new input/output ports.

    
    input  wire        clk,              //       clock_reset.clk
    input  wire        reset,            // clock_reset_reset.reset
    input  wire   slave_address,    //                s0.address
    input  wire        slave_read,       //                  .read
    input  wire        slave_write,      //                  .write
    output wire  slave_readdata,   //                  .readdata
    input  wire  slave_writedata,  //                  .writedata
    input  wire   slave_byteenable, //                  .byteenable
    output wire  user_dataout_0,   //    user_interface.export
    input  wire  user_datain_1   
    

    Do I need to assign values to these ports or should I worry about my user_dataout_0 and user_datain_1 only? And where exactly within the my_slave_component.v code do I paste my code? There is a section called slave_template within the code. Do I put it before or after slave_template, or does it not matter?

    Do I also need a DMA Controller component? If yes, here again I have problems with the parameters :(

    --- Quote Start ---

    Storing data is a problem in and of itself. Why not just copy/paste from the console after your image is complete? Does it need to operate untethered from your PC?

    --- Quote End ---

    No, it can be tethered to the PC for now. I will sound even more stupid now but I can't copy-paste the results! I can highlight it all, but copy or Ctrl-c would not work. Do I have to enable something in Eclipse before? I am using Quartus 10.1 with NIOS SBT.

    --- Quote Start ---

    Better yet, what eval board are you using?

    --- Quote End ---

    I am using a DE2-115, which has an SD card interface. I will try to read and store data on it once I get the current problems out of the way and have finished applying multiple Sobel operations to a small 2-D array.

    Thank you for any advice about my new queries. It's quite tough to learn this technology on my own as I am doing.