Forum Discussion
Altera_Forum
Honored Contributor
18 years ago

--- Quote Start ---

In assembly my main program would look like this: begin: sjmp begin; nothing more real world than that.

Anyway, before we go off on tangents, here is what the system mechanism has to do. This will also explain why I cannot use things like DMA.

I have a main memory (SRAM) that will hold program, stack, heap, scratchpad, whatever. I don't care how the compiler stuffs the program in there; it can do whatever it wants.

Here is a simplified description: I have 2 dual port RAMs (DPRAM1 and DPRAM2). I also have two interrupt sources and 2 IO ports.

An external system loads data into DPRAM1 at very specific locations (hence I need to be able to define absolute addresses in the Nios program so the program can read this data). When the data is loaded, this external system tickles interrupt 1. The Nios roars to life and starts crunching the data in DPRAM1. The results are written back to very specific locations of DPRAM1 (so that the external hardware can retrieve them). When the Nios is done it sets IO port 1 to '1', signalling the outside world: 'I am done, data has been posted, I'm going back to sleep. Wake me when you need me.'

While the Nios was crunching the data in DPRAM1 using interrupt handler 1, the external hardware was not sitting still either. It has loaded data in DPRAM2. (DMA would block bus access and stall the Nios! That would be unacceptable. Hence I use dual port RAM and the problem goes away. Maybe my perception of DMA is wrong, I don't know. The important bit is that the program needs to keep crunching away while I am modifying memory locations that are not in use by that piece of code.)

When the external block sees that the Nios has set PIO1 it understands that the Nios is done. The external hardware now tickles interrupt 2. The Nios jumps to attention and interrupt handler 2 jumps to life, crunching data in DPRAM2 with a completely different algorithm.
In the meantime the external hardware offloads DPRAM1 and stuffs new data in there.

In the real application there will be tons of DPRAMs and tons of interrupts. Each interrupt is a different algorithm. Each DPRAM has a different memory layout. To minimize all the shuffling typically associated with a program:

- I hardcode all variables at specific addresses. That way the outside world knows what to put where.
- Only one interrupt at a time is running. An external hardware scheduler takes care of that (a simple one-hot state machine).
- The processes on the Nios do not communicate with each other.

I don't even need a printf. There is a dedicated DPRAM to give me messages. The Nios just dumps data there and sets an IO flag. The external hardware takes care of reading the bytes and sending them, including handshake, to the UART. The Nios does not need to spend time on that.

I cannot divulge what the system is for or why it needs to work this way. Suffice it to say that it does work. (I have it running on an 8051. I just lack the clock speed to do 32-bit arithmetic, something I was hoping to solve with this Nios thing.) Since I have the FPGA anyway, the dream solution: make the system in SOPC Builder. Map 8 DPRAMs (each 256 bytes), 8 interrupts and 8 PIOs. Write half a page of code. Attach existing algorithm code. Compile and run... tops a few days... it's been weeks.....

I know, and that is all very beautiful if you are a software developer and will use these modules. I don't. All I need is a number cruncher. I prepare data, it crunches, I load other data meanwhile; when it's done I offload and it begins with the next block. I like to call this 'software assisted hardware'.

My system doesn't have flash. The program on the Nios is loaded dynamically. It can even change a few times a minute. One moment it's running this set of algorithms, the next minute it is running a completely different set of algorithms.
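The fixed-address scheme described above (hardcoded variables in each DPRAM, one handler per DPRAM, a PIO flag to signal completion) could be sketched in C roughly as follows. This is only an illustration under assumptions: the struct layout, the offsets, the example base address in the comment, the running-sum "algorithm" and the `pio1` variable are all hypothetical, and a pointer overlay is just one common way to pin variables to absolute addresses (it is not the linker-settings approach mentioned later in the post). A static buffer stands in for the DPRAM so the sketch can be compiled and tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one 256-byte DPRAM; offsets are illustrative only. */
typedef struct {
    volatile uint32_t in[32];   /* 0x00..0x7F: written by the external hardware */
    volatile uint32_t out[31];  /* 0x80..0xFB: written by the Nios */
    volatile uint32_t count;    /* 0xFC: number of valid input words */
} dpram_t;

/* On hardware this would be a fixed base address from the system memory map,
 * for example (dpram_t *)0x00010000 (assumed value, not from the post).
 * Here a static buffer stands in so the sketch runs on a PC. */
static uint32_t dpram1_backing[64];
#define DPRAM1 ((dpram_t *)dpram1_backing)

/* Stand-in for the PIO register that signals "done" to the outside world. */
static volatile uint32_t pio1;

/* Check that the struct matches the byte offsets agreed with the hardware. */
void dpram_check_layout(void)
{
    assert(sizeof(dpram_t) == 256);
    assert(offsetof(dpram_t, out) == 0x80);
    assert(offsetof(dpram_t, count) == 0xFC);
}

/* Body of "interrupt handler 1": crunch inputs, post results, flag done. */
void handler1(void)
{
    uint32_t n = DPRAM1->count;
    if (n > 31) {
        n = 31;                 /* never write past the output area */
    }
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum += DPRAM1->in[i];
        DPRAM1->out[i] = sum;   /* placeholder algorithm: running sum */
    }
    pio1 = 1;                   /* tell the external hardware we are done */
}
```

Because every field sits at a known offset, the external hardware and the C code agree on the layout without any runtime negotiation; the layout check catches accidental padding or reordering if the struct is edited.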
The easiest way to do this for me is: disconnect the SRAM from the Nios, load the program image in the SRAM, reconnect and release the reset pin of the Nios. This loading happens through a USB port from the PC. The Nios is used as a hardware accelerator. PC drops the correct program in the Nios memory. PC dumps data in DPRAM, tickles Nios. PC drops more data in a different DPRAM and collects when the Nios is done. 2 seconds later a different program gets loaded. Most programs are small: 1 or 2 kilobytes of code.

Thank you. That is what I needed to know. Can you believe this has taken 2 weeks to find out?

Beginner with Nios and the toolchain, yes. I've been programming this kind of system for 20 years, using various toolchains and CPU architectures. Most of the time switching meant 1 or 2 days of work. This Nios has taken me 2 weeks and I still can't synthesize the core in SOPC Builder (some funny error in 7.2 about 2 clocks that worked fine in 7.1).

I will use the JTAG debugger for myself, yes. But the final system needs to run without. This thing is going in the field and JTAG will be disabled. Can't have anyone snooping around in there while it's running...

Not in the main memory. There I don't care. But I need to be able to lock them down in the DPRAMs.

There we go, that is what I am attempting. I figured this out in the linker settings.

YES! Wooohooo. That is what I needed. Thank you thank you thank you! Now I can see what that compiler produces.

Got an ADSL modem at home? That's me.... and the same mechanism sits in the modem there... throw data in a bucket, have the CPU crunch it. In the meantime unload the previous bucket and fill the next. When the CPU is done, tell it what bucket to crunch next. Took half a day to configure an ARM7.... one more day to write some Verilog and slap it in an ASIC. Works like a charm. Easy to work with and you can tweak the algorithms (which would not be possible with a hard implementation.
) I am essentially using the Nios to emulate hardware. I know I could write it directly in Verilog, but the algorithms need to be user modifiable. Hence: code in C, compile, upload, run. The user is not allowed to touch the FPGA contents nor get at the guts of the system (there is other stuff in the FPGA that needs to remain hidden...).

--- Quote End ---

OK. Cool stuff. FPGAs are perfect for what you are doing. Some ideas to help you deal with interrupt latency:

1. You can eliminate the HAL interrupt vector. But I am not going to tell you how, because I really don't want to cause you more frustration. It involves rewriting the crt0.s file, and I consider that an advanced programming topic in itself, let alone dealing with the Nios system. It will take a long while to do. Part of the complexity also involves GCC and linker scripts, which can be frustrating as well. Basically you have to undo everything put in to handle embedded processors on an FPGA to create your "custom" implementation. Regardless, a _big_ part of the interrupt latency is unavoidable because of the housekeeping required by the Nios ABI (application binary interface; this is documented in the processor reference manual), which you would have to include. If you select interrupt 0 for your interrupt then you will get the fastest vector (the HAL handler queries the interrupt status bits one at a time starting with bit 0, which corresponds to interrupt 0).

2. Use the interrupt vector accelerator. This is defined as the Interrupt Vector custom instruction found on the last tab in the Nios Wizard (double-click the Nios in SOPC Builder). Just add it in and rebuild your system. The HAL interrupt vector SW will use it if it exists in the system (just as GCC will use HW multiply and divide and other CPU features once they are added). It will make a big difference in interrupt latency.

3. Use TCM memories to eliminate additional processor latency.
TCMs guarantee the fastest processor execution (no wait states reading from memory). See the tutorial here (http://www.altera.com/literature/tt/tt_nios2_tightly_coupled_memory_tutorial.pdf). This can be used in addition to the other suggestions here.

4. Use multiple Nios processors. Use one for each DPRAM function you have. See the tutorial found here (http://www.altera.com/literature/tt/tt_nios2_multiprocessor_tutorial.pdf) if you are brave enough to try it. Personally I would consider this the best solution based on what you have said so far.

Thanks,
Rick
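To illustrate the remark in idea 1 that interrupt 0 gets the fastest vector, here is a minimal sketch of a software dispatcher that tests the pending bits one at a time starting from bit 0. It is not the actual HAL source; `irq_register`, `irq_dispatch` and the two demo handlers are hypothetical names made up for this example. The return value counts how many bit tests were needed before a handler ran, which is a rough proxy for the extra dispatch latency of higher-numbered interrupts.

```c
#include <stdint.h>

#define NUM_IRQS 32

typedef void (*irq_handler_t)(void);

/* One handler slot per interrupt number; empty slots stay NULL. */
static irq_handler_t handlers[NUM_IRQS];

/* Counters bumped by the demo handlers below. */
static unsigned hits0, hits7;

void on_irq0(void) { hits0++; }
void on_irq7(void) { hits7++; }

/* Register a handler for one interrupt number (hypothetical helper). */
void irq_register(unsigned irq, irq_handler_t fn)
{
    if (irq < NUM_IRQS) {
        handlers[irq] = fn;
    }
}

/* Scan the pending mask from bit 0 upward and run the first pending handler.
 * Returns the number of bit tests performed, or -1 if nothing was pending. */
int irq_dispatch(uint32_t pending)
{
    uint32_t mask = 1u;
    for (int i = 0; i < NUM_IRQS; i++, mask <<= 1) {
        if (pending & mask) {
            if (handlers[i]) {
                handlers[i]();
            }
            return i + 1;
        }
    }
    return -1;
}
```

With handlers registered on interrupts 0 and 7, dispatching interrupt 0 takes one bit test while interrupt 7 takes eight, and when both are pending only the interrupt 0 handler runs on that pass. A hardware vector lookup such as the one in idea 2 avoids this linear scan.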