Altera_Forum (Honored Contributor)
21 years ago

Target reboots during driver debug

I am debugging a modified lan91C111 driver for a custom board, using a modified version of the uCos telnet example. Environment: Quartus 4.1, Nios II CPU32 'tiny', IDE 1.0.

The driver has been modified to use a 16bit native bus. This question is more about the debugger and debugging with uCos.

My target reboots shortly after my first rx interrupt, either under the debugger, or just running, flashed or sof'ed. I have turned on all the debug flags in the lwip code and sprinkled in printfs so I can somewhat trace the execution. I also have a partial hw probe of the 'C111.

What happens is that the debug console (jtag uart) stops printing, the driver code does a few more things ('C111 accesses and debug printfs that never reach the console), and then I get a reboot (no external reset). I now have a different image burned into flash that gets loaded and prints to a different serial port, which makes the reboot more obvious. The debugger just thinks that the original thread in the original image is still running. Suspending the thread gives me some info such as sp & pc, but I think most of it is bogus since a totally new image is running. Synchronisation between image and src is lost, isn't it?

I believe the reboot is a sw problem in the driver (wild pointer, memory leak). My board does run the other uCos examples. The 'C111 is just a slave with a dedicated interface, so even if it is misconfigured, it should not be able to cause a reboot.

Without a functioning console or complete hw trace it is hard to get a good picture of what really happens to cause this reboot. I can set breakpoints, but it appears the pauses introduced by these alter the sequence of execution enough that the code executes longer before rebooting.

So what are all the tricks, tips (and features I don't know about) to get more visibility into what is happening?

Anything in the quartus patch or IDE 1.01 for me here?

5 Replies

  • Altera_Forum (Honored Contributor)

    One of the docs says limited on-chip address trace should be available to me without an additional license. If so, has anyone used this?

    Other debuggers have features like stack unwinding and stack limit checking. Is any of this available to me?

    I see in the default link/init, registers are defaulted to 0xdeadbeef and the data areas are randomly sprinkled with these markers. Is there any reasoning or description telling me where these were placed?

    What does the debug level setting -g3 do for me? I did not see any difference.

    thanks
  • Altera_Forum (Honored Contributor)

    Hello tns1,

    Trace is available without an additional license from FS2 for 16 frames of trace data. Several instructions are packed into the bits of each trace frame, resulting in a few dozen instructions worth of trace from these 16 trace buffers. You also get 2 data breakpoints and 2 instruction breakpoints without an additional license. You can use these breakpoints with an event of turning on or off trace, thereby targeting those few dozen instructions worth of trace information right where you want them.

    I think your guess of a wild pointer is probably accurate. The most common problem I have seen with ethernet drivers and the lwIP stack is a stack overflow. Such an overflow may be causing a jump to zero, and if your reset address is located at zero, this could make a symptom of stack overflow look like a reset of the board. This is common because any task which uses sockets must be created by the lwIP sys_thread_new instead of OSTaskCreate, and sys_thread_new only allocates 2048 bytes per task. Any call to an RTL function, including printf, consumes almost 1000 bytes of stack space, which is half of your task stack right there. One customer I helped had identical symptoms, but had all of his printf statements wrapped with a #define for DEBUGGING. After turning off his DEBUGGING macro, and therefore eliminating all of his calls to printf from within any tasks, including those using sockets, his ethernet driver worked fine.
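
    The printf wrapping described above could look roughly like the sketch below. Only the DEBUGGING name comes from the customer case; DBG_PRINTF and rx_handle_packet are hypothetical names used for illustration, and the real driver code will differ.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* DEBUGGING is the switch named in the post; default it to off.
       Build with -DDEBUGGING=1 to turn the trace output back on. */
    #ifndef DEBUGGING
    #define DEBUGGING 0
    #endif

    #if DEBUGGING
    /* Hypothetical wrapper: forwards to printf when tracing is enabled. */
    #define DBG_PRINTF(...) printf(__VA_ARGS__)
    #else
    /* Expands to a no-op: no printf call is emitted, so the task
       pays none of printf's stack cost. */
    #define DBG_PRINTF(...) ((void)0)
    #endif

    /* Hypothetical stand-in for a piece of the driver's receive path. */
    int rx_handle_packet(int length)
    {
        assert(length >= 0);  /* sanity check; removed with -DNDEBUG */
        DBG_PRINTF("rx: %d bytes\n", length);
        return length;
    }
    ```

    With DEBUGGING left at 0, every DBG_PRINTF line disappears at compile time, which is a quick way to test whether printf stack usage is what pushes a 2048-byte task stack over the edge.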

    You can modify the stack space created for each sys_thread_new call by modifying this function's definition. This source code is located in the Nios II Device Drivers project, under the lwip component.

    You can use FS2 data breakpoints to watch for stack overflow. Step into a sys_thread_new call, which in turn calls OSTaskCreate, to find the address passed to OSTaskCreate as the stack address parameter. Then set an FS2 data breakpoint to watch for any writes to the end of that stack, and you can catch your stack overflow.
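
    A software-only complement to the data breakpoint is to fill the stack with a known pattern before the task starts and later count how much of the pattern survives. The sketch below is a minimal illustration under stated assumptions, not code from the Nios II or lwIP sources: it assumes 32-bit stack words and a stack that grows downward toward the lowest address, and the names STACK_FILL, stack_paint and stack_unused_words are hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Assumed fill pattern; any value unlikely to occur in real data works. */
    #define STACK_FILL 0xdeadbeefu

    /* Fill the whole stack array with the pattern before the task is created. */
    void stack_paint(uint32_t *stack, size_t words)
    {
        size_t i;
        assert(stack != NULL);
        for (i = 0; i < words; i++)
            stack[i] = STACK_FILL;
    }

    /* Count the words at the low end that still hold the pattern.
       With a downward-growing stack, stack[0] is the last word to be
       reached, so a result of 0 means the stack was completely used
       and has probably overflowed. */
    size_t stack_unused_words(const uint32_t *stack, size_t words)
    {
        size_t n = 0;
        assert(stack != NULL);
        while (n < words && stack[n] == STACK_FILL)
            n++;
        return n;
    }
    ```

    Calling stack_unused_words from a low-priority task, or inspecting the array in the debugger's memory view, gives a rough high-water mark. A 512-word array corresponds to the 2048 bytes mentioned above.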

    By the way, the FS2 interface has been improved for Nios version 1.01, and is well worth downloading, installing, and recreating your projects under the new version (remember to uninstall Nios 1.0 first).
  • Altera_Forum (Honored Contributor)

    Hi,

    I had a similar problem. My device (1S80) sometimes did not boot from internal memory. When an ALTERA FAE and I were debugging it, we saw that the NIOS jumped back to the reset vector when the JTAG UART driver registered its interrupt.

    Finally, ALTERA found out that there is a problem within the linker script.

    We had to add just one line to fix the alignment of the code. See the example below. After this it was working fine!

    According to ALTERA there will be a fix in the next release.

    I hope this helps.

    .rodata :
    {
        . = ALIGN(32 / 8);
        *(.rodata .rodata.* .gnu.linkonce.r.*)
        *(.rodata1)
        /* Added line: */
        . = ALIGN(32 / 8);
    } > onchip_rom

    .rwdata : AT (LOADADDR(.rodata) + SIZEOF (.rodata))
    {
    …..
  • Altera_Forum (Honored Contributor)

    Great info.

    I wish I had read these posts a few days ago. I did find the bug, but not by any special deduction or technique. Earlier, my rx ISR was not getting called, so I had placed a printf there as a debug aid. It was this printf that led to the reboot/reset. The actual sequence of events is unclear, and tracing through uCos, the ISR handler and the jtag uart code is all quite convoluted. I can reason that it was a stack/buffer overflow, since a full print buffer alone should just cause everything to lock up without a reset. I'll try out the trace when I get a chance.
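
    For anyone who needs visibility inside an ISR without calling printf there, one common alternative is to record small event codes in a RAM ring buffer and read them out later from a task or from the debugger's memory view. This is only a sketch of that idea and not part of the driver discussed in this thread; TRACE_SLOTS, trace_event and trace_dump are hypothetical names, and the code assumes events are recorded from a single interrupt context.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define TRACE_SLOTS 16u  /* hypothetical buffer depth */

    static volatile uint32_t trace_buf[TRACE_SLOTS];
    static volatile uint32_t trace_head;  /* total events recorded so far */

    /* Cheap enough for an ISR: one store and one increment,
       with no library calls and almost no stack use. */
    void trace_event(uint32_t code)
    {
        trace_buf[trace_head % TRACE_SLOTS] = code;
        trace_head++;
    }

    /* Copy the most recent events, oldest first, into out.
       Returns the number of events copied (at most max). */
    uint32_t trace_dump(uint32_t *out, uint32_t max)
    {
        uint32_t total = trace_head;
        uint32_t count = total < TRACE_SLOTS ? total : TRACE_SLOTS;
        uint32_t first = total - count;
        uint32_t i;

        assert(out != NULL);
        if (count > max) {
            first += count - max;  /* keep only the newest max events */
            count = max;
        }
        for (i = 0; i < count; i++)
            out[i] = trace_buf[(first + i) % TRACE_SLOTS];
        return count;
    }
    ```

    The ISR would call trace_event with a distinct code at each point of interest, and a normal task with enough stack would call trace_dump and do the printing.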