Altera_Forum
Honored Contributor
15 years ago
Interrupt infinite loop issue
Hello,
I did a search, but frankly I don't have a lot of time to dig through threads, and my initial searches came up with nothing. I'm curious whether anyone has seen an issue where the entire Linux system locks up.

The surface cause is that the external_interrupt routine in entry.S enters an infinite loop if ipending is 0x0 when the interrupt handler is entered. The bit-test loop looks for a pending interrupt request and exits _only_ when a bit is set. If no bits are set, it keeps looping forever, since ipending is never read again during the loop. This can be caused by faulty hardware: a component asserts its interrupt line, the Nios branches to the exception vector, and the component then deasserts the line before the Nios can read the ipending register, which therefore reads all 0's. In my specific case hardware is not the cause; this has been verified in Signal Tap (STP).

I added code to entry.S to skip the bit-test loop and return whenever ipending is zero (since we know it will otherwise go into an infinite loop; this _is_ a bug). The external_interrupt routine should have this kind of change added to it permanently. Preferably there would also be a mechanism to report this condition to the Linux kernel as a spurious interrupt and log it as a system error. I don't know how to do that, however.

With further debugging, triggering Signal Tap on the ipending == 0 branch (in my modified entry.S), we were able to trace the instructions back and discovered that the root cause is the alt_sgdma_isr routine. Specifically, it appears that the exit code of alt_sgdma_isr clears both the TX SGDMA and RX SGDMA interrupts when processing a TX interrupt. Why does this routine clear the RX interrupt bit when processing a TX interrupt? That is a red flag to me right away. Keep in mind that the TX SGDMA and RX SGDMA are two completely different components. They operate independently.
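As a rough illustration of the ipending guard described above, here is a minimal C model of the dispatch step. It is not the actual entry.S assembly: the function name first_pending_irq, the 32-bit width, and the lowest-bit-first scan order are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the dispatch step (not the real entry.S code).
 * Returns the number of the lowest pending IRQ, or -1 when ipending is
 * zero, i.e. when the interrupt turned out to be spurious. */
int first_pending_irq(uint32_t ipending)
{
    if (ipending == 0) {
        /* Without this guard the scan below would never find a set bit
         * and would spin forever, which is the lockup described above. */
        return -1;
    }

    int irq = 0;
    /* At least one bit is set, so the scan ends within 32 iterations. */
    while ((ipending & 1u) == 0) {
        ipending >>= 1;
        irq++;
    }
    assert(irq < 32);
    return irq;
}
```

A caller would treat a return value of -1 as "nothing to service" and return from the exception, ideally after counting or logging the event.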
There really should be two separate ISR routines, IMO, but it is what it is (possibly due to a Linux kernel restriction?).

What happens, on a very infrequent basis, is this: whenever the RX SGDMA interrupt line to the Nios is asserted exactly one assembler instruction before the RX clear, the RX SGDMA deasserts its interrupt line as a result of the clear instruction. But it is too late. The Nios has already seen the signal and several clocks later branches to service the interrupt. By then the interrupt signal is gone, and external_interrupt would enter the infinite loop (without my changes to entry.S). It is not so infrequent when my customer has hundreds of systems using this code; across all of those systems the error shows up quite frequently.

So there are two issues here:

1. Clearing the RX interrupt when processing a TX interrupt in alt_sgdma_isr.
2. Why are interrupts enabled in alt_sgdma_isr anyway? In my tests the exception vector is branched to before the end of alt_sgdma_isr is reached (that is, the ISR itself is interrupted).

Maybe this is OK, but certainly clearing the RX interrupt while processing a TX interrupt is not correct. I propose changing alt_sgdma_isr in altera_tse.c to the following code:

    //reset irq
    if (irq == tse_priv->rx_fifo_interrupt) {
        tse_priv->rx_sgdma_dev->control |= alt_sgdma_control_clear_interrupt_msk;
    } else if (irq == tse_priv->tx_fifo_interrupt) {
        tse_priv->tx_sgdma_dev->control |= alt_sgdma_control_clear_interrupt_msk;
    }

Does this make sense? Is there a better way to handle this?

So why post all this here? Because I am not a kernel expert. I would like some of the Linux kernel experts here to look at my post and let me know whether any of the issues I have described have already been addressed or, if not, to help me get these changes into the Linux for Nios distribution, since I imagine other Linux for Nios users will eventually run into this issue when they try to put their systems into production. If there is a better way to address this issue, I am all ears and would love to hear any recommendations. If necessary, please contact me directly by email or PM, since I cannot mention the name of the customer here in my post.

Thanks for your help.

Rick Hill
TSFAE, Embedded Systems, Altera, Inc.
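For readers who want to reason about the race described above, here is a small C model of it. Everything in it is hypothetical: the bit positions, the struct irq_lines type, and the helper names are illustrative stand-ins, not the altera_tse.c driver code. It models only what the CPU would read from ipending after the ISR exit path runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; real IRQ numbers come from the system design. */
enum { TX_IRQ_BIT = 1u << 0, RX_IRQ_BIT = 1u << 1 };

/* Model of the two independent SGDMA interrupt lines as seen in ipending. */
struct irq_lines {
    uint32_t ipending;
};

/* Exit path that clears both sources regardless of which one fired. */
void clear_both(struct irq_lines *l)
{
    l->ipending &= ~(uint32_t)(TX_IRQ_BIT | RX_IRQ_BIT);
}

/* Exit path that clears only the source being serviced. */
void clear_source(struct irq_lines *l, uint32_t source_bit)
{
    assert(source_bit == TX_IRQ_BIT || source_bit == RX_IRQ_BIT);
    l->ipending &= ~source_bit;
}

/* Replays the race: RX asserts just before the TX handler's clear.
 * Returns the ipending value the CPU reads on the next exception entry. */
uint32_t replay_race(bool per_source_clear)
{
    struct irq_lines l = { .ipending = TX_IRQ_BIT }; /* TX being serviced */
    l.ipending |= RX_IRQ_BIT;                        /* RX asserts late   */
    if (per_source_clear) {
        clear_source(&l, TX_IRQ_BIT);
    } else {
        clear_both(&l);
    }
    return l.ipending;
}
```

In this model, replay_race(false) leaves ipending at zero even though the CPU has already committed to taking the exception, which is the spurious entry; replay_race(true) leaves the RX bit set, so the next entry finds real work to do.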