Forum Discussion

yepp
New Contributor
8 months ago

Linux UIO IRQ related periodic CPU usage

Hi,

I have an Intel Arria 10 SoC FPGA system running Linux 5.4.104-lts, built with Yocto 3.3.1 and Poky. The installed FPGA image does nothing more than raise an interrupt on a UIO device 50 times per second.

The UIO device is defined in the device tree like this:

tx_trig_irq {
    /* /dev/uio4 */
    compatible =  "test_irq", "generic-uio";
    interrupts = < 0 0x16 IRQ_TYPE_EDGE_RISING >;
    interrupt-parent = <&intc>;
}; 

The interrupts arrive in the OS correctly.

Here is the simple program which handles the interrupts, there is no other software running.

#include <cmath>
#include <cstdint>   // uint32_t
#include <cstdlib>   // malloc, free
#include <fcntl.h>
#include <iostream>
#include <unistd.h>

typedef struct {
    int txTrigIrqFd;
} TX_IRQ_HANDLE_S;

bool wait_for_irq (TX_IRQ_HANDLE_S* pHandle) {
    if (!pHandle || pHandle->txTrigIrqFd < 0) {
        std::cout << "ERROR: handle";
        return false;
    }
    
    uint32_t info = 1;
    ssize_t nb = write(pHandle->txTrigIrqFd, &info, sizeof(info));
    if (nb != (ssize_t)sizeof(info)) {
        std::cout << "ERROR: writing";
        return false;
    }

    nb = read(pHandle->txTrigIrqFd, &info, sizeof(info));
    if (nb == (ssize_t)sizeof(info)) {
        return true;
    }

    return false;
}

int main(int argc, char* argv[]) {

    //Init IRQ
    TX_IRQ_HANDLE_S* tx_irq_handle = NULL;
    tx_irq_handle = (TX_IRQ_HANDLE_S*) ( malloc (sizeof (TX_IRQ_HANDLE_S)) );
    if (!tx_irq_handle) {
        std::cout << "irq init failed";
        return 1;
    }
    tx_irq_handle->txTrigIrqFd = open ("/dev/uio4", (O_RDWR));
    if (tx_irq_handle->txTrigIrqFd < 0) {
        free (tx_irq_handle);
        std::cout << "irq init failed";
        return 1;
    }
    if (!tx_irq_handle) {
        std::cout << "irq init failed";
        return 1;
    }
    
    // Do IRQ
    while (true) {
        auto status = wait_for_irq(tx_irq_handle);
    }

    std::cout << "Stopped.\n";
    return 0;
}

It handles the interrupts correctly. I also instrumented the code and checked the timing of the interrupt handling with Tracy; everything works as intended.

However, there is a strange CPU anomaly when running this simple IRQ handling program:

  • There are ~5 s long spikes in CPU usage, recurring every minute.
  • When I reduce the interrupt rate from the FPGA to 25 per second, the 1 minute period doubles to 2 minutes.
  • When the FPGA stops sending interrupts for some time (e.g. 1 minute) and then resumes, the periodicity of the spikes continues from the state it was in when the interrupts stopped. So if I pause the interrupts in the middle of a spike for some minutes (the CPU usage goes to ~0%), then after resuming, the CPU usage continues from the middle of the spike.
  • The shape of the CPU usage graph can be tuned by doing work between waits for the interrupt, so the spikes can turn into a wave:
    // Do IRQ
    while (true) {
        
        //Doing some work
        volatile double x = 0.0001;
        for (int i = 0; i < 20000; ++i) {
            x += std::sin(x) * std::cos(x);
        }

        auto status = wait_for_irq(tx_irq_handle);
    }
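One thing the numbers above suggest: the spike period scales inversely with the interrupt rate, so the number of interrupts between spikes stays roughly constant. A minimal sketch of that arithmetic, assuming the reported round figures are exact (50 per second with a 60 s period, 25 per second with a 120 s period); `interrupts_per_period` is just an illustrative helper:

```cpp
#include <cstdint>

// Interrupts delivered during one spike period.
// rate_hz and period_s are the approximate figures reported in the post.
constexpr std::uint64_t interrupts_per_period(std::uint64_t rate_hz,
                                              std::uint64_t period_s) {
    return rate_hz * period_s;
}

// Both reported configurations come out at the same count.
static_assert(interrupts_per_period(50, 60) == 3000, "50 Hz, 1 minute period");
static_assert(interrupts_per_period(25, 120) == 3000, "25 Hz, 2 minute period");
```

Both cases give about 3000 interrupts per period. That would be consistent with something driven by an interrupt count rather than by wall-clock time, and it fits the pause/resume behaviour described above. This is an inference from the reported numbers, not a diagnosis.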

Here are some pictures:

[Image: CPU usage graph, doing no additional work]

[Image: CPU usage graph, doing little work]

[Image: CPU usage graph, doing more work]

  • I've already instrumented the UIO kernel driver and watched the interrupt handling in the kernel via dmesg, but I found nothing suspicious.
  • This extra, periodic CPU usage is not attributed to any process (watching with top or htop). The CPU usage of my test program is constant, but the total CPU usage shows this periodicity.
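Since the extra load is not attributed to any process, one way to narrow it down might be to look at which category of CPU time grows during a spike. The first line of /proc/stat splits total CPU time into user, nice, system, idle, iowait, irq, softirq and steal ticks. The sketch below parses that line and computes the share of one category between two samples. `CpuTimes`, `parse_cpu_line`, `total_ticks` and `share_percent` are illustrative names, not an existing API.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Aggregate CPU times from the first line of /proc/stat, in clock ticks.
struct CpuTimes {
    std::uint64_t user = 0, nice = 0, system = 0, idle = 0,
                  iowait = 0, irq = 0, softirq = 0, steal = 0;
};

// Parse a line of the form "cpu  user nice system idle iowait irq softirq steal ...".
// Returns false if the label is not exactly "cpu" or fewer than eight fields follow.
bool parse_cpu_line(const std::string& line, CpuTimes& out) {
    std::istringstream in(line);
    std::string label;
    if (!(in >> label) || label != "cpu") return false;
    in >> out.user >> out.nice >> out.system >> out.idle
       >> out.iowait >> out.irq >> out.softirq >> out.steal;
    return !in.fail();
}

// Sum of all parsed categories.
std::uint64_t total_ticks(const CpuTimes& t) {
    return t.user + t.nice + t.system + t.idle +
           t.iowait + t.irq + t.softirq + t.steal;
}

// Percentage of the interval between two samples spent in one category.
double share_percent(std::uint64_t before, std::uint64_t after,
                     std::uint64_t total_before, std::uint64_t total_after) {
    const std::uint64_t total = total_after - total_before;
    if (total == 0) return 0.0;
    return 100.0 * static_cast<double>(after - before) / static_cast<double>(total);
}
```

To use it, read the first line of /proc/stat once per second (for example with std::ifstream and std::getline) and print the irq, softirq and system shares. How precisely the kernel attributes time to these categories depends on its configuration, so treat the numbers as a hint rather than a measurement.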

I can't work out what is happening, and I have run out of ideas. Do you have any suggestions as to what could cause this periodic CPU usage?

Thank you!

10 Replies

  • tehjingy_Altera
    Regular Contributor

    Hi


    As we have not received any response from you to the previous reply, please log in to ‘https://supporttickets.intel.com/s/?language=en_US’, view the details of the desired request, and post a response within the next 15 days so that I can continue to support you. After 15 days, this thread will be transitioned to community support, where community users will be able to help with your follow-up questions.



    Regards

    Jingyang, Teh


  • We have tried several experiments, and it seems the problem is unrelated to an interrupt storm.

    If you have run out of ideas, my last resort would be to not use UIO and instead write a simple kernel driver that attaches to the IRQ. Just create a simple ISR that increments a counter every time an interrupt is received. Poll this counter and print the number of interrupts every 1-2 minutes (with printk, since this is a kernel driver), so you don't clutter your log.

    By bypassing UIO, we can isolate where the problem is.

    If there are no CPU spikes when doing so, then UIO is a suspect.

    To me, this would be the simplest way to find out which component causes the problem.
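A related counting check is also possible without leaving user space. This is a different technique from the kernel driver proposed above, so it does not take UIO out of the picture. The kernel's UIO documentation describes the 4-byte value returned by read() on /dev/uioX as the total event count, so a gap between consecutive values indicates missed interrupts. A minimal sketch, where `UioEventTracker` is a hypothetical helper and not part of any library:

```cpp
#include <cstdint>

// Tracks the 32-bit event counter returned by read() on a UIO device
// and reports how many events were skipped between consecutive reads.
class UioEventTracker {
public:
    // Feed the counter value from the latest read().
    // Returns the number of events missed since the previous call
    // (0 on the first call, or when the counter advanced by exactly one).
    std::uint32_t update(std::uint32_t count) {
        std::uint32_t missed = 0;
        if (has_prev_) {
            // Unsigned subtraction stays correct across counter wrap-around.
            const std::uint32_t delta = count - prev_;
            missed = delta > 0 ? delta - 1 : 0;
        }
        prev_ = count;
        has_prev_ = true;
        total_missed_ += missed;
        return missed;
    }

    std::uint64_t total_missed() const { return total_missed_; }

private:
    std::uint32_t prev_ = 0;
    bool has_prev_ = false;
    std::uint64_t total_missed_ = 0;
};
```

In the test program, the `info` value filled in by read() would be passed to `update()` after every wakeup. A non-zero result means the loop did not keep up with the FPGA at that point.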

  • tehjingy_Altera
    Regular Contributor

    Hi


    Do you have any update on this issue?

    Did the previous comment help your issue?


    Regards

    Jingyang, Teh


    • yepp
      New Contributor

      Unfortunately, no. I just replied to it.

      Do you also have a suggestion?

  • A few things to try:

    1. If your soft IP generates interrupts X times per second, could you try clearing the interrupt status in your application after every IRQ received?

    Example:

    uint32_t irq_status = *(volatile uint32_t *)(mmio_base + IRQ_STATUS_REG); // Read status
    *(volatile uint32_t *)(mmio_base + IRQ_CLEAR_REG) = irq_status; // Clear it

    There is a possibility that this creates some form of interrupt storm if you don't. Not clearing it may cause the interrupts to pile up.

    2. If you disable the interrupt at the FPGA and instead poll an FPGA register for the events, from a user application or a kernel device driver, does the CPU utilization issue go away? This test tells us whether the interrupts cause the high CPU utilization.

    3. You could also try to modify the FPGA soft IP to clear the interrupt every time an interrupt is triggered. But this requires more work.

    4. You don't have to acknowledge the interrupt before waiting for it. Remove this line: "ssize_t nb = write(pHandle->txTrigIrqFd, &info, sizeof(info));". The write() should only be done after a read(), so the current order is wrong. read() is a blocking operation, and once it unblocks, an interrupt has occurred. You can then acknowledge it afterwards if you want. Something like this:

    Example:

    // Function to handle the interrupt
    bool wait_for_irq(TX_IRQ_HANDLE_S* pHandle) {
        if (!pHandle || pHandle->txTrigIrqFd < 0) {
            std::cerr << "ERROR: Invalid handle" << std::endl;
            return false;
        }

        uint32_t info;
        ssize_t nb = read(pHandle->txTrigIrqFd, &info, sizeof(info)); // Read the interrupt info
        if (nb != sizeof(info)) {
            std::cerr << "ERROR: Read failed" << std::endl;
            return false;
        }

        std::cout << "Interrupt received, processing..." << std::endl;

        // After processing the interrupt, acknowledge it by writing back to the UIO device
        nb = write(pHandle->txTrigIrqFd, &info, sizeof(info)); // Acknowledge interrupt
        if (nb != sizeof(info)) {
            std::cerr << "ERROR: Write failed" << std::endl;
            return false;
        }

        return true;
    }

    Thanks

    • yepp
      New Contributor

      Thank you!

      Unfortunately, none of them worked.

      1. It is a UIO IRQ-only device, with no memory region assigned to it.

      2. Yes, I have done that before and it works, but I need the stricter timing of the IRQ.

      3. It is an edge-triggered IRQ.

      4. I've already tried everything (except the solution :D), and nothing worked. I've tried with and without clearing, tried poll/select, swapped the order, etc.
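For reference, the poll-based variant mentioned in item 4 could look roughly like the sketch below. It is only an illustration of that approach, not the code that was actually tested: `IrqWait` and `wait_for_irq_poll` are made-up names, and the IRQ is assumed to have been re-enabled with write() before the call.

```cpp
#include <cstdint>
#include <poll.h>
#include <unistd.h>

enum class IrqWait { Event, Timeout, Error };

// Wait up to timeout_ms for the descriptor to become readable, then read
// the 4-byte event count into `count`.
// A poll() interrupted by a signal (EINTR) is reported as Error here.
IrqWait wait_for_irq_poll(int fd, int timeout_ms, std::uint32_t& count) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    const int rc = poll(&pfd, 1, timeout_ms);
    if (rc == 0) return IrqWait::Timeout;
    if (rc < 0 || !(pfd.revents & POLLIN)) return IrqWait::Error;

    const ssize_t nb = read(fd, &count, sizeof(count));
    return nb == static_cast<ssize_t>(sizeof(count)) ? IrqWait::Event
                                                     : IrqWait::Error;
}
```

The timeout makes it possible to notice when the FPGA has stopped sending interrupts, which a plain blocking read() cannot do. As reported above, switching to poll/select did not change the CPU usage pattern.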

      • TiensungA_Altera
        New Contributor

        Item 4: I still have some questions about the user application design.

        Just to double confirm, write() must always come after read(), and only after you are done with any data processing/event handling.

        In the kernel's interrupt handler, any interrupt trigger automatically disables the IRQ. The read() is then woken up, and at this point the IRQ is disabled. The call to write() re-enables the IRQ.

        If you don't do a write() at all, do you see the CPU spikes? In this scenario, the IRQ should stay disabled and not respond to edge triggers. Check "cat /proc/interrupts" to see if the counts stop.

        And if you do a write() in your test, check "cat /proc/interrupts" to see whether the counts increment as expected or storm to a huge number.

        bool wait_for_irq (TX_IRQ_HANDLE_S* pHandle) {
            if (!pHandle || pHandle->txTrigIrqFd < 0) {
                std::cout << "ERROR: handle";
                return false;
            }
            
            uint32_t info = 1;
            // write() re-enables the IRQ; it should only be done when you are ready for the next event.
            ssize_t nb = write(pHandle->txTrigIrqFd, &info, sizeof(info));
            if (nb != (ssize_t)sizeof(info)) {
                std::cout << "ERROR: writing";
                return false;
            }
        
            nb = read(pHandle->txTrigIrqFd, &info, sizeof(info));
            if (nb == (ssize_t)sizeof(info)) {
                return true;
            }
        
            return false;
        }
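The /proc/interrupts check suggested above can be automated so that the count is sampled at a fixed interval instead of being read by eye. A rough sketch follows. It assumes the interrupt appears in /proc/interrupts under the device tree node name `tx_trig_irq`, which should be verified on the target. `irq_count` and `irq_rate_hz` are illustrative helpers.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Sum the per-CPU counts on the /proc/interrupts line that contains `name`.
// Returns -1 if no line matches.
long long irq_count(const std::string& proc_interrupts, const std::string& name) {
    std::istringstream lines(proc_interrupts);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.find(name) == std::string::npos) continue;

        std::istringstream fields(line);
        std::string token;
        fields >> token;  // IRQ label such as "45:"

        long long total = 0;
        while (fields >> token) {
            // Per-CPU counters come first; stop at the first non-numeric column.
            bool numeric = true;
            for (unsigned char c : token) {
                if (!std::isdigit(c)) { numeric = false; break; }
            }
            if (!numeric) break;
            total += std::stoll(token);
        }
        return total;
    }
    return -1;
}

// Average interrupt rate between two samples taken `seconds` apart.
double irq_rate_hz(long long before, long long after, double seconds) {
    return seconds > 0.0 ? static_cast<double>(after - before) / seconds : 0.0;
}
```

Read the whole file into a string once per second and compare consecutive counts. With the FPGA at 50 interrupts per second, the rate should stay near 50. A much larger value would point to a storm, and a rate of 0 while the FPGA is running would mean the IRQ was never re-enabled.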