Forum Discussion

yepp
8 months ago

Linux UIO IRQ related periodic CPU usage

Hi,

I have an Intel Arria 10 SoC FPGA system running Linux 5.4.104-lts, built with Yocto 3.3.1 and Poky. The installed FPGA image does nothing more than raise interrupts to a UIO device 50 times per second.

The UIO device is defined in the device tree like this:

tx_trig_irq {
    /* /dev/uio4 */
    compatible =  "test_irq", "generic-uio";
    interrupts = < 0 0x16 IRQ_TYPE_EDGE_RISING >;
    interrupt-parent = <&intc>;
}; 

The interrupts arrive in the OS correctly.

Here is the simple program that handles the interrupts. No other software is running.

#include <cmath>
#include <cstdint>   // uint32_t
#include <cstdlib>   // malloc, free
#include <iostream>
#include <fcntl.h>
#include <unistd.h>

typedef struct {
    int txTrigIrqFd;
} TX_IRQ_HANDLE_S;

bool wait_for_irq (TX_IRQ_HANDLE_S* pHandle) {
    if (!pHandle || pHandle->txTrigIrqFd < 0) {
        std::cout << "ERROR: handle";
        return false;
    }
    
    uint32_t info = 1;
    ssize_t nb = write(pHandle->txTrigIrqFd, &info, sizeof(info));
    if (nb != (ssize_t)sizeof(info)) {
        std::cout << "ERROR: writing";
        return false;
    }

    nb = read(pHandle->txTrigIrqFd, &info, sizeof(info));
    if (nb == (ssize_t)sizeof(info)) {
        return true;
    }

    return false;
}

int main(int argc, char* argv[]) {

    //Init IRQ
    TX_IRQ_HANDLE_S* tx_irq_handle = NULL;
    tx_irq_handle = (TX_IRQ_HANDLE_S*) ( malloc (sizeof (TX_IRQ_HANDLE_S)) );
    if (!tx_irq_handle) {
        std::cout << "irq init failed";
        return 1;
    }
    tx_irq_handle->txTrigIrqFd = open ("/dev/uio4", (O_RDWR));
    if (tx_irq_handle->txTrigIrqFd < 0) {
        free (tx_irq_handle);
        std::cout << "irq init failed";
        return 1;
    }
    
    // Do IRQ
    while (true) {
        auto status = wait_for_irq(tx_irq_handle);
    }

    std::cout << "Stopped.\n";
    return 0;
}

It handles the interrupts correctly. I also instrumented the code and checked the timing of the interrupt handling with Tracy; everything works as intended.

However, there is a strange CPU anomaly when I start this simple IRQ handling program:

  • There are ~5 second long spikes in CPU usage, repeating every minute.
  • When I reduce the number of interrupts coming from the FPGA to 25 per second, the 1 minute period doubles to 2 minutes.
  • When I stop the FPGA from sending interrupts, wait some time (e.g. 1 minute), and then resume, the periodicity of the spikes continues from the last state in which the FPGA generated interrupts. So if I pause the interrupts in the middle of a spike for some minutes (the CPU usage goes to ~0%), then after the resume the CPU usage continues from the middle of the spike.
  • The shape of the CPU usage graph can be tuned by doing work between the waits for the interrupts, so the spikes can be transformed into a wave:
    // Do IRQ
    while (true) {
        
        //Doing some work
        volatile double x = 0.0001;
        for (int i = 0; i < 20000; ++i) {
            x += std::sin(x) * std::cos(x);
        }

        auto status = wait_for_irq(tx_irq_handle);
    }

Here are some pictures:

[Image: CPU usage graph, doing no additional work]

[Image: CPU usage graph, doing little work]

[Image: CPU usage graph, doing more work]

  • I've already instrumented the UIO kernel driver and watched with dmesg how the interrupt handling goes in the kernel, but I found nothing suspicious.
  • This extra, periodic CPU usage is not attributed to any process (watching with top or htop). The CPU usage of my test program is constant, but the total CPU usage shows this periodicity.

I can't imagine what is happening and I have run out of ideas. Do you have any suggestions as to what could cause this periodic CPU usage?
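Since the extra load is not attributed to any process, one way to narrow it down is to look at how the total CPU time splits between user, system, irq and softirq time. The following is only a minimal sketch, assuming the standard field order of the aggregate "cpu" line in /proc/stat; the names CpuTimes, parse_cpu_line and read_cpu_times are made up for illustration.

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Counters from the aggregate "cpu" line of /proc/stat, in clock ticks.
struct CpuTimes {
    std::uint64_t user = 0, nice = 0, system = 0, idle = 0,
                  iowait = 0, irq = 0, softirq = 0, steal = 0;
};

// Parses one line of /proc/stat. Returns false unless it is the
// aggregate "cpu" line (per-core lines such as "cpu0" are rejected).
bool parse_cpu_line(const std::string& line, CpuTimes& out) {
    std::istringstream in(line);
    std::string label;
    if (!(in >> label) || label != "cpu") {
        return false;
    }
    return static_cast<bool>(in >> out.user >> out.nice >> out.system
                                >> out.idle >> out.iowait >> out.irq
                                >> out.softirq >> out.steal);
}

// Reads the current totals; the aggregate line is the first line of the file.
bool read_cpu_times(CpuTimes& out) {
    std::ifstream stat("/proc/stat");
    std::string line;
    return std::getline(stat, line) && parse_cpu_line(line, out);
}
```

Taking one sample just before a spike and one during it, then comparing the deltas of system, irq and softirq, would show whether the extra time is spent in interrupt context or elsewhere in the kernel.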

Thank you!

10 Replies

  • A few things to try:

    1. If your soft IP generates interrupts X times per second, could you try clearing the interrupt status in your application after every IRQ it receives?

    Example:

    uint32_t irq_status = *(volatile uint32_t *)(mmio_base + IRQ_STATUS_REG); // Read status
    *(volatile uint32_t *)(mmio_base + IRQ_CLEAR_REG) = irq_status; // Clear it

    If you don't, this could create some form of interrupt storm: interrupts that are not cleared may pile up.

    2. If you disable the interrupt in the FPGA and instead poll an FPGA register for the events, from a user application or a kernel device driver, does the CPU utilization issue go away? This test tells us whether the interrupts cause the high CPU utilization.

    3. You could also modify the FPGA soft IP to clear the interrupt every time one is triggered, but this requires more work.

    4. You don't have to acknowledge the interrupt before waiting for it. Remove the "ssize_t nb = write(pHandle->txTrigIrqFd, &info, sizeof(info));" that comes before the read(). The acknowledge is only done after a read(), so the current order is kind of wrong. read() is a blocking operation, and once it unblocks, an interrupt has occurred. You can then acknowledge it if you want, something like this:

    Example:

    // Function to handle the interrupt
    bool wait_for_irq(TX_IRQ_HANDLE_S* pHandle) {
        if (!pHandle || pHandle->txTrigIrqFd < 0) {
            std::cerr << "ERROR: Invalid handle" << std::endl;
            return false;
        }

        uint32_t info;
        ssize_t nb = read(pHandle->txTrigIrqFd, &info, sizeof(info)); // Read the interrupt info
        if (nb != sizeof(info)) {
            std::cerr << "ERROR: Read failed" << std::endl;
            return false;
        }

        std::cout << "Interrupt received, processing..." << std::endl;

        // After processing the interrupt, acknowledge it by writing back to the UIO device
        nb = write(pHandle->txTrigIrqFd, &info, sizeof(info)); // Acknowledge interrupt
        if (nb != sizeof(info)) {
            std::cerr << "ERROR: Write failed" << std::endl;
            return false;
        }

        return true;
    }
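The value that read() returns from a UIO device is the total number of interrupt events so far, so comparing two successive values shows whether any interrupt fired without being handled. A minimal sketch of that check follows; the helper name missed_irqs is hypothetical and not part of any UIO API.

```cpp
#include <cstdint>

// Number of interrupts that fired between two successive read() results
// but were not handled individually. With no misses the count advances
// by exactly one, so the result is 0. Unsigned arithmetic keeps the
// result correct when the 32-bit counter wraps around.
std::uint32_t missed_irqs(std::uint32_t prev_count, std::uint32_t cur_count) {
    return cur_count - prev_count - 1u;
}
```

Inside the loop, the caller would keep the previous `info` value and log whenever missed_irqs(prev, info) is non-zero. At 50 interrupts per second this should stay at zero unless the handling loop stalls.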

    Thanks

    • yepp

      Thank you!

      Unfortunately, none of them work.

      1. It is a UIO IRQ-only device; no memory region is assigned to it.

      2. Yes, I have done that before and it works, but I need the stricter timing that the IRQ gives.

      3. It is an edge-triggered IRQ.

      4. I've already tried everything (except the solution :D) and nothing worked. I tried with and without clearing, tried poll/select, swapped the order of the calls, etc.

      • TiensungA_Altera

        > 2. Yes, I have done it before, it works, but I need the more stricter timing with IRQ.

        The only clue I have is that the interrupt handling may be the culprit. Even though the interrupt is edge-triggered, it apparently did not get cleared properly, which results in an interrupt storm. Can you check whether the Linux UIO kernel driver is receiving an interrupt storm that causes the CPU spike? "cat /proc/interrupts" shows the interrupt statistics.
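To turn the /proc/interrupts counters into a rate, the per-CPU counts of the UIO row can be summed and sampled twice. The following is a rough sketch only: the row shown in the comment is an invented example of the usual layout, and irq_total_from_line and irq_rate are hypothetical helper names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <sstream>
#include <string>

// Sums the per-CPU counters of one /proc/interrupts row, for example
//   " 45:       1200         34     GIC-0  54 Edge      tx_trig_irq"
// Summing stops at the first token that is not purely numeric
// (the interrupt controller name).
std::uint64_t irq_total_from_line(const std::string& line) {
    std::istringstream in(line);
    std::string token;
    in >> token;  // row label such as "45:"
    std::uint64_t total = 0;
    while (in >> token &&
           std::all_of(token.begin(), token.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; })) {
        total += std::stoull(token);
    }
    return total;
}

// Interrupts per second between two samples taken `seconds` apart.
double irq_rate(std::uint64_t before, std::uint64_t after, double seconds) {
    return static_cast<double>(after - before) / seconds;
}
```

With the FPGA raising 50 interrupts per second, a rate close to 50 during a spike would argue against a storm, while a much higher rate would support it.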

        You mentioned that "The installed FPGA image is doing nothing more than making interrupts to an UIO device, 50 times a sec." Is this part of the Arria 10 GHRD? Can you share more information on the design with us?

        The DT you shared:

        tx_trig_irq {
            /* /dev/uio4 */
            compatible =  "test_irq", "generic-uio";
            interrupts = < 0 0x16 IRQ_TYPE_EDGE_RISING >;
            interrupt-parent = <&intc>;
        }; 

        You are using 0x16 = 22, so the IRQ = 22 + 32 = 54 (F2S_FPGA_IRQ3). This seems right, as long as F2S_FPGA_IRQ3 is the line your design drives.
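The mapping from the device tree specifier to the GIC interrupt ID can be written out as a small helper. This is a sketch under the usual GIC binding, where the first cell selects SPI (0) or PPI (1) and SPIs start at interrupt ID 32; the names GIC_SPI, GIC_PPI and gic_id are defined here for illustration only.

```cpp
#include <cstdint>

// First cell of a GIC "interrupts" specifier: 0 = SPI, 1 = PPI.
constexpr std::uint32_t GIC_SPI = 0;
constexpr std::uint32_t GIC_PPI = 1;

// Converts a device tree interrupt specifier to the GIC interrupt ID.
// SPIs start at ID 32 and PPIs start at ID 16.
constexpr std::uint32_t gic_id(std::uint32_t type, std::uint32_t number) {
    return number + (type == GIC_SPI ? 32u : 16u);
}

// The specifier from the thread: interrupts = < 0 0x16 IRQ_TYPE_EDGE_RISING >
static_assert(gic_id(GIC_SPI, 0x16) == 54, "SPI 0x16 maps to GIC ID 54");
```

Going the other way, the first FPGA interrupt at GIC ID 51 corresponds to SPI number 51 - 32 = 19 in the device tree.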

        The FPGA IRQs for the Arria 10 begin at 51; see https://www.intel.com/content/www/us/en/docs/programmable/683711/21-2/gic-interrupt-map-for-the-arria-10-soc-hps.html

        51    FPGA    F2S_FPGA_IRQ0    -    Level or Edge
        52    FPGA    F2S_FPGA_IRQ1    -    Level or Edge
        53    FPGA    F2S_FPGA_IRQ2    -    Level or Edge
        54    FPGA    F2S_FPGA_IRQ3    -    Level or Edge
        55    FPGA    F2S_FPGA_IRQ4    -    Level or Edge
        56    FPGA    F2S_FPGA_IRQ5    -    Level or Edge
        57    FPGA    F2S_FPGA_IRQ6    -    Level or Edge

  • tehjingy_Altera

    Hi


    Do you have any update on this issue?

    Did the previous comment help your issue?


    Regards

    Jingyang, Teh


    • yepp

      Unfortunately, no. I just replied to it.

      Do you also have a suggestion?

  • We have tried several experiments, and the issue seems to be unrelated to an interrupt storm.

    If you have run out of ideas, my last resort would be to not use UIO and instead write a simple kernel driver that maps to the IRQ. Just create a simple ISR that increments a counter every time an interrupt is received. Poll this counter and print the number of interrupts per minute to the kernel log (printk) every 1-2 minutes, so you don't clutter your log terminal.

    By bypassing UIO, we can isolate where the problem is.

    If there are no CPU spikes by doing so, then, the UIO is a suspect.

    To me, this would be the simplest way to find out which component causes the problem.

  • tehjingy_Altera

    Hi


    We have not received a response from you to the previous question/reply/answer that we provided. Please log in to ‘https://supporttickets.intel.com/s/?language=en_US’, view the details of the desired request, and post a feed/response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support, where community users will be able to help you with your follow-up questions.



    Regards

    Jingyang, Teh