Re: Assistance Needed for Kernel mode driver Soft Lockup Issue

From: Muni Sekhar
Date: Sun Oct 20 2024 - 11:02:19 EST


On Sun, Oct 20, 2024 at 6:18 PM Denis Kirjanov <kirjanov@xxxxxxxxx> wrote:
>
>
>
> On Saturday, October 19, 2024, Muni Sekhar <munisekharrms@xxxxxxxxx> wrote:
>>
>> Dear Linux Kernel Developers,
>>
>> I am encountering a soft lockup issue in my system related to the
>> continuous while loop in the empty_rx_fifo() function. Below is the
>> relevant code:
>>
>>
>> #include <linux/io.h> // For readw()
>>
>> #define FIFO_STATUS 0x0014
>> #define FIFO_MAN_READ 0x0015
>> #define RX_FIFO_EMPTY 0x01 // Assuming RX_FIFO_EMPTY is defined as 0x01
>>
>> static inline uint16_t read16_shifted(void __iomem *addr, u32 offset)
>> {
>>     // Left shift the offset by 1 and add it to the base address
>>     void __iomem *target_addr = addr + (offset << 1);
>>     // Read the 16-bit value from the calculated address
>>     uint16_t value = readw(target_addr);
>>
>>     return value;
>> }
>>
>> void empty_rx_fifo(void __iomem *addr)
>> {
>>     // Keep reading from the FIFO until it's empty
>>     while (!(read16_shifted(addr, FIFO_STATUS) & RX_FIFO_EMPTY)) {
>>         read16_shifted(addr, FIFO_MAN_READ);
>>     }
>> }
>>
>> Explanation:
>> Function name: read16_shifted(). It reads a 16-bit value from a
>> register whose offset is shifted left by one bit.
>> Operation: it shifts the offset left by 1 (offset << 1), adds the
>> result to the base address, and reads the value at that address.
>> The empty_rx_fifo function is designed to clear out the RX FIFO, but
>> I've encountered soft lockup issues. Specifically, the system logs
>> repeated soft lockup messages in the kernel log, with a time gap of
>> roughly 28 seconds between them (as per the kernel log timestamps).
>> Here's an example log:
>>
>> watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
>>
>> In all cases, the RIP points to:
>> RIP: 0010:read16_shifted+0x11/0x20
>>
>>
>> Analysis:
>> The soft lockup seems to be caused by the continuous while loop in the
>> empty_rx_fifo() function. The RX FIFO takes a considerable amount of
>> time to empty, sometimes up to 1000 seconds. As a result, from the
>> first occurrence of the soft lockup trace, the log repeats
>> approximately every 28 seconds for the entire 1000-second duration.
>> After 1000 seconds, the system resumes normal operation.
>>
>> Questions:
>> 1. How should I best handle this kind of issue? Even if the hardware
>> takes time, I would like advice on the best approach to prevent these
>> lockups.
>
>
> I guess you could switch to an interrupt-driven model, or run a thread that checks the status (that is, checks whether the RX FIFO is empty and releases the CPU between checks)

Thanks for your response.

Switching to an interrupt model should resolve it, but unfortunately,
the hardware I am using does not support interrupts for this
functionality.
Would adding udelay() in the while loop after every few iterations
help avoid CPU hogging, allowing other processes to take control of
the CPU?
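
To make the question concrete, here is a minimal userspace sketch of the
loop shape I am considering: drain in bounded batches, pause between
batches, and give up once a read budget is spent. Everything named fake_*
and the DRAIN_BATCH constant are hypothetical stand-ins for illustration,
not the real driver. As far as I understand, udelay() is itself a
busy-wait, so in a real driver running in process context the pause would
probably have to be something like cond_resched() or usleep_range()
instead; that part is an assumption on my side.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_FIFO_EMPTY 0x01
#define DRAIN_BATCH   64 /* hypothetical: number of reads between pauses */

/* Hypothetical stand-in for the device: counts words left in the RX FIFO. */
struct fake_fifo {
    unsigned int words_left;
    unsigned int yields; /* how many times the loop paused */
};

/* Models a read of FIFO_STATUS: the empty bit is set once nothing is left. */
static uint16_t fake_read_status(const struct fake_fifo *f)
{
    return f->words_left == 0 ? RX_FIFO_EMPTY : 0;
}

/* Models a read of FIFO_MAN_READ: consumes one word from the FIFO. */
static uint16_t fake_read_data(struct fake_fifo *f)
{
    if (f->words_left > 0)
        f->words_left--;
    return 0;
}

/*
 * Stand-in for the pause. Assumption: in a driver running in process
 * context, cond_resched() or usleep_range() would go here, not udelay().
 */
static void fake_yield(struct fake_fifo *f)
{
    f->yields++;
}

/*
 * Drain the FIFO, pausing every DRAIN_BATCH reads and stopping after
 * max_reads reads. Returns true if the FIFO reported empty, false if the
 * budget ran out first, so the caller can retry later or report an error.
 */
static bool drain_rx_fifo(struct fake_fifo *f, unsigned int max_reads)
{
    unsigned int reads = 0;

    while (!(fake_read_status(f) & RX_FIFO_EMPTY)) {
        if (reads >= max_reads)
            return false;
        fake_read_data(f);
        reads++;
        if (reads % DRAIN_BATCH == 0)
            fake_yield(f);
    }
    return true;
}
```

The read budget is there so that the loop always terminates even if the
hardware never reports empty; the caller decides what to do on failure.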

>
>> 2. Do soft lockup issues auto-recover like this? Is this something I
>> should consider serious, or can it be ignored?
>
>
> The kernel is telling you that your CPU is stuck spinning instead of doing something useful.
>
>>
>> I would appreciate any guidance on how to resolve or mitigate this problem.
>>
>>
>> --
>> Thanks,
>> Sekhar
>>
>> _______________________________________________
>> Kernelnewbies mailing list
>> Kernelnewbies@xxxxxxxxxxxxxxxxx
>> https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
>
> --
> Regards / Mit besten Grüßen,
> Denis
>


--
Thanks,
Sekhar