Re: [PATCH 6.10 000/809] 6.10.3-rc3 review

From: Thomas Gleixner
Date: Mon Aug 05 2024 - 04:59:25 EST


On Sun, Aug 04 2024 at 20:28, Guenter Roeck wrote:
> On 8/4/24 11:36, Guenter Roeck wrote:
>>> Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>>      genirq: Set IRQF_COND_ONESHOT in request_irq()
>>>
>>
>> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
>>
>> [    0.000000] =============================================================================
>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>> [    0.000000] -----------------------------------------------------------------------------

Do you have a full boot log? It's unclear to me at which point of the boot
process this happens. Is this before or after the secondary CPUs have
been brought up?

>> This never stops until the emulation aborts.

Do you have a recipe to reproduce this?

>> Reverting this patch fixes the problem for me.
>>
>> I noticed a similar problem in the mainline kernel but it is either spurious there
>> or the problem has been fixed.
>>
>
> As a follow-up, the patch below (on top of v6.10.3) "fixes" the problem for me.
> I guess that suggests some kind of race condition.
>
>
> @@ -2156,6 +2157,8 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler,
>          struct irq_desc *desc;
>          int retval;
>
> +        udelay(1);
> +
>          if (irq == IRQ_NOTCONNECTED)
>                  return -ENOTCONN;

That all makes absolutely no sense to me.

IRQF_COND_ONESHOT only has an effect on shared interrupts, and only when
the interrupt was already requested with IRQF_ONESHOT.

If this is really a race then the following must be true:

1) no delay

   CPU0                                 CPU1
   request_irq(IRQF_ONESHOT)
                                        request_irq(IRQF_COND_ONESHOT)

2) delay

   CPU0                                 CPU1
                                        request_irq(IRQF_COND_ONESHOT)
   request_irq(IRQF_ONESHOT)

In this case the request on CPU 0 fails with -EBUSY ...
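
The two orderings can be checked against a small userspace model of the
ONESHOT agreement rule for shared lines. This is only a sketch, not the
kernel code: the MODEL_* flag values, struct model_line and both functions
are made up for illustration, and the trigger type checks are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder flag bits for this model; not the kernel's actual values. */
#define MODEL_SHARED        0x1u
#define MODEL_ONESHOT       0x2u
#define MODEL_COND_ONESHOT  0x4u

/* Hypothetical stand-in for an interrupt line and its first requester. */
struct model_line {
	unsigned int installed;	/* flags of the first requester */
	int in_use;		/* non-zero once somebody requested the line */
};

/*
 * Model of one request on a possibly shared line.
 * Returns 0 on success and -EBUSY on a flag mismatch.
 */
static int model_request(struct model_line *line, unsigned int flags)
{
	assert(line != NULL);

	/* First requester: nothing to compare against. */
	if (!line->in_use) {
		line->installed = flags;
		line->in_use = 1;
		return 0;
	}

	/* Both parties must agree to share the line. */
	if (!((line->installed & flags) & MODEL_SHARED))
		return -EBUSY;

	/* A conditional requester adopts ONESHOT if the line already has it. */
	if ((line->installed & MODEL_ONESHOT) && (flags & MODEL_COND_ONESHOT))
		flags |= MODEL_ONESHOT;
	/* Otherwise both sides must agree on ONESHOT. */
	else if ((line->installed ^ flags) & MODEL_ONESHOT)
		return -EBUSY;

	return 0;
}

/* Issue two requests in order on a fresh line; return the second result. */
static int model_second_result(unsigned int first, unsigned int second)
{
	struct model_line line = { 0, 0 };
	int ret = model_request(&line, first);

	if (ret)
		return ret;
	return model_request(&line, second);
}
```

In this model, calling model_second_result() with ONESHOT first and
COND_ONESHOT second succeeds, which is ordering 1). With the arguments
swapped the second request, the one with plain ONESHOT, gets -EBUSY, which
is ordering 2).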

Confused

tglx