Re: [PATCH RFC net-next v2 7/7] net: skbuff: always try to recycle PP pages directly when in softirq
From: Alexander Lobakin
Date: Thu Jul 20 2023 - 13:50:18 EST
From: Jakub Kicinski <kuba@xxxxxxxxxx>
Date: Thu, 20 Jul 2023 10:12:31 -0700
> On Thu, 20 Jul 2023 18:46:02 +0200 Alexander Lobakin wrote:
>> From: Jakub Kicinski <kuba@xxxxxxxxxx>
>> Date: Wed, 19 Jul 2023 13:51:50 -0700
>>
>>> On Wed, 19 Jul 2023 18:34:46 +0200 Alexander Lobakin wrote:
>> [...]
>>>>
>>>> If we're on the same CPU where the NAPI would run and in the same
>>>> context, i.e. softirq, in which the NAPI would run, what is the problem?
>>>> If there really is a good one, I can handle it here.
>>>
>>> #define SOFTIRQ_BITS 8
>>> #define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
>>> # define softirq_count() (preempt_count() & SOFTIRQ_MASK)
>>> #define in_softirq() (softirq_count())
>>
>> I do remember those, don't worry :)
>>
>>> I don't know what else to add beyond that and the earlier explanation.
>>
>> My question was "how can two things race on one CPU in one context if it
>> implies they won't ever happen simultaneously", but maybe my zero
>> knowledge of netcons hides something from me.
>
> One of them is in hardirq.

If I got your message correctly, that means softirq_count() can be
non-zero even when we're in hardirq context, as long as there are some
softirqs pending? I.e. if I call local_irq_save() inside the NAPI poll
loop, in_softirq() will still return `true`? (I'll check it myself in a
bit, but why not ask.)

Isn't checking for `interrupt_context_level() == 1` more appropriate
then? Page Pool core code also uses in_softirq(), as do a hellaton of
other networking-related places.
>
>>> AFAIK pages as allocated by page pool do not benefit from the usual
>>> KASAN / KMSAN checkers, so if we were to double-recycle a page once
>>> a day because of a netcons race, it's going to be a month-long debug
>>> for those of us using Linux in production.
>>
>> if (!test_bit(NAPI_STATE_NPSVC, &napi->state))
>
> If you have to, the right check is !in_hardirq()
>
>> ? It would mean we're not netpolling.
>> Otherwise, if this still is not enough, I'd go back to my v1 approach
>> with a NAPI flag, which would tell us for sure that we're good to go. I
>> got confused by your "wouldn't just checking for softirq be enough"! T.T
>> Joking :D
>
> I guess the problem I'm concerned about can already happen.
> I'll send a lockdep annotation shortly.

Interesting.

Thanks,
Olek