Re: [PATCH net v2] ipmr: Fix access to mfc_cache_list without lock held

From: Stefan Wiehler
Date: Thu Nov 21 2024 - 09:54:50 EST


> On 11/15/24 17:55, Paolo Abeni wrote:
>> On 11/15/24 17:07, Stefan Wiehler wrote:
>>>> On Fri, 15 Nov 2024 01:16:27 -0800 Breno Leitao wrote:
>>>>> This one seems to be discussed in the following thread already.
>>>>>
>>>>> https://lore.kernel.org/all/20241017174109.85717-1-stefan.wiehler@xxxxxxxxx/
>>>>
>>>> That's why it rang a bell...
>>>> Stefan, are you planning to continue with the series?
>>>
>>> Yes, sorry for the delay, went on vacation and was busy with other tasks, but
>>> next week I plan to continue (i.e. refactor using refcount_t).
>>
>> I forgot about that series and spent a little time investigating the
>> scenario.
>>
>> I think we don't need a refcount: the tables are freed only at netns
>> cleanup time, so the netns refcount is enough to guarantee that the
>> tables are not deleted when escaping the RCU section.
>>
>> Some debug assertions could help clarify, document and make the scheme
>> more robust to later changes.
>>
>> Side note, I think we need to drop the RCU lock moved by:
>>
>> https://lore.kernel.org/all/20241017174109.85717-2-stefan.wiehler@xxxxxxxxx/
>>
>> as the seqfile core can call blocking functions - alloc(GFP_KERNEL) -
>> between ->start() and ->stop().
>>
>> The issue is pre-existent to that patch, and even to the patch
>> introducing the original RCU() - the old read_lock() created an illegal
>> atomic scope - but I think we should address it while touching this code.
>
> @Stefan: are you ok if I go ahead with this work, or do you prefer to
> finish it yourself?

Please go ahead. I have neither the expertise in the net subsystem nor the time
to ramp up (this is just a side finding for us right now) to proceed with
your proposal. I'll follow the discussion, though, and hope to learn something
along the way!

Kind regards,

Stefan