Re: [PATCH net] net/ipv6: Fix the RT cache flush via sysctl using a previous delay

From: Petr Pavlu
Date: Fri Jun 07 2024 - 04:33:55 EST


On 6/4/24 10:30, Paolo Abeni wrote:
> On Fri, 2024-05-31 at 10:53 +0200, Petr Pavlu wrote:
>> [Added back netdev@xxxxxxxxxxxxxxx and linux-kernel@xxxxxxxxxxxxxxx
>> which seem to be dropped by accident.]
>>
>> On 5/30/24 17:59, Kuifeng Lee wrote:
>>> On Wed, May 29, 2024 at 6:53 AM Petr Pavlu <petr.pavlu@xxxxxxxx> wrote:
>>>>
>>>> The net.ipv6.route.flush system parameter takes a value which specifies
>>>> a delay used during the flush operation for aging exception routes. The
>>>> written value is however not used in the currently requested flush and
>>>> instead utilized only in the next one.
>>>>
>>>> A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
>>>> of net->ipv6.sysctl.flush_delay into a local delay variable and then
>>>> calls proc_dointvec() which actually updates the sysctl based on the
>>>> provided input.
>>>
>>> If the problem we are trying to fix is using the old value, should we move
>>> the line reading the value to a place after updating it instead of a
>>> local copy of
>>> the whole ctl_table?
>>
>> Just moving the read of net->ipv6.sysctl.flush_delay after the
>> proc_dointvec() call was actually my initial implementation. I then
>> opted for the proposed version because it looked useful to me to save
>> memory used to store net->ipv6.sysctl.flush_delay.
>
> Note that due to alignment, the struct netns_sysctl_ipv6 size is not
> going to change on 64 bits build.
>
> And if the layout would change, that could have subtle performance side
> effects (moving later fields in netns_sysctl_ipv6 in different
> cachelines) that we want to avoid for a net patch.
>
>> Another minor aspect is that these sysctl writes are not serialized. Two
>> invocations of ipv6_sysctl_rtcache_flush() could in theory occur at the
>> same time. It can then happen that they both first execute
>> proc_dointvec(). One of them ends up slower and thus its value gets
>> stored in net->ipv6.sysctl.flush_delay. Both runs then return to
>> ipv6_sysctl_rtcache_flush(), read the stored value and execute
>> fib6_run_gc(). It means one of them calls this function with a value
>> different that it was actually given on input. By having a purely local
>> variable, each write is independent and fib6_run_gc() is executed with
>> the right input delay.
>>
>> The cost of making a copy of ctl_table is a few instructions and this
>> isn't on any hot path. The same pattern is used, for example, in
>> net/ipv6/addrconf.c, function addrconf_sysctl_forward().
>>
>> So overall, the proposed version looked marginally better to me than
>> just moving the read of net->ipv6.sysctl.flush_delay later in
>> ipv6_sysctl_rtcache_flush().
>
> All in all the increased complexity vs the simple solution does not
> look worth to me.
>
> Please revert to the initial/simpler implementation for this fix,
> thanks!

Fair enough, I'll post v2 with the initial/simpler version.

Thanks,
Petr