Re: [PATCH net] net/ipv6: Fix the RT cache flush via sysctl using a previous delay

From: Paolo Abeni
Date: Tue Jun 04 2024 - 04:31:14 EST


On Fri, 2024-05-31 at 10:53 +0200, Petr Pavlu wrote:
> [Added back netdev@xxxxxxxxxxxxxxx and linux-kernel@xxxxxxxxxxxxxxx
> which seem to be dropped by accident.]
>
> On 5/30/24 17:59, Kuifeng Lee wrote:
> > On Wed, May 29, 2024 at 6:53 AM Petr Pavlu <petr.pavlu@xxxxxxxx> wrote:
> > >
> > > The net.ipv6.route.flush system parameter takes a value which specifies
> > > a delay used during the flush operation for aging exception routes. The
> > > written value is however not used in the currently requested flush and
> > > instead utilized only in the next one.
> > >
> > > A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
> > > of net->ipv6.sysctl.flush_delay into a local delay variable and then
> > > calls proc_dointvec() which actually updates the sysctl based on the
> > > provided input.
> >
> > If the problem we are trying to fix is using the old value, should we move
> > the line reading the value to a place after updating it instead of a
> > local copy of
> > the whole ctl_table?
>
> Just moving the read of net->ipv6.sysctl.flush_delay after the
> proc_dointvec() call was actually my initial implementation. I then
> opted for the proposed version because it looked useful to me to save
> memory used to store net->ipv6.sysctl.flush_delay.

Note that due to alignment, the struct netns_sysctl_ipv6 size is not
going to change on 64 bits build.

And if the layout would change, that could have subtle performance side
effects (moving later fields in netns_sysctl_ipv6 in different
cachelines) that we want to avoid for a net patch.

> Another minor aspect is that these sysctl writes are not serialized. Two
> invocations of ipv6_sysctl_rtcache_flush() could in theory occur at the
> same time. It can then happen that they both first execute
> proc_dointvec(). One of them ends up slower and thus its value gets
> stored in net->ipv6.sysctl.flush_delay. Both runs then return to
> ipv6_sysctl_rtcache_flush(), read the stored value and execute
> fib6_run_gc(). It means one of them calls this function with a value
> different that it was actually given on input. By having a purely local
> variable, each write is independent and fib6_run_gc() is executed with
> the right input delay.
>
> The cost of making a copy of ctl_table is a few instructions and this
> isn't on any hot path. The same pattern is used, for example, in
> net/ipv6/addrconf.c, function addrconf_sysctl_forward().
>
> So overall, the proposed version looked marginally better to me than
> just moving the read of net->ipv6.sysctl.flush_delay later in
> ipv6_sysctl_rtcache_flush().

All in all the increased complexity vs the simple solution does not
look worth to me.

Please revert to the initial/simpler implementation for this fix,
thanks!

Paolo