Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest

From: Eric Dumazet
Date: Fri Sep 13 2024 - 09:51:22 EST


On Fri, Sep 13, 2024 at 3:45 PM Mitchell Augustin
<mitchell.augustin@xxxxxxxxxxxxx> wrote:
>
> Hi Jakub,
> Executing ./pmtu.sh pmtu_ipv6_ipv6_exception manually will only
> trigger the pmtu_ipv6_ipv6_exception sub-case, which only takes a
> second to run on my machines, so you shouldn't need to run the
> entirety of pmtu.sh to trigger the bug. It won't trigger on attempt
> #1, but in my experience, when I do it in that while loop, it will
> trigger in under a minute reliably.
>
> > Somewhat tangentially but if you'd be willing I wouldn't mind if you
> > were to send patches to break this test up upstream, too. It takes
> > 1h23m to run with various debug kernel options enabled. If we split
> > it into multiple smaller tests each running 10min or 20min we can
> > then spawn multiple VMs and get the results faster.
>
> This logical division of tests already exists in pmtu.sh if you pass a
> sub-test name in as the first parameter like above, but if you think
> there would be value in separating them out further or into different
> files not all in pmtu.sh, I would be happy to help with that. Just let
> me know.
>
> Regardless, I will go ahead and work on a new regression test that
> executes just our quick reproducer for this specific bug and will send
> it to this list.
>
> Thanks,
> Mitchell Augustin
>
> On Thu, Sep 12, 2024 at 9:13 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> >
> > On Wed, 11 Sep 2024 17:20:29 -0500 Mitchell Augustin wrote:
> > > We recently identified a bug still impacting upstream, triggered
> > > occasionally by one of the kernel selftests (net/pmtu.sh) that
> > > sometimes causes the following behavior:
> > > * One of this test's namespaced network devices does not get properly
> > > cleaned up when the namespace is destroyed, evidenced by
> > > `unregister_netdevice: waiting for veth_A-R1 to become free. Usage
> > > count = 5` appearing in the dmesg output repeatedly
> > > * Once we start to see the above `unregister_netdevice` message, an
> > > un-cancelable hang will occur on subsequent attempts to run `modprobe
> > > ip6_vti` or `rmmod ip6_vti`
> >
> > Thanks for the report! We have seen it in our CI as well; it happens
> > maybe once a day. But as you say, on x86 it is quite hard to reproduce,
> > and nothing obvious stood out as a culprit.
> >
> > > However, I can easily reproduce the issue on an Nvidia Grace/Hopper
> > > machine (and other platforms with modern CPUs) with the performance
> > > governor set by doing the following:
> > > * Install/boot any affected kernel
> > > * Clone the kernel tree just to get an older version of the test cases
> > > without subtle timing changes that mask the issue (such as
> > > https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/?h=Ubuntu-6.8.0-39.39)
> > > * cd tools/testing/selftests/net
> > > * while true; do sudo ./pmtu.sh pmtu_ipv6_ipv6_exception; done
> >
> > That's exciting! Would you be able to try to cut down the test itself
> > (it is quite long and has a ton of sub-cases) and figure out which
> > sub-cases trigger this? Maybe with an even quicker repro we'll bisect,
> > or someone will correctly guess the fix.
> >
> > Somewhat tangentially but if you'd be willing I wouldn't mind if you
> > were to send patches to break this test up upstream, too. It takes
> > 1h23m to run with various debug kernel options enabled. If we split
> > it into multiple smaller tests each running 10min or 20min we can
> > then spawn multiple VMs and get the results faster.
>

Note that this issue has been discussed already with Paolo Abeni.

The problem lies in the dst_cache infrastructure.
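
For anyone who wants to script the reproducer quoted above, a minimal
sketch could look like the following. It reruns the single sub-case and
stops once the `unregister_netdevice: waiting for ...` message shows up
in dmesg. The helper names (has_netdev_leak, reproduce), the MAX_RUNS
variable and the --run switch are hypothetical and not part of pmtu.sh;
it assumes it is started as root from tools/testing/selftests/net.

```shell
#!/bin/sh
# Sketch only: rerun one pmtu.sh sub-case until the kernel logs the
# warning quoted in this thread. All names below are illustrative.

# Succeeds if the text on stdin contains the warning.
has_netdev_leak() {
    grep -q 'unregister_netdevice: waiting for .* to become free'
}

# Runs the sub-case up to MAX_RUNS times (default 200, an arbitrary cap).
# Must be run as root from tools/testing/selftests/net.
reproduce() {
    max_runs="${MAX_RUNS:-200}"
    i=1
    while [ "$i" -le "$max_runs" ]; do
        ./pmtu.sh pmtu_ipv6_ipv6_exception >/dev/null 2>&1
        if dmesg | has_netdev_leak; then
            echo "warning seen after $i run(s)"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no warning after $max_runs run(s)"
    return 1
}

# Only start the loop when explicitly asked, so the file can be sourced.
if [ "${1:-}" = "--run" ]; then
    reproduce
fi
```

Note that the check reads the whole dmesg buffer, so a message left
over from an earlier run will also match; clearing the buffer first
(dmesg -C) avoids that. The kernel may also print the message some time
after the namespace is torn down, so the reported run count is only
approximate.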