Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest

From: Mitchell Augustin
Date: Fri Sep 13 2024 - 09:45:53 EST


Hi Jakub,
Executing ./pmtu.sh pmtu_ipv6_ipv6_exception manually runs only the
pmtu_ipv6_ipv6_exception sub-case, which takes about a second on my
machines, so you shouldn't need to run all of pmtu.sh to trigger the
bug. It won't trigger on the first attempt, but in my experience,
running it in the while loop from my original report (quoted below)
triggers it reliably in under a minute.
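
For anyone who wants the loop to stop on its own, here is a rough sketch
that reruns the sub-case until the unregister_netdevice message appears in
dmesg. The names has_netdev_leak, repro_loop and RUN_REPRO are made up for
illustration, and the script assumes it is run from
tools/testing/selftests/net with sudo available. The kernel message may
show up some time after the run that caused it, so the attempt number it
prints is only approximate, and a stale message already in the log would
match immediately (clearing the log first with `sudo dmesg -C` avoids that).

```shell
#!/bin/sh
# Sketch only: rerun one pmtu.sh sub-case until dmesg shows the leaked device.

# Succeeds if the text on stdin contains the leaked-device message
# quoted in the original report.
has_netdev_leak() {
    grep -q 'unregister_netdevice: waiting for .* to become free'
}

# Runs the sub-case repeatedly and reports the attempt after which the
# message was first seen in the kernel log.
repro_loop() {
    attempt=0
    while :; do
        attempt=$((attempt + 1))
        sudo ./pmtu.sh pmtu_ipv6_ipv6_exception >/dev/null 2>&1
        if sudo dmesg | has_netdev_leak; then
            echo "leak message seen after attempt $attempt"
            return 0
        fi
    done
}

# Hypothetical guard: the loop only starts when RUN_REPRO=1 is set,
# so the file can be sourced without side effects.
if [ "${RUN_REPRO:-0}" = "1" ]; then
    repro_loop
fi
```

Saved as, say, repro.sh (a hypothetical name), it would be started with
`RUN_REPRO=1 sh repro.sh`.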

> Somewhat tangentially but if you'd be willing I wouldn't mind if you
> were to send patches to break this test up upstream, too. It takes
> 1h23m to run with various debug kernel options enabled. If we split
> it into multiple smaller tests each running 10min or 20min we can
> then spawn multiple VMs and get the results faster.

pmtu.sh already supports this logical division: passing a sub-test name
as the first parameter, as above, runs only that sub-test. If you think
there would be value in separating the tests further, or in moving them
into separate files outside pmtu.sh, I would be happy to help with that.
Just let me know.
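
As a rough illustration of how the existing per-sub-test invocation could
be spread over several VMs without touching pmtu.sh, the sketch below
deals a list of sub-test names out round-robin. shard_tests is a made-up
helper, not something that exists in the selftests, and the caller would
have to supply the real sub-test names.

```shell
#!/bin/sh
# Sketch only: print the sub-test names that belong to one shard.
# usage: shard_tests SHARD_INDEX SHARD_COUNT name...
# SHARD_INDEX counts from 0; names are assigned round-robin.
shard_tests() {
    shard=$1
    total=$2
    shift 2
    i=0
    for name in "$@"; do
        if [ $((i % total)) -eq "$shard" ]; then
            echo "$name"
        fi
        i=$((i + 1))
    done
}
```

Each VM could then run something like
`for t in $(shard_tests 0 2 pmtu_ipv6_ipv6_exception ...); do ./pmtu.sh "$t"; done`
with its own shard index, where the remaining names are left for the
caller to fill in.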

Regardless, I will go ahead and work on a new regression test that
executes just our quick reproducer for this specific bug and will send
it to this list.

Thanks,
Mitchell Augustin

On Thu, Sep 12, 2024 at 9:13 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Wed, 11 Sep 2024 17:20:29 -0500 Mitchell Augustin wrote:
> > We recently identified a bug still impacting upstream, triggered
> > occasionally by one of the kernel selftests (net/pmtu.sh) that
> > sometimes causes the following behavior:
> > * One of this test's namespaced network devices does not get properly
> > cleaned up when the namespace is destroyed, evidenced by
> > `unregister_netdevice: waiting for veth_A-R1 to become free. Usage
> > count = 5` appearing in the dmesg output repeatedly
> > * Once we start to see the above `unregister_netdevice` message, an
> > un-cancelable hang will occur on subsequent attempts to run `modprobe
> > ip6_vti` or `rmmod ip6_vti`
>
> Thanks for the report! We have seen it in our CI as well, it happens
> maybe once a day. But as you say, on x86 it is quite hard to reproduce,
> and nothing obvious stood out as a culprit.
>
> > However, I can easily reproduce the issue on an Nvidia Grace/Hopper
> > machine (and other platforms with modern CPUs) with the performance
> > governor set by doing the following:
> > * Install/boot any affected kernel
> > * Clone the kernel tree just to get an older version of the test cases
> > without subtle timing changes that mask the issue (such as
> > https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/?h=Ubuntu-6.8.0-39.39)
> > * cd tools/testing/selftests/net
> > * while true; do sudo ./pmtu.sh pmtu_ipv6_ipv6_exception; done
>
> That's exciting! Would you be able to try to cut down the test itself
> (it is quite long and has a ton of sub-cases). Figure out which sub-cases
> trigger this? And maybe with an even quicker repro we'll bisect or
> someone will correctly guess the fix?
>
> Somewhat tangentially but if you'd be willing I wouldn't mind if you
> were to send patches to break this test up upstream, too. It takes
> 1h23m to run with various debug kernel options enabled. If we split
> it into multiple smaller tests each running 10min or 20min we can
> then spawn multiple VMs and get the results faster.



--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering