Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest

From: Jakub Kicinski
Date: Thu Sep 12 2024 - 22:13:17 EST


On Wed, 11 Sep 2024 17:20:29 -0500 Mitchell Augustin wrote:
> We recently identified a bug still impacting upstream, triggered
> occasionally by one of the kernel selftests (net/pmtu.sh) that
> sometimes causes the following behavior:
> * One of this test's namespaced network devices does not get properly
> cleaned up when the namespace is destroyed, evidenced by
> `unregister_netdevice: waiting for veth_A-R1 to become free. Usage
> count = 5` appearing in the dmesg output repeatedly
> * Once we start to see the above `unregister_netdevice` message, an
> un-cancelable hang will occur on subsequent attempts to run `modprobe
> ip6_vti` or `rmmod ip6_vti`

Thanks for the report! We have seen it in our CI as well; it happens
maybe once a day. But as you say, on x86 it is quite hard to reproduce,
and nothing obvious stood out as a culprit.
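
In case it helps anyone else chasing this: spotting a bad run should be
as simple as grepping dmesg after each iteration for the stuck refcount
message you quoted, something along these lines (rough sketch only):

  # rough sketch: flag the run if netns teardown got stuck on the veth
  if dmesg | grep -q 'waiting for veth_A-R1 to become free'; then
          echo "refcount leak hit, netns cleanup is stuck"
  fi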

> However, I can easily reproduce the issue on an Nvidia Grace/Hopper
> machine (and other platforms with modern CPUs) with the performance
> governor set by doing the following:
> * Install/boot any affected kernel
> * Clone the kernel tree just to get an older version of the test cases
> without subtle timing changes that mask the issue (such as
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/?h=Ubuntu-6.8.0-39.39)
> * cd tools/testing/selftests/net
> * while true; do sudo ./pmtu.sh pmtu_ipv6_ipv6_exception; done

That's exciting! Would you be able to try cutting down the test itself
(it's quite long and has a ton of sub-cases) and figure out which
sub-cases trigger this? With an even quicker repro, maybe we can bisect,
or someone will correctly guess the fix.
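
Something along these lines (untested sketch, taking the sub-case names
to try as command line arguments) would run one sub-case at a time and
stop as soon as the leak shows up:

  # untested sketch: run each given sub-case repeatedly until the
  # stuck-refcount message from the report shows up in dmesg
  for t in "$@"; do
          for i in $(seq 1 50); do
                  sudo ./pmtu.sh "$t"
                  if dmesg | grep -q 'waiting for veth_A-R1 to become free'; then
                          echo "leak triggered by sub-case $t after $i run(s)"
                          exit 0
                  fi
          done
  done

(The wrapper's name and the 50-run cap are arbitrary, of course; invoke
it as e.g. ./find_leak.sh pmtu_ipv6_ipv6_exception from the selftest
directory.)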

Somewhat tangentially, but if you'd be willing, I wouldn't mind patches
to break this test up upstream, too. It takes 1h23m to run with various
debug kernel options enabled. If we split it into multiple smaller tests,
each running 10 or 20 minutes, we could spawn multiple VMs and get the
results faster.
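
No strong opinion on the exact split, and this is purely illustrative
(assuming pmtu.sh keeps accepting a list of test names), but the idea
would be that each CI VM/job runs only its own subset:

  # illustrative only: sub-case grouping made up, not a proposed split
  sudo ./pmtu.sh pmtu_ipv4_exception pmtu_ipv6_exception    # job 1
  sudo ./pmtu.sh pmtu_ipv6_ipv6_exception                   # job 2

Splitting the script into separate selftest files upstream would of
course be the cleaner way to get there.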