Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest

From: Mitchell Augustin
Date: Wed Sep 11 2024 - 18:20:53 EST


Hello,

We recently identified a bug, still present upstream, that is
occasionally triggered by one of the kernel selftests (net/pmtu.sh).
When it triggers, we see the following behavior:
* One of this test's namespaced network devices is not properly
cleaned up when the namespace is destroyed, as evidenced by
`unregister_netdevice: waiting for veth_A-R1 to become free. Usage
count = 5` appearing repeatedly in the dmesg output
* Once the above `unregister_netdevice` message starts appearing,
subsequent attempts to run `modprobe ip6_vti` or `rmmod ip6_vti` hang
and cannot be cancelled

Jacob and I have both investigated various conditions under which this
bug state does / does not occur. Our findings are documented more
thoroughly in the following BugLink:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072501

We expect all references to veth_A-R1 to be released by the time
execution of pmtu.sh finishes, since the relevant namespaces are
deleted during cleanup of the test suite. We've observed this behavior
on several kernels, from stable branches at least as old as
linux-6.1.y up to v6.11-rc6, so this does not appear to be a new
regression. (I have not had a chance to test rc7 yet.)

This issue occurs only infrequently, and reproducibility is extremely
sensitive to minor timing variations in the pmtu.sh test case. In
fact, I was unable to reproduce the bug with the versions of pmtu.sh
and lib.sh in v6.11-rc6. That is not because the kernel is unaffected
(it is affected, as confirmed by running an older kernel's pmtu.sh on
it), but because v6.11-rc6 introduces some unrelated functional
changes to the tests that make them take slightly longer to run.
It is also difficult to reproduce the bug on slower CPUs, or even on
faster CPUs where the cpufreq scaling governor is not set to
`performance`.
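
For anyone trying to reproduce this, here is a minimal sketch of
switching every CPU to the `performance` governor through the standard
cpufreq sysfs files. The helper name `set_governor` and its optional
sysfs-root argument are illustrative only and are not part of the
selftests. Writing to these files requires root, and the files exist
only where a cpufreq driver is loaded.

```shell
#!/bin/sh
# Illustrative helper (not part of the kernel selftests).
# $1: governor name to write, e.g. "performance"
# $2: optional sysfs root, defaults to /sys (an assumption added here
#     so the function can be exercised against a scratch directory)
set_governor() {
    gov=$1
    root=${2:-/sys}
    for f in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip silently if the glob matched nothing (no cpufreq support).
        [ -e "$f" ] || continue
        echo "$gov" > "$f"
    done
}
```

As root, source the file and run `set_governor performance`. The
current setting can be checked by reading the same
`scaling_governor` files back.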

However, I can easily reproduce the issue on an Nvidia Grace/Hopper
machine (and on other platforms with modern CPUs) with the
`performance` governor set, by doing the following:
* Install/boot any affected kernel
* Clone a kernel tree that contains an older version of the test
cases, without the subtle timing changes that mask the issue (such as
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/?h=Ubuntu-6.8.0-39.39)
* cd tools/testing/selftests/net
* while true; do sudo ./pmtu.sh pmtu_ipv6_ipv6_exception; done

If you are running on an appropriately fast CPU, you should start
seeing `unregister_netdevice: waiting for veth_A-R1 to become free.
Usage count = 5` in dmesg at some point. (On Grace/Hopper, it reliably
happens in under a minute.) After that point, attempts to interact
with ip6_vti will hang.
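
The reproduction loop can also be made to stop on its own once the
message shows up. The following is only a rough sketch: the function
names `leak_reported` and `reproduce` are illustrative, and it assumes
it is run as root from tools/testing/selftests/net. If the message is
already in the kernel log from an earlier run, the loop exits
immediately, so the ring buffer may need to be cleared first (for
example with `dmesg -C`).

```shell
#!/bin/sh
# Pattern taken from the dmesg line quoted in this report.
PATTERN='unregister_netdevice: waiting for veth_A-R1 to become free'

# Return 0 if the log text on stdin contains the message.
leak_reported() {
    grep -q "$PATTERN"
}

# Re-run the single test case until the message appears in dmesg,
# then report how many iterations it took.
reproduce() {
    i=0
    while ! dmesg | leak_reported; do
        i=$((i + 1))
        ./pmtu.sh pmtu_ipv6_ipv6_exception
    done
    echo "message seen after $i iterations"
}
```

Source the file and call `reproduce` to start the loop.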

Please let me know if there is any other info I can provide to assist
in debugging this.

Thanks,
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering