Re: [PATCH net-next v2] netlink: clean up failed initial dump-start state
From: Jakub Kicinski
Date: Thu Apr 23 2026 - 21:50:33 EST
On Thu, 23 Apr 2026 17:28:27 -0400 Michael Bommarito wrote:
> __netlink_dump_start() installs cb->skb, takes the module reference,
> and sets cb_running before calling netlink_dump(sk, true). If that
> first call returns via errout_skb the callback state is left behind:
> cb_running stays set, module_put() and consume_skb(cb->skb) are
> deferred until recvmsg() drives the dump back through the success
> path, or netlink_release() on close runs the catch-all cleanup. On
> sustained alloc failure neither fires.
>
> Factor the teardown into netlink_dump_cleanup(nlk, drop) shared by
> the dump success path, the lock_taken=true errout_skb path, and
> netlink_release(). The @drop flag preserves the existing split:
> consume_skb() on normal completion, kfree_skb() on abort.
>
> Validation on a UML guest: an unprivileged task opens NETLINK_ROUTE,
> preloads sk_rmem_alloc, then issues RTM_GETLINK | NLM_F_DUMP. Stock
> kernel leaves cb_running stuck at 1 until recvmsg() or close()
> drives it. Patched kernel clears cb_running immediately on the
> lock_taken=true failure; the recvmsg continuation path is unchanged.
>
> At scale: 3500 wedged sockets in a 256M guest show about 3.8-3.9
> MiB of extra unreclaimable slab (~1.1 KiB/sock) on stock vs zero on
> patched. RLIMIT_NOFILE bounds the test before OOM, so this is a
> local availability cleanup rather than an exhaustion primitive.
>
> Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Looks like this changes existing behavior :(
The tools/testing/selftests/net/netlink-dumps.c test used to always
see a NLMSG_DONE now it doesn't. Maybe this change is more risk
than reward?
process nit: please don't post new versions in reply to previous