Re: [PATCH] net: skmsg: pin the delayed-work psock in sk_psock_backlog
From: Jiayuan Chen
Date: Fri May 15 2026 - 05:13:55 EST
On 5/15/26 4:26 PM, Jiayuan Chen wrote:
On 5/15/26 4:12 PM, Cen Zhang wrote:
Dear Jiayuan Chen
On Fri, May 15, 2026 at 14:10, Jiayuan Chen <jiayuan.chen@xxxxxxxxx> wrote:
Where is the 'last_old_ref_before_put' symbol from? I can't find it
anywhere in the tree.
If you are using LLMs to dig into races like this, please also have them
produce a reproducer, e.g. patch mdelay() into the relevant windows to
widen them, then trigger it from userspace.
Hi Jiayuan,
Thanks for checking this. You are right: last_old_ref_before_put is
not an in-tree kernel symbol. It was a temporary validation probe
label which recorded the old psock refcount immediately before the
backlog worker's final put, and it should not have appeared in the
commit message as if it were kernel output.
The in-tree path I was trying to describe is:
sk_psock_backlog() starts at net/core/skmsg.c:670.
get path: sk_psock_get(psock->sk), net/core/skmsg.c:692.
put path: sk_psock_put(psock->sk, psock), net/core/skmsg.c:746.
detach clears sk_user_data at net/core/skmsg.c:892.
reattach publishes a replacement psock at net/core/skmsg.c:793.
warning path: REFCOUNT_SUB_UAF at lib/refcount.c:28.
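To make the get/put mismatch in the list above concrete, here is a minimal
userspace model of the sequence. It is only a sketch: the model_* names and
structs are hypothetical stand-ins, not kernel code, and it assumes that the
get resolves the psock through sk->sk_user_data while the put acts on the
psock that owns the work item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's struct sock / struct sk_psock. */
struct model_sock;

struct model_psock {
	int refcnt;              /* models psock->refcnt */
	struct model_sock *sk;   /* models psock->sk */
};

struct model_sock {
	struct model_psock *user_data;  /* models sk->sk_user_data */
};

/* Models sk_psock_get(sk): reference whatever psock the socket publishes now. */
static struct model_psock *model_get(struct model_sock *sk)
{
	struct model_psock *p = sk->user_data;

	if (!p || p->refcnt == 0)
		return NULL;
	p->refcnt++;
	return p;
}

/*
 * Models sk_psock_put(sk, psock): drop a reference on the psock passed in.
 * Returns -1 where the kernel's refcount code would warn about an underflow.
 */
static int model_put(struct model_psock *p)
{
	if (p->refcnt == 0)
		return -1;
	p->refcnt--;
	return 0;
}

struct model_result {
	int got_replacement;  /* 1 if the get returned the replacement psock */
	int new_refcnt;       /* replacement refcount after the worker exits */
	int put_underflow;    /* 1 if the put hit an already-zero refcount */
};

static struct model_result model_race(void)
{
	struct model_sock sk = { NULL };
	struct model_psock old_ps = { .refcnt = 1, .sk = &sk };
	struct model_psock new_ps = { .refcnt = 0, .sk = &sk };
	struct model_psock *worker_ps = &old_ps;  /* the work item's own psock */
	struct model_result r;
	int rc;

	sk.user_data = &old_ps;

	/* Worker is parked before its get. Detach drops the last reference
	 * on the old psock and clears the socket's pointer. */
	rc = model_put(&old_ps);
	assert(rc == 0);
	(void)rc;
	sk.user_data = NULL;

	/* Reattach publishes a replacement psock on the same socket. */
	new_ps.refcnt = 1;
	sk.user_data = &new_ps;

	/* Worker resumes: the get goes through sk, the put through the old psock. */
	r.got_replacement = (model_get(worker_ps->sk) == &new_ps);
	r.put_underflow = (model_put(worker_ps) < 0);
	r.new_refcnt = new_ps.refcnt;
	return r;
}
```

In this model the replacement ends up with one extra reference and the old
psock takes a put at refcount zero. It says nothing about psock lifetime,
locking, or which fix is preferable.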
The trigger was based on the in-tree sockmap_redir BPF selftest
under tools/testing/selftests/bpf/prog_tests/.
The one-shot test used AF_UNIX SOCK_STREAM socket pairs, attached
the sk_skb verdict program to the input map, inserted one socket
into the input map and one destination socket into the sockmap at
key 0, then sent one byte through the input peer so the destination
psock backlog worker was queued.
For validation I used a temporary local instrumentation patch in
net/core/skmsg.c. It added a debugfs-controlled gate in
sk_psock_backlog() after the TX-enabled check and before the
existing sk_psock_get(psock->sk) call, plus counters and pr_info()
snapshots in sk_psock_backlog(), sk_psock_init() and
sk_psock_drop(). It also stored the pointer returned by
sk_psock_get(psock->sk) for logging. The worker still used the
existing get path and the existing sk_psock_put(psock->sk, psock)
exit path.
With the worker parked before sk_psock_get(psock->sk), the test
forked: the child deleted the destination sockmap entry, and the
parent retried BPF_NOEXIST update of the same key with the same
destination socket fd until reattach succeeded.
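The delete/reattach step can be sketched in userspace without any BPF setup.
The following is a model only: a shared atomic slot stands in for sockmap
key 0 (0 meaning "empty"), a compare-exchange from 0 stands in for an update
with BPF_NOEXIST, and model_delete_then_reattach() is a hypothetical helper
name. The real test would issue the map delete and update operations on the
sockmap instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical model: the child empties the slot (models the map delete),
 * the parent retries an insert that only succeeds on an empty slot (models
 * an update with BPF_NOEXIST). Returns the final slot value, or -1 on a
 * setup error. *retries counts failed insert attempts.
 */
static int model_delete_then_reattach(int fd_val, long *retries)
{
	atomic_int *slot;
	int expected, status, final;
	pid_t pid;

	assert(fd_val != 0);  /* 0 is reserved for "empty" in this model */

	slot = mmap(NULL, sizeof(*slot), PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (slot == MAP_FAILED)
		return -1;
	atomic_init(slot, fd_val);  /* entry is present before the race */
	*retries = 0;

	pid = fork();
	if (pid < 0) {
		munmap(slot, sizeof(*slot));
		return -1;
	}
	if (pid == 0) {
		/* Child: delete the entry after a short delay. */
		usleep(1000);
		atomic_store(slot, 0);
		_exit(0);
	}

	/* Parent: retry the insert until the slot is seen empty. */
	for (;;) {
		expected = 0;
		if (atomic_compare_exchange_strong(slot, &expected, fd_val))
			break;
		(*retries)++;
	}

	waitpid(pid, &status, 0);
	final = atomic_load(slot);
	munmap(slot, sizeof(*slot));
	return final;
}
```

The retry loop only exits once the delete has been observed, which is the
ordering the test relies on: reattach can succeed only after the old entry
is gone.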
After the delete completed, the test released the old worker. At
that point sk->sk_user_data referred to the replacement psock, while
So, should the fix swap the order of sk->sk_user_data = NULL and sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED)?
Please also carry a Fixes tag.
[...]
I will send v2 as a new thread after the netdev 24-hour
interval, with the lab probe label removed from the commit text.
If useful, I can also share the small instrumentation/selftest
diff separately to show the exact widened window.
You can just post the kernel patch and the userspace program patch in this thread (no need to send them as a new patch).
Also, this patch should target the bpf tree, not net.
--
pw-bot: cr