Re: [PATCH] nbd: don't warn when reclassifying a busy socket lock
From: Hillf Danton
Date: Mon Jun 29 2026 - 01:31:44 EST
On Mon, 22 Jun 2026 17:21:53 -0700 Eric Dumazet wrote:
>On Mon, Jun 22, 2026 at 5:07 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>> On Mon, 22 Jun 2026 01:18:10 -0700 Eric Dumazet wrote:
>> >On Sun, Jun 21, 2026 at 6:43 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>> >> On Mon, 22 Jun 2026 05:22:55 +0530 Deepanshu Kartikey wrote:
>> >> > nbd_reclassify_socket() warns via WARN_ON_ONCE() if the socket lock is
>> >> > held at the point of reclassification. That assertion was copied from
>> >> > nvme-tcp, where the socket is created internally by the kernel
>> >> > (sock_create_kern()) and is never visible to user space, so the lock
>> >> > is guaranteed to be free.
>> >> >
>> >> > NBD is different: the socket is looked up from a user-supplied fd in
>> >> > nbd_get_socket(), and user space retains that fd. A concurrent syscall
>> >> > on the same socket (or softirq processing taking bh_lock_sock() on a
>> >> > connected TCP socket) can legitimately hold the lock at the instant
>> >> > NBD reclassifies it. sock_allow_reclassification() then returns false
>> >> > and the WARN_ON_ONCE() fires, which turns into a crash under
>> >> > panic_on_warn. This is reachable by simply racing NBD_CMD_CONNECT
>> >> > against socket activity on the same fd, as reported by syzbot.
>> >> >
>> >> Given the syzbot report, if you are right (as I suspect) then Eric delivered
>> >> another half-baked croissant, and feel free to cut it off instead to make
>> >> room for a correct fix.
>> >
>> > Nobody (including you) caught this difference between nbd and other
>> > sock_allow_reclassification() callers.
>> >
>> Nope, actually it raises the question -- does the deadlock still remain
>> after your fix without the lock key you added applied?
>
>LOCKDEP might have a false positive, but it will be much much harder to trigger.
>
>I had about 50 syzbot duplicates (that I did not release) before d532cddb6c60
> ("nbd: Reclassify sockets to avoid lockdep circular dependency").
>
>>
>> > What was the "correct fix" you envisioned exactly?
>> >
>> Frankly I had no evidence against your fix a couple days back, but now I
>> see your lock key approach fails to take off. And the correct fix is to
>> erase the incorrect locking order ffa1e7ada456 tries to catch, which is more
>> difficult than you have thought so far.
>
>Which incorrect locking order are you referring to? This is a LOCKDEP
>false positive.
>
For the archive: syzbot report [1], in which UDP was not involved, contradicts
the rationale given in d532cddb6c60 ("nbd: Reclassify sockets to avoid lockdep
circular dependency") -- "Since the UDP socket and the NBD TCP/TLS socket
are different, this is a false positive."
[1] Subject: [syzbot] [net?] possible deadlock in inet_shutdown (3)
https://lore.kernel.org/lkml/69c37e6a.a70a0220.234938.0045.GAE@xxxxxxxxxx/
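
To make the race from the quoted changelog concrete, here is a minimal
user-space model of the check, not kernel code. struct model_sock,
model_allow_reclassification() and model_reclassify() are hypothetical names
made up for illustration; the two flags stand in for "lock owned by a task"
and "spinlock held in softirq". The warn_if_busy=false branch is only an
assumed shape for the proposed patch, inferred from its subject line.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sock: only the lock state matters here. */
struct model_sock {
	bool owned_by_user;	/* models a concurrent syscall holding the socket lock */
	bool slock_held;	/* models softirq processing holding bh_lock_sock() */
	int lock_class;		/* 0 = default lock class, 1 = NBD-specific class */
};

/* Models the check: reclassifying is only allowed when nobody holds the lock. */
static bool model_allow_reclassification(const struct model_sock *sk)
{
	return !sk->owned_by_user && !sk->slock_held;
}

/*
 * Models the reclassify step and returns the number of warnings emitted.
 * warn_if_busy = true  models the WARN_ON_ONCE() behaviour described above.
 * warn_if_busy = false models silently skipping a busy socket (assumption).
 */
static int model_reclassify(struct model_sock *sk, bool warn_if_busy)
{
	if (!model_allow_reclassification(sk))
		return warn_if_busy ? 1 : 0;	/* lock class left unchanged */

	sk->lock_class = 1;
	return 0;
}
```

In this model a socket whose fd is still being used by user space can reach
model_reclassify() with either flag set, which is the case the nvme-tcp
assertion never had to consider. Note that in both variants a busy socket
keeps its default lock class, so the model says nothing about whether the
original lockdep report can still trigger.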