Re: [PATCH] nbd: don't warn when reclassifying a busy socket lock

From: Eric Dumazet

Date: Tue Jun 30 2026 - 03:20:03 EST


On Sun, Jun 28, 2026 at 10:29 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On Mon, 22 Jun 2026 17:21:53 -0700 Eric Dumazet wrote:
> >On Mon, Jun 22, 2026 at 5:07 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
> >> On Mon, 22 Jun 2026 01:18:10 -0700 Eric Dumazet wrote:
> >> >On Sun, Jun 21, 2026 at 6:43 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
> >> >> On Mon, 22 Jun 2026 05:22:55 +0530 Deepanshu Kartikey wrote:
> >> >> > nbd_reclassify_socket() warns via WARN_ON_ONCE() if the socket lock is
> >> >> > held at the point of reclassification. That assertion was copied from
> >> >> > nvme-tcp, where the socket is created internally by the kernel
> >> >> > (sock_create_kern()) and is never visible to user space, so the lock
> >> >> > is guaranteed to be free.
> >> >> >
> >> >> > NBD is different: the socket is looked up from a user-supplied fd in
> >> >> > nbd_get_socket(), and user space retains that fd. A concurrent syscall
> >> >> > on the same socket (or softirq processing taking bh_lock_sock() on a
> >> >> > connected TCP socket) can legitimately hold the lock at the instant
> >> >> > NBD reclassifies it. sock_allow_reclassification() then returns false
> >> >> > and the WARN_ON_ONCE() fires, which turns into a crash under
> >> >> > panic_on_warn. This is reachable by simply racing NBD_CMD_CONNECT
> >> >> > against socket activity on the same fd, as reported by syzbot.
> >> >> >
> >> >> Given the syzbot report, if you are right (as I suspect), then Eric delivered
> >> >> another half-baked croissant; feel free to cut it off instead to make
> >> >> room for a correct fix.
> >> >
> >> > Nobody (including you) caught this difference between nbd and other
> >> > sock_allow_reclassification() callers.
> >> >
> >> Nope, actually it raises the question -- does the deadlock still remain
> >> after your fix without the lock key you added applied?
> >
> >LOCKDEP might have a false positive, but it will be much much harder to trigger.
> >
> >I had about 50 syzbot duplicates (that I did not release) before d532cddb6c60
> > ("nbd: Reclassify sockets to avoid lockdep circular dependency").
> >
> >>
> >> > What was the "correct fix" you envisioned exactly?
> >> >
> >> Frankly I had no evidence against your fix a couple days back, but now I
> >> see your lock key approach fails to take off. And the correct fix is to
> >> erase the incorrect locking order ffa1e7ada456 tries to catch, which is
> >> more difficult than you have thought so far.
> >
> >Which incorrect locking order are you referring to? This is a LOCKDEP
> >false positive.
> >
> For archival purposes, syzbot report [1], where UDP was not involved, defies
> what is fixed in d532cddb6c60 ("nbd: Reclassify sockets to avoid lockdep
> circular dependency") -- "Since the UDP socket and the NBD TCP/TLS socket
> are different, this is a false positive."
>
>
> [1] Subject: [syzbot] [net?] possible deadlock in inet_shutdown (3)
> https://lore.kernel.org/lkml/69c37e6a.a70a0220.234938.0045.GAE@xxxxxxxxxx/


Why don't you send a patch if you think one is needed?