Re: nbd: Oops because nbd doesn't prevent NBD_CLEAR_SOCK while sock_xmit() is working on a receive

From: Mike Snitzer
Date: Thu Mar 27 2008 - 17:12:48 EST


On Thu, Mar 27, 2008 at 9:21 AM, Mike Snitzer <snitzer@xxxxxxxxx> wrote:
> On Thu, Mar 27, 2008 at 8:35 AM, Paul Clements
> <paul.clements@xxxxxxxxxxxx> wrote:
> > Mike Snitzer wrote:
> >
> > > In practice this looks like:
> > >
> > > nbd1: NBD_DISCONNECT
> > > nbd1: Send control failed (result -32)
> > > end_request: I/O error, dev nbd1, sector 0
> > > end_request: I/O error, dev nbd1, sector 8032264
> > > md: super_written gets error=-5, uptodate=0
> > > raid1: Disk failure on nbd1, disabling device.
> > > Operation continuing on 1 devices
> > > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
> > > [<ffffffff88b1e125>] :nbd:sock_xmit+0x9d/0x301
> >
> > > The fact that sock_xmit() in receive mode is unprotected seems to be
> > > the WHY a NULL pointer is possible; but I'm still trying to identify
> > > the HOW.
> >
> > Do you know who is setting the socket NULL? Is it already NULL when you
> > get to this point? Is it the nbd-client -d? Is it the original
> > nbd-client/kernel that does it? Figuring that out would help narrow down
> > the cause.
>
> I believe that NBD_CLEAR_SOCK from 'nbd-client -d' sets it to NULL.
> lo->sock is already NULL on entry to sock_xmit().
>
> So simply checking if the sock_xmit's 'sock' is NULL _should_ avoid
> any possibility of a NULL pointer Oops because sock can't be !NULL
> after the negative check (because of the sock = lo->sock assignment).
> That is, unless I'm missing somewhere in the rest of the kernel (not
> nbd) that would take action to set a socket to NULL?
>
> The attached patch seems reasonable. I'll be testing today to verify
> it fixes the problem.

The nbd_sock_xmit_oops.patch from my previous email does fix this issue.

I verified this by making the race window wider with the attached patch.

The sequence to hit this race (with a patched nbd) is:

# insmod ./nbd_exposed.ko
# nbd-client 192.168.114.152 9011 /dev/nbd1
Negotiation: ..size = 3148739KB
bs=1024, sz=3148739
# dd if=/dev/nbd1 of=/dev/null &
[ NOTE: wait to see "sleeping 30s before recv in sock_xmit" in the syslog ]
# nbd-client -d /dev/nbd1

In the system log you'll see something like:

nbd: registered device at major 43
nbd1:
nbd1: sleeping 30s before recv in sock_xmit
nbd1: NBD_DISCONNECT
nbd1: sleeping 30s after setting sock to NULL in NBD_CLEAR_SOCK
nbd1: sleeping 30s before recv in sock_xmit
Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
[<ffffffff883a917c>] :nbd:sock_xmit+0xf0/0x33e
PGD 141b52067 PUD 144eaf067 PMD 0
Oops: 0000 [1] SMP
...

With the nbd_sock_xmit_oops.patch applied this Oops no longer happens.
I think my original patch's use of EINVAL should be changed to EIO.
Any other comments on the fix are welcome.

Paul, would you like an updated patch that you push upstream? Or
should I just submit the patch and Cc: you?

Mike
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b53fdb0..38397d4 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -158,6 +158,14 @@ static int sock_xmit(struct nbd_device *lo, int send, void *buf, int size,
siginitsetinv(&blocked, sigmask(SIGKILL));
sigprocmask(SIG_SETMASK, &blocked, &oldset);

+ if (!send) {
+ /* sleep 30s to expose race with 'nbd-client -d' */
+ printk(KERN_WARNING "%s: sleeping 30s before recv in sock_xmit\n",
+ lo->disk->disk_name);
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(msecs_to_jiffies(30000));
+ }
+
do {
sock->sk->sk_allocation = GFP_NOIO;
iov.iov_base = buf;
@@ -550,7 +558,14 @@ static int nbd_ioctl(struct inode *inode, struct file *file,
case NBD_CLEAR_SOCK:
error = 0;
mutex_lock(&lo->tx_lock);
- lo->sock = NULL;
+ if (lo->sock) {
+ lo->sock = NULL;
+ /* sleep 30s to expose race with sock_xmit */
+ printk(KERN_WARNING "%s: sleeping 30s after setting sock to NULL in NBD_CLEAR_SOCK\n",
+ lo->disk->disk_name);
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(msecs_to_jiffies(30000));
+ }
mutex_unlock(&lo->tx_lock);
file = lo->file;
lo->file = NULL;