Re: NULL deref around blkmq in v4.0-rc1-rc7

From: Jens Axboe
Date: Thu Apr 09 2015 - 17:43:00 EST


On 04/09/2015 03:37 PM, Linus Torvalds wrote:
On Thu, Apr 9, 2015 at 2:25 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:

Not sure why it isn't all zeroed, definitely the saner thing to do at init
time.

So practically speaking, it might well often be zeroed just because
the BIOS may have initialized memory that way (and big multi-page
allocations have probably not gotten re-used).

And if this is mpt, we recently ran into some list corruption issues due to
a bug in the driver. It hit on reboot, but it was scan related, so could be
a boot issue as well.

So one of the earlier emails had this:

Copyright (c) 1999-2008 LSI Corporation
Fusion MPT SAS Host driver 3.04.20
mptbase: ioc0: Initiating bringup
ioc0: LSISAS1068 A0: Capabilities={Initiator}
scsi host0: ioc0: LSISAS1068 A0, FwRev=00000000h, Ports=8, MaxQ=256, IRQ=22
mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 1, phy 1,
sas_addr 0x1060504030201a0
scsi 0:0:0:0: Direct-Access VBOX HARDDISK 1.0 PQ: 0 ANSI: 5
scsi 0:0:0:0: Attached scsi generic sg0 type 0
mptbase: ioc1: Initiating bringup
ioc1: LSISAS1068 A0: Capabilities={Initiator}
scsi host1: ioc1: LSISAS1068 A0, FwRev=00000000h, Ports=8, MaxQ=256, IRQ=17
mptsas: ioc1: attaching ssp device: fw_channel 0, fw_id 0, phy 0,
sas_addr 0x60504030201a0
scsi 1:0:0:0: Direct-Access VBOX HARDDISK 1.0 PQ: 0 ANSI: 5
scsi 1:0:0:0: Attached scsi generic sg1 type 0

and I'm assuming that that is the backing storage.

mpt is a maze of roughly duplicate, crazy drivers. The bug in question impacted mpt2sas and mpt3sas, and this looks like the mpt fusion driver. So it's probably not that.

And yes, memory corruption sounds like a more likely cause than
anything else. I don't like how the request data wasn't fully
initialized, but the cmd->sense_buffer pointer itself *should* have
been initialized by the ->init_request() call.

The block request state should be sane, we clear what we need, and at-alloc init will ensure that state is safe across request reuse. But it really should just be cleared unconditionally at allocation time.

And ->init_request() should take care of the SCSI command init. It does look like it's relying on zeroes already, so either adding the memset() or just adding the __GFP_ZERO would be prudent.
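To make the two options concrete, here is a minimal userspace sketch, not the kernel code itself. The names struct demo_cmd, demo_init_request(), demo_alloc_cmds_zeroed() and demo_alloc_cmds_memset() are hypothetical stand-ins invented for illustration; calloc() plays the role of an allocation with __GFP_ZERO, and malloc() followed by memset() plays the role of the explicit clear. The point is only that an init hook which sets a single pointer is safe when the backing memory is zeroed first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for per-request driver data; not the kernel's struct. */
struct demo_cmd {
    unsigned char *sense_buffer;
    unsigned int flags;
};

/*
 * Hypothetical init hook: it only sets sense_buffer and silently relies on
 * every other field already being zero.
 */
static int demo_init_request(struct demo_cmd *cmd, unsigned char *sense)
{
    cmd->sense_buffer = sense;
    return 0;
}

/* Option 1: ask the allocator for zeroed memory (analogous to __GFP_ZERO). */
static struct demo_cmd *demo_alloc_cmds_zeroed(size_t n)
{
    return calloc(n, sizeof(struct demo_cmd));
}

/* Option 2: allocate, then clear explicitly (analogous to adding a memset()). */
static struct demo_cmd *demo_alloc_cmds_memset(size_t n)
{
    struct demo_cmd *cmds = malloc(n * sizeof(*cmds));

    if (cmds)
        memset(cmds, 0, n * sizeof(*cmds));
    return cmds;
}
```

Either variant gives the init hook a known-zero starting state; without one of them, the fields the hook does not touch hold whatever happened to be in memory.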

So I don't actually expect my patch to really make any difference,
although I do think that code should be looked at.

Jan, is it always clearing in a page size? That seems odd, especially if we're considering random gunk in memory.

--
Jens Axboe
