Re: nvme crash - Re: linux-next: Tree for Aug 13

From: John Garry
Date: Fri Aug 14 2020 - 09:09:26 EST


On 14/08/2020 13:08, Christoph Hellwig wrote:
[148.455065] __sg_alloc_table_from_pages+0xec/0x238
[148.459931] sg_alloc_table_from_pages+0x18/0x28
[148.464541] iommu_dma_alloc+0x474/0x678
[148.468455] dma_alloc_attrs+0xd8/0xf0
[148.472193] nvme_alloc_queue+0x114/0x160 [nvme]
[148.476798] nvme_reset_work+0xb34/0x14b4 [nvme]
[148.481407] process_one_work+0x1e8/0x360
[148.485405] worker_thread+0x44/0x478
[148.489055] kthread+0x150/0x158
[148.492273] ret_from_fork+0x10/0x34
[148.495838] Code: f94002c3 6b01017f 540007c2 11000486 (f8645aa5)
[148.501921] ---[ end trace 89bb2b72d59bf925 ]---

Anything to worry about? I guess not since we're in the merge window, but
mentioning just in case ...
I bisected, and this patch looks to fix it (note the comments below the
'---'):

From 263891a760edc24b901085bf6e5fe2480808f86d Mon Sep 17 00:00:00 2001
From: John Garry <john.garry@xxxxxxxxxx>
Date: Fri, 14 Aug 2020 12:45:18 +0100
Subject: [PATCH] nvme-pci: Use u32 for nvme_dev.q_depth

Recently nvme_dev.q_depth was changed from int to u16 type.

This falls over for the queue depth calculation in nvme_pci_enable(),
where NVME_CAP_MQES(dev->ctrl.cap) + 1 may overflow, as NVME_CAP_MQES()
gives a 16b number also. That happens for me, and this is the result:
Oh, interesting. Please also switch the module option parsing to
use kstrtou32 and param_set_uint and send this as a formal patch.


I'm doing it now.

BTW, as for the DMA/sg scatterlist code, it so happens in this case that we try the dma alloc for size=0 in nvme_alloc_queue() - I know an allocation for size=0 makes no sense, but couldn't we be a bit more robust?
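
For illustration, here is a minimal userspace sketch of the wraparound that leads to the size=0 allocation. It is not the driver code: CAP_MQES(), q_depth_u16() and q_depth_u32() are hypothetical names, the macro only assumes that MQES is the low 16 bits of CAP, and the limit argument stands in for the module's queue depth option.

```c
#include <stdint.h>

/* Assumption: MQES is the low 16 bits of the CAP register (0's based). */
#define CAP_MQES(cap) ((uint32_t)((cap) & 0xffff))

/*
 * Hypothetical model of a u16 q_depth: both operands are truncated to
 * 16 bits before taking the minimum, so MQES + 1 wraps to 0 when
 * MQES == 0xffff and the minimum is then 0.
 */
static uint16_t q_depth_u16(uint64_t cap, uint32_t limit)
{
	uint16_t depth = (uint16_t)(CAP_MQES(cap) + 1);
	uint16_t lim = (uint16_t)limit;

	return depth < lim ? depth : lim;
}

/* Same calculation with a 32-bit q_depth: 0xffff + 1 fits, no wrap. */
static uint32_t q_depth_u32(uint64_t cap, uint32_t limit)
{
	uint32_t depth = CAP_MQES(cap) + 1;

	return depth < limit ? depth : limit;
}
```

With MQES = 0xffff and a limit of 1024, the u16 variant gives a depth of 0 (hence a zero-sized queue allocation), while the u32 variant gives 1024.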

Cheers,
John