Re: Kernel 5.9-rc Regression: Boot failure with nvme

From: David Rientjes
Date: Sat Aug 29 2020 - 17:35:43 EST


On Sat, 29 Aug 2020, Christoph Hellwig wrote:

> > Just adding Christoph to the participants list, since at a guess it's
> > due to his changes whether they came from the nvme side or the dma
> > side..
> >
> > Christoph?
>
> This kinda looks like the sqsize regression we had in earlier 5.9-rc,
> but that should have been fixed in -rc2 with
>
> 7442ddcedc344b6fa073692f165dffdd1889e780
> Author: John Garry <john.garry@xxxxxxxxxx>
> Date: Fri Aug 14 23:34:25 2020 +0800
>
> nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth
>
> Daniel, can you double check that you don't have that commit?
>

Looks like Daniel has confirmed that this indeed does fix his issue --
great!

Christoph, re the plan to backport the atomic DMA pool support to 5.4 LTS
for the purposes of fixing the AMD SEV allocation issues, I've composed
the following list:

e860c299ac0d dma-remap: separate DMA atomic pools from direct remap code
c84dc6e68a1d dma-pool: add additional coherent pools to map to gfp mask
54adadf9b085 dma-pool: dynamically expanding atomic pools
76a19940bd62 dma-direct: atomic allocations must come from atomic coherent pools
2edc5bb3c5cc dma-pool: add pool sizes to debugfs
1d659236fb43 dma-pool: scale the default DMA coherent pool size with memory capacity
3ee06a6d532f dma-pool: fix too large DMA pools on medium memory size systems
dbed452a078d dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL
** 633d5fce78a6 dma-direct: always align allocation size in dma_direct_alloc_pages()
** 96a539fa3bb7 dma-direct: re-encrypt memory if dma_direct_alloc_pages() fails
** 56fccf21d196 dma-direct: check return value when encrypting or decrypting memory
** 1a2b3357e860 dma-direct: add missing set_memory_decrypted() for coherent mapping
d07ae4c48690 dma-mapping: DMA_COHERENT_POOL should select GENERIC_ALLOCATOR
71cdec4fab76 dma-mapping: warn when coherent pool is depleted
567f6a6eba0c dma-direct: provide function to check physical memory area validity
23e469be6239 dma-pool: get rid of dma_in_atomic_pool()
48b6703858dd dma-pool: introduce dma_guess_pool()
81e9d894e03f dma-pool: make sure atomic pool suits device
d9765e41d8e9 dma-pool: do not allocate pool memory from CMA
9420139f516d dma-pool: fix coherent pool allocations for IOMMU mappings
d7e673ec2c8e dma-pool: Only allocate from CMA when in same memory zone

[ The commits prefixed with ** are not strictly required for atomic DMA
but rather fix other issues with SEV in the DMA layer that I found
along the way. They likely deserve their own stable backports, but I
added them here because it's probably best to backport everything in
order to minimize conflicts. We'll simply make a note of that in
the cover letter for the stable backport series. ]
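
A minimal sketch of how the list above could be turned into an ordered
cherry-pick plan, assuming it is saved as plain text in the same format
(one "sha subject" entry per line, ** marking the SEV fixes). The helper
name cherry_pick_plan and the three-line SAMPLE excerpt are hypothetical
and only for illustration; this is not an existing kernel script.

```python
def cherry_pick_plan(list_text):
    """Turn the commit list into ordered `git cherry-pick -x` lines.

    Entries prefixed with ** stay in place, since the point of one
    combined series is to apply everything in the listed order, and are
    tagged so they can be called out in the cover letter.
    """
    plan = []
    for raw in list_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        optional = line.startswith("**")
        if optional:
            line = line[2:].strip()
        # First token is the abbreviated sha, the rest is the subject.
        sha, _, subject = line.partition(" ")
        tag = "  # SEV fix, not strictly required" if optional else ""
        plan.append((sha, subject, optional, f"git cherry-pick -x {sha}{tag}"))
    return plan


# Short excerpt of the list, used only to exercise the helper.
SAMPLE = """\
dbed452a078d dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL
** 633d5fce78a6 dma-direct: always align allocation size in dma_direct_alloc_pages()
d07ae4c48690 dma-mapping: DMA_COHERENT_POOL should select GENERIC_ALLOCATOR
"""

if __name__ == "__main__":
    for _sha, _subject, _optional, command in cherry_pick_plan(SAMPLE):
        print(command)
```

The -x flag records the upstream commit id in each backported commit
message. The sketch only prints commands; it does not run git.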

Do you know of any others to add? NVMe-specific fixes, perhaps John
Garry's fix above, or maybe Intel IOMMU fixes?