Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

From: David Rientjes
Date: Fri Dec 13 2019 - 04:33:40 EST

On Thu, 12 Dec 2019, David Rientjes wrote:

> Since all DMA must be unencrypted in this case, what happens if all
> dma_direct_alloc_pages() calls go through the DMA pool in
> kernel/dma/remap.c when force_dma_unencrypted(dev) == true since
> _PAGE_ENC is cleared for these ptes? (Ignoring for a moment that this
> special pool should likely be a separate dma pool.)
> I assume a general depletion of that atomic pool so
> DEFAULT_DMA_COHERENT_POOL_SIZE becomes insufficient. I'm not sure what
> size any DMA pool wired up for this specific purpose would need to be,
> so I assume dynamic resizing is required.
> It shouldn't be *that* difficult to supplement kernel/dma/remap.c with the
> ability to do background expansion of the atomic pool when nearing its
> capacity for this purpose? I imagine that if we just can't allocate pages
> within the DMA mask that it's the only blocker to dynamic expansion and we
> don't oom kill for lowmem. But perhaps vm.lowmem_reserve_ratio is good
> enough protection?
> Beyond that, I'm not sure what sizing would be appropriate if this is to
> be a generic solution in the DMA API for all devices that may require
> unencrypted memory.

Secondly, I'm wondering about how the DMA pool for atomic allocations
compares with lowmem reserve for both ZONE_DMA and ZONE_DMA32. For
allocations where the classzone index is one of these zones, the lowmem
reserve is static: we don't account for the amount of lowmem already
allocated and adjust the reserve for future watermark checks in the page
allocator. We always
guarantee that reserve is free (absent the depletion of the zone due to
GFP_ATOMIC allocations where we fall below the min watermarks).
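
To make the static nature of that check concrete, here is a minimal
userspace sketch in C. It is a simplified model, not the kernel's code:
struct zone_model, zone_model_watermark_ok() and the halving of the min
watermark for atomic requests are assumptions made for illustration only.
The point it shows is that the reserve is a fixed per-classzone number
added to the watermark, and nothing in the check tracks how much lowmem
has already been handed out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; names do not match kernel structures. */
enum zone_idx { ZONE_IDX_DMA, ZONE_IDX_DMA32, ZONE_IDX_NORMAL, NR_ZONE_IDX };

struct zone_model {
	long free_pages;                  /* pages currently free in this zone */
	long min_wmark;                   /* min watermark, in pages */
	long lowmem_reserve[NR_ZONE_IDX]; /* static reserve, indexed by classzone */
};

/*
 * Return true if an allocation of 2^order pages may be served from z.
 * The reserve consulted depends only on the classzone of the request;
 * it is never adjusted for lowmem that was allocated earlier.
 */
static bool zone_model_watermark_ok(const struct zone_model *z,
				    unsigned int order,
				    enum zone_idx classzone,
				    bool atomic)
{
	long min = z->min_wmark;
	long free;

	assert(order < 11);
	/* Free pages that would remain above the request; may go negative. */
	free = z->free_pages - ((1L << order) - 1);

	/* Assumption: atomic requests may dip below min; fraction is illustrative. */
	if (atomic)
		min -= min / 2;

	return free > min + z->lowmem_reserve[classzone];
}
```

With a reserve of 950 pages held against ZONE_NORMAL-class requests and
none against ZONE_DMA32-class requests, a zone with 1000 free pages and a
min watermark of 100 refuses the former and admits the latter, regardless
of how many lowmem pages are already in use.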

If all DMA memory needs to have _PAGE_ENC cleared when the guest is SEV
encrypted, I'm wondering if the entire lowmem reserve could be designed as
a pool of lowmem pages rather than a watermark check. If implemented as a
pool of pages in the page allocator itself, and today's reserve is static,
maybe we could get away with a dynamic resizing based on that static
amount? We could offload the handling of this reserve to kswapd such that
when the pool falls below today's reserve amount, we dynamically expand,
do the necessary unencryption in blockable context, and add to the pool.
Bonus is that this provides high-order lowmem reserve if implemented as
per-order freelists rather than the current watermark check that provides
no guarantees for any high-order lowmem.
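
A rough userspace sketch of the pool idea in C follows. Everything in it
is hypothetical: struct reserve_pool, pool_alloc() and pool_refill() are
illustrative names, the pool only counts blocks rather than holding real
pages, and there is no splitting or coalescing between orders. It only
shows the split between an atomic-context consumer and a blockable
refill step that tops the pool back up to a fixed target.

```c
#include <assert.h>

#define POOL_NR_ORDERS 4 /* illustrative; orders 0..3 */

/* Hypothetical pool: counts of free blocks per order, plus a page target. */
struct reserve_pool {
	unsigned long nr_free[POOL_NR_ORDERS];
	unsigned long target_pages; /* stands in for today's static reserve */
};

/* Total pages held by the pool across all orders. */
static unsigned long pool_free_pages(const struct reserve_pool *p)
{
	unsigned long pages = 0;
	unsigned int order;

	for (order = 0; order < POOL_NR_ORDERS; order++)
		pages += p->nr_free[order] << order;
	return pages;
}

/*
 * Atomic-context side: take one block of exactly this order.
 * Returns 0 on success, -1 if no block of that order is available.
 */
static int pool_alloc(struct reserve_pool *p, unsigned int order)
{
	assert(order < POOL_NR_ORDERS);
	if (p->nr_free[order] == 0)
		return -1;
	p->nr_free[order]--;
	return 0;
}

/*
 * Blockable side (the role suggested for kswapd above): add blocks of
 * refill_order until the pool holds at least target_pages. In a real
 * implementation this is where pages would be allocated and decrypted.
 * Returns the number of blocks added.
 */
static unsigned long pool_refill(struct reserve_pool *p,
				 unsigned int refill_order)
{
	unsigned long added = 0;

	assert(refill_order < POOL_NR_ORDERS);
	while (pool_free_pages(p) < p->target_pages) {
		p->nr_free[refill_order]++;
		added++;
	}
	return added;
}
```

Because the refill works per order, a caller can ask for order-2 blocks
specifically, which is the high-order guarantee a plain watermark check
cannot give.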

I don't want to distract from the first set of questions in my previous
email because I need an understanding of that anyway, but I'm hoping
Christoph can guide me on why the above wouldn't be an improvement even
for non-encrypted guests.