Re: [PATCH v3 2/3] dma-mapping: replace zone_dma_bits by zone_dma_limit
From: Catalin Marinas
Date: Wed Jul 31 2024 - 13:26:33 EST
On Mon, Jul 29, 2024 at 01:51:25PM +0300, Baruch Siach wrote:
> From: Catalin Marinas <catalin.marinas@xxxxxxx>
>
> Hardware DMA limit might not be power of 2. When RAM range starts above
> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
> can not encode this limit.
>
> Use direct phys_addr_t limit address for DMA zone limit.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@xxxxxxx>
> Signed-off-by: Baruch Siach <baruch@xxxxxxxxxx>
You should add your Co-developed-by line, the patch evolved a bit from
initial my partial diff.
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 9b5ab6818f7f..870fd967c610 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -114,36 +114,28 @@ static void __init arch_reserve_crashkernel(void)
> low_size, high);
> }
>
> -/*
> - * Return the maximum physical address for a zone accessible by the given bits
> - * limit. If DRAM starts above 32-bit, expand the zone to the maximum
> - * available memory, otherwise cap it at 32-bit.
> - */
> -static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> +static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
> {
> - phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
> - phys_addr_t phys_start = memblock_start_of_DRAM();
> -
> - if (phys_start > U32_MAX)
> - zone_mask = PHYS_ADDR_MAX;
> - else if (phys_start > zone_mask)
> - zone_mask = U32_MAX;
> + /* We have RAM in low 32-bit area, keep DMA zone there */
> + if (memblock_start_of_DRAM() < U32_MAX)
> + zone_limit = min(U32_MAX, zone_limit);
Does this matter anymore? Or is it to keep ZONE_DMA below (or equal to)
the ZONE_DMA32 limit?
Anyway, since this patch is about replacing zone_dma_bits with
zone_dma_limit, we should not introduce functional changes. AFAICT, we
need zone_limit to be set to memblock_end_of_DRAM() when phys_start is
above U32_MAX. You can do the functional change in a subsequent patch
once all the other refactoring has been handled.
>
> - return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
> + return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
> }
[...]
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index edbe13d00776..98b7d8015043 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -12,7 +12,7 @@
> #include <linux/mem_encrypt.h>
> #include <linux/swiotlb.h>
>
> -extern unsigned int zone_dma_bits;
> +extern phys_addr_t zone_dma_limit;
>
> /*
> * Record the mapping of CPU physical to DMA addresses for a given region.
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 3b4be4ca3b08..3dbc0b89d6fb 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -20,7 +20,7 @@
> * it for entirely different regions. In that case the arch code needs to
> * override the variable below for dma-direct to work properly.
> */
> -unsigned int zone_dma_bits __ro_after_init = 24;
> +phys_addr_t zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
>
> static inline dma_addr_t phys_to_dma_direct(struct device *dev,
> phys_addr_t phys)
> @@ -580,7 +580,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
> * part of the check.
> */
> if (IS_ENABLED(CONFIG_ZONE_DMA))
> - min_mask = min_t(u64, min_mask, DMA_BIT_MASK(zone_dma_bits));
> + min_mask = min_t(u64, min_mask, zone_dma_limit);
> return mask >= phys_to_dma_unencrypted(dev, min_mask);
> }
>
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index d10613eb0f63..410a7b40e496 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -70,7 +70,7 @@ static bool cma_in_zone(gfp_t gfp)
> /* CMA can't cross zone boundaries, see cma_activate_area() */
> end = cma_get_base(cma) + size - 1;
> if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> - return end <= DMA_BIT_MASK(zone_dma_bits);
> + return end <= zone_dma_limit;
> if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
> return end <= DMA_BIT_MASK(32);
> return true;
I haven't got to the third patch yet but with this series we can have
zone_dma_limit above DMA_BIT_MASK(32). The above function could return
false for GFP_DMA32 when 'end' is perfectly valid within ZONE_DMA (which
implies safe for GFP_DMA32).
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index df68d29740a0..dfd83e5ee0b3 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -450,7 +450,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> if (!remap)
> io_tlb_default_mem.can_grow = true;
> if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
> - io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
> + io_tlb_default_mem.phys_limit = zone_dma_limit;
> else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
> io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
> else
> @@ -629,7 +629,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> }
>
> gfp &= ~GFP_ZONEMASK;
> - if (phys_limit <= DMA_BIT_MASK(zone_dma_bits))
> + if (phys_limit <= zone_dma_limit)
> gfp |= __GFP_DMA;
> else if (phys_limit <= DMA_BIT_MASK(32))
> gfp |= __GFP_DMA32;
I think this has the same issue as dma_direct_optimal_gfp_mask(). If
the requested limit is strictly smaller than DMA_BIT_MASK(32) (but
potentially bigger than zone_dma_limit), we should go for __GFP_DMA
rather than __GFP_DMA32. You should probably fix this in the first patch
as well.
As above, with this series we can end up with zone_dma_limit above
DMA_BIT_MASK(32) and all these checks become confusing. Even the
swiotlb_init_late() above if called with GFP_DMA32 will set a phys_limit
that may not be accessible at all if the DRAM starts above 4GB.
Similarly, cma_in_zone() could return false with GFP_DMA32 in similar
hardware configurations.
So we either introduce a zone_dma32_limit variable and allow a 32-bit
range above the start of DRAM or sanitise these sites to make sure
passing GFP_DMA32 is safe - i.e. assume we only have ZONE_DMA if
zone_dma_limit is above 32-bit. I prefer the latter without introducing
a zone_dma32_limit.
--
Catalin