RE: [PATCH] dma-mapping: fix dma_addressing_limited if dma_range_map is scanned

From: Justin He
Date: Fri Sep 15 2023 - 05:43:58 EST


Hi Robin,

> -----Original Message-----
> From: Robin Murphy <robin.murphy@xxxxxxx>
> Sent: Tuesday, September 12, 2023 7:57 PM
> To: Justin He <Justin.He@xxxxxxx>; Christoph Hellwig <hch@xxxxxx>; Marek
> Szyprowski <m.szyprowski@xxxxxxxxxxx>; iommu@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] dma-mapping: fix dma_addressing_limited if
> dma_range_map is scanned
>
> On 12/09/2023 9:40 am, Jia He wrote:
> > After scanning the dma_range_map, if it is found that not all of the
> > system RAM ranges are encompassed within it, an incorrect calculation
> > occurs for dma_addressing_limited(), which prevents the nvme device
> > dma mapping in the checking path of phys_to_dma().
>
> Nit: the subject and this description aren't very clear - the key point here is the
> unusual case that the range map covers right up to the top of system RAM, but
> leaves a hole somewhere lower down. There's no issue with the more typical
> case where some RAM exists above the top of the range map, since
> bus_dma_limit will capture that and work as expected.
>
Ok, how about
dma-mapping: fix dma_addressing_limited if dma_range_map can't cover all system RAM

> > E.g. On an Armv8 Ampere server, the dsdt ACPI table is:
> > Method (_DMA, 0, Serialized) // _DMA: Direct Memory Access
> > {
> > Name (RBUF, ResourceTemplate ()
> > {
> > QWordMemory (ResourceConsumer, PosDecode,
> > MinFixed, MaxFixed, Cacheable, ReadWrite,
> > 0x0000000000000000, // Granularity
> > 0x0000000000000000, // Range Minimum
> > 0x00000000FFFFFFFF, // Range Maximum
> > 0x0000000000000000, // Translation Offset
> > 0x0000000100000000, // Length
> > ,, , AddressRangeMemory, TypeStatic)
> > QWordMemory (ResourceConsumer, PosDecode,
> > MinFixed, MaxFixed, Cacheable, ReadWrite,
> > 0x0000000000000000, // Granularity
> > 0x0000006010200000, // Range Minimum
> > 0x000000602FFFFFFF, // Range Maximum
> > 0x0000000000000000, // Translation Offset
> > 0x000000001FE00000, // Length
> > ,, , AddressRangeMemory, TypeStatic)
> > QWordMemory (ResourceConsumer, PosDecode,
> > MinFixed, MaxFixed, Cacheable, ReadWrite,
> > 0x0000000000000000, // Granularity
> > 0x00000060F0000000, // Range Minimum
> > 0x00000060FFFFFFFF, // Range Maximum
> > 0x0000000000000000, // Translation Offset
> > 0x0000000010000000, // Length
> > ,, , AddressRangeMemory, TypeStatic)
> > QWordMemory (ResourceConsumer, PosDecode,
> > MinFixed, MaxFixed, Cacheable, ReadWrite,
> > 0x0000000000000000, // Granularity
> > 0x0000007000000000, // Range Minimum
> > 0x000003FFFFFFFFFF, // Range Maximum
> > 0x0000000000000000, // Translation Offset
> > 0x0000039000000000, // Length
> > ,, , AddressRangeMemory, TypeStatic)
> > })
> >
> > But the System RAM ranges are:
> > cat /proc/iomem |grep -i ram
> > 90000000-91ffffff : System RAM
> > 92900000-fffbffff : System RAM
> > 880000000-fffffffff : System RAM
> > 8800000000-bff5990fff : System RAM
> > bff59d0000-bff5a4ffff : System RAM
> > bff8000000-bfffffffff : System RAM
> > So some RAM ranges are out of dma_range_map.
> >
> > Fixes it by checking whether each of the system RAM resources can be
> > properly encompassed within the dma_range_map.
> >
> > Signed-off-by: Jia He <justin.he@xxxxxxx>
> > ---
> > include/linux/dma-mapping.h | 8 +++++--
> > kernel/dma/mapping.c | 45
> +++++++++++++++++++++++++++++++++++++
> > 2 files changed, 51 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> > index f0ccca16a0ac..d9d1c67c8579 100644
> > --- a/include/linux/dma-mapping.h
> > +++ b/include/linux/dma-mapping.h
> > @@ -144,6 +144,7 @@ bool dma_pci_p2pdma_supported(struct device
> *dev);
> > int dma_set_mask(struct device *dev, u64 mask);
> > int dma_set_coherent_mask(struct device *dev, u64 mask);
> > u64 dma_get_required_mask(struct device *dev);
> > +bool all_ram_in_dma_range_map(struct device *dev);
> > size_t dma_max_mapping_size(struct device *dev);
> > size_t dma_opt_mapping_size(struct device *dev);
> > bool dma_need_sync(struct device *dev, dma_addr_t dma_addr); @@
> > -475,8 +476,11 @@ static inline int dma_coerce_mask_and_coherent(struct
> device *dev, u64 mask)
> > */
> > static inline bool dma_addressing_limited(struct device *dev)
> > {
> > - return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
> > - dma_get_required_mask(dev);
> > + if (min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
> > + dma_get_required_mask(dev))
>
> Nit: indentation
>
> > + return true;
> > +
> > + return !all_ram_in_dma_range_map(dev);
> > }
> >
> > static inline unsigned int dma_get_max_seg_size(struct device *dev)
> > diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index
> > e323ca48f7f2..ab407deb81b8 100644
> > --- a/kernel/dma/mapping.c
> > +++ b/kernel/dma/mapping.c
> > @@ -14,6 +14,7 @@
> > #include <linux/of_device.h>
> > #include <linux/slab.h>
> > #include <linux/vmalloc.h>
> > +#include <linux/dma-direct.h>
>
> Nit: please keep the includes sorted alphabetically, however I do wonder
> whether this whole thing shouldn't belong in dma-direct anyway.
The dma-direct.h is for including struct bus_dma_region only.
Maybe I need to move struct bus_dma_region to a proper header file?

>
> > #include "debug.h"
> > #include "direct.h"
> >
> > @@ -819,6 +820,50 @@ size_t dma_opt_mapping_size(struct device *dev)
> > }
> > EXPORT_SYMBOL_GPL(dma_opt_mapping_size);
> >
> > +/*
> > + * To check whether all ram resource ranges are mapped in dma range
> > +map
> > + * Returns 0 when continuous check is needed
> > + * Returns 1 if there is some ram range can't be mapped to
> > +dma_range_map */ static int check_ram_in_range_map(unsigned long
> > +start_pfn,
> > + unsigned long nr_pages, void *data) {
> > + phys_addr_t end_paddr = (start_pfn + nr_pages) << PAGE_SHIFT;
>
> This could still wrap to 0 on 32-bit. I think the robust thing to do is either spray
> some extra -1s and +1s around to make all the "end" values inclusive limits, or
> maybe just do everything in units of pages rather than bytes (i.e. use
> PFN_DOWN() on the bus_dma_region entry values).
>
Ok, let me try the latter one, which looks easier to me.

> > + phys_addr_t start_paddr = start_pfn << PAGE_SHIFT;
> > + struct device *dev = (struct device *)data;
> > + struct bus_dma_region *region = NULL;
> > + const struct bus_dma_region *m;
> > +
> > + while (start_paddr < end_paddr) {
> > + // find region containing start_paddr
>
> Nit: inconsistent comment style (although it's not a particularly valuable
> comment anyway, IMO the loop itself is clear enough).
>
Sure, let me remove it.


--
Cheers,
Justin (Jia He)