Re: Phyr Starter

From: Logan Gunthorpe
Date: Tue Jan 11 2022 - 17:09:29 EST




On 2022-01-11 2:25 p.m., Matthew Wilcox wrote:
> That's reproducing the bad decision of the scatterlist, only with
> a different encoding. You end up with something like:
>
> struct neoscat {
> dma_addr_t dma_addr;
> phys_addr_t phys_addr;
> size_t dma_len;
> size_t phys_len;
> };
>
> and the dma_addr and dma_len are unused by all-but-the-first entry when
> you have a competent IOMMU. We want a different data structure in and
> out, and we may as well keep using the scatterlist for the dma-map-out.

With my P2PDMA patchset, even with a competent IOMMU, we need to support
multiple dma_addr/dma_len pairs (plus the flag bit). This is required to
program IOVAs and multiple bus addresses into a single DMA transactions.

I think using the scatter list for the DMA-out side is not ideal seeing
we don't need the page pointers or multiple length fields and we won't
be able to change the sgl substantially given the massive amount of
existing use cases that won't go away over night.

My hope would be along these lines:

struct phy_range {
phys_addr_t phyr_addr;
u32 phyr_len;
u32 phyr_flags;
};

struct dma_range {
dma_addr_t dmar_addr;
u32 dmar_len;
u32 dmar_flags;
};

A new GUP helper would somehow return a list of phy_range structs and
the new dma_map function would take that list and return a list of
dma_range structs. Each element in the list could represent a segment up
to 4GB, so any range longer than that would need multiple items in the
list. (Alternatively, perhaps the length could be a 64bit value and we
steal some of the top bits for flags or some such). The flags would not
only be needed by some of the use cases mentioned (FOLL_PIN or
DMA_BUS_ADDRESS) but could also support chaining these lists like SGLs
so continuous vmallocs would not be necessary for longer lists.

If there's an [phy|dma]_range_list struct (or some such) which contains
these range structs (per some details of Jason's suggestions) that'd be
fine by me too and would just depend on implementation details.

However, the main problem I see is a chicken and egg problem. The new
dma_map function would need to be implemented by every dma_map provider
or any driver trying to use it would need a messy fallback. Either that,
or we need a wrapper that allocates an appropriately sized SGL to pass
to any dma_map implementation that doesn't support the new structures.

Logan