Re: [PATCH v7 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg

From: Robin Murphy
Date: Thu Jun 30 2022 - 10:57:11 EST


On 2022-06-29 23:41, Logan Gunthorpe wrote:


On 2022-06-29 13:15, Robin Murphy wrote:
On 2022-06-29 16:57, Logan Gunthorpe wrote:



On 2022-06-29 06:07, Robin Murphy wrote:
On 2022-06-15 17:12, Logan Gunthorpe wrote:
When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

A P2PDMA page may have three possible outcomes when being mapped:
    1) If the data path between the two devices doesn't go through
       the root port, then it should be mapped with a PCI bus address
    2) If the data path goes through the host bridge, it should be
mapped
       normally with an IOMMU IOVA.
    3) It is not possible for the two devices to communicate and thus
       the mapping operation should fail (and it will return
-EREMOTEIO).

Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.

Signed-off-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
---
   drivers/iommu/dma-iommu.c | 68
+++++++++++++++++++++++++++++++++++----
   1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f90251572a5d..b01ca0c6a7ab 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -21,6 +21,7 @@
   #include <linux/iova.h>
   #include <linux/irq.h>
   #include <linux/list_sort.h>
+#include <linux/memremap.h>
   #include <linux/mm.h>
   #include <linux/mutex.h>
   #include <linux/pci.h>
@@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev,
struct scatterlist *sg, int nents,
           sg_dma_address(s) = DMA_MAPPING_ERROR;
           sg_dma_len(s) = 0;
   +        if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {

Logically, should we not be able to use sg_is_dma_bus_address() here? I
think it should be feasible, and simpler, to prepare the p2p segments
up-front, such that at this point all we need to do is restore the
original length (if even that, see below).

Per my previous email, no, because sg_is_dma_bus_address() is not set
yet and not meant to tell you something about the page. That flag will
be set below by pci_p2pdma_map_bus_segment() and then checkd in
iommu_dma_unmap_sg() to determine if the dma_address in the segment
needs to be unmapped.

I know it's not set yet as-is; I'm suggesting things should be
restructured so that it *would be*. In the logical design of this code,
the DMA addresses are effectively determined in iommu_dma_map_sg(), and
__finalise_sg() merely converts them from a relative to an absolute form
(along with undoing the other trickery). Thus the call to
pci_p2pdma_map_bus_segment() absolutely belongs in the main
iommu_map_sg() loop.

I don't see how that can work: __finalise_sg() does more than convert
them from relative to absolute, it also figures out which SG entry will
contain which dma_address segment. Which segment a P2PDMA address needs
to be programmed into depends on the how 'cur' is calculated which in
turn depends on things like seg_mask and max_len. This calculation is
not done in iommu_dma_map_sg() so I don't see how there's any hope of
assigning the bus address for the P2P segments in that function.

If there's a way to restructure things so that's possible that I'm not
seeing, I'm open to it but it's certainly not immediately obvious.

Huh? It's still virtually the same thing; iommu_dma_map_sg() calls pci_p2pdma_map_bus_segment(s) and sets s->length to 0 if PCI_P2PDMA_MAP_BUS_ADDR, then __finalise_sg() can use sg_is_dma_bus_address(s) in place of is_pci_p2pdma_page(sg_page(s)), and just propagate the DMA address and original length from s to cur.

Here you've written a patch which looks to correctly interrupt any ongoing concatenation state and convey some data from the given input segment to the appropriate output segment, so I'm baffled by why you'd think you couldn't do what you've already done.

Thanks,
Robin.