On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote:
Some PCI devices may have memory mapped in a BAR space that's
intended for use in Peer-to-Peer transactions. In order to enable
such transactions, the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.
Is there anything about this memory that makes it specifically
intended for peer-to-peer transactions? I assume the device can't
really tell whether a transaction is from a CPU or a peer.
BTW, maybe there could be some kind of guide for device driver writers
in Documentation/PCI/?

Makes sense, we can look at writing something for the next iteration.
I think it would be clearer and sufficient to simply say that we have
no way to know whether peer-to-peer routing between PCIe Root Ports is
supported (PCIe r4.0, sec 1.3.1).
The fact that you use the PCIe term "switch" suggests that a PCIe
Switch is required, but isn't it sufficient for the peers to be below
the same "PCI bridge", which would include PCIe Root Ports, PCIe
Switch Downstream Ports, and conventional PCI bridges?
The comments at get_upstream_bridge_port() suggest that this isn't
enough, and the peers actually do have to be below the same PCIe
Switch, but I don't know why.
+        addr = devm_memremap_pages(&pdev->dev, pgmap);
+        if (IS_ERR(addr))
Free pgmap here? And in the other error case below? Or maybe this
happens via the devm_* magic? If so, when would that actually happen?
Would pgmap be effectively leaked until the pdev is destroyed?
+                return PTR_ERR(addr);
+
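If this only gets released by devm when the pdev itself goes away, an
explicit unwind might be better here.  A sketch of what I mean (this
assumes pgmap was allocated with devm_kzalloc() earlier in this
function):

        addr = devm_memremap_pages(&pdev->dev, pgmap);
        if (IS_ERR(addr)) {
                /* release pgmap now rather than at pdev teardown */
                devm_kfree(&pdev->dev, pgmap);
                return PTR_ERR(addr);
        }
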
+        error = gen_pool_add_virt(pdev->p2pdma->pool, (uintptr_t)addr,
+                        pci_bus_address(pdev, bar) + offset,
+                        resource_size(&pgmap->res), dev_to_node(&pdev->dev));
+        if (error)
+                return error;
+
+        error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
+                                         &pdev->p2pdma->devmap_ref);
+        if (error)
+                return error;
+
+        dev_info(&pdev->dev, "added peer-to-peer DMA memory %pR\n",
+                 &pgmap->res);
s/dev_info/pci_info/ (also similar usages below, except for the one or
two cases where you don't have a pci_dev).
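i.e.:

        pci_info(pdev, "added peer-to-peer DMA memory %pR\n", &pgmap->res);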
This whole thing is confusing to me. Why do you want to reject peers
directly connected to the same root port? Why do you require the same
Switch Upstream Port? You don't exclude conventional PCI, but it
looks like you would require peers to share *two* upstream PCI-to-PCI
bridges? I would think a single shared upstream bridge (conventional,
PCIe Switch Downstream Port, or PCIe Root Port) would be sufficient?
Since "pci_p2pdma_add_client()" includes "pci_" in its name, it seems
sort of weird that callers supply a non-PCI device and then we look up
a PCI device here. I assume you have some reason for this; if you
added a writeup in Documentation/PCI, that would be a good place to
elaborate on that, maybe with a one-line clue here.
+void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
+{
+        void *ret;
+
+        if (unlikely(!pdev->p2pdma))
Is this a hot path? I'm not sure it's worth cluttering
non-performance paths with likely/unlikely.
+                return NULL;
+
+        if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
+                return NULL;
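IOW, couldn't these simply be:

        if (!pdev->p2pdma)
                return NULL;

        if (!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref))
                return NULL;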
+
+        ret = (void *)(uintptr_t)gen_pool_alloc(pdev->p2pdma->pool, size);
Why the double cast? Wouldn't "(void *)" be sufficient?
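i.e.:

        ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size);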
In v4.16-rc1, gen_pool_free() takes "unsigned long addr".  I know this
is based on -rc3; is this something that changed between -rc1 and
-rc3?