Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory
From: Bjorn Helgaas
Date: Thu Mar 01 2018 - 18:00:44 EST
On Thu, Mar 01, 2018 at 11:55:51AM -0700, Logan Gunthorpe wrote:
> Hi Bjorn,
>
> Thanks for the review. I'll correct all the nits for the next version.
>
> On 01/03/18 10:37 AM, Bjorn Helgaas wrote:
> > On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote:
> > > Some PCI devices may have memory mapped in a BAR space that's
> > > intended for use in Peer-to-Peer transactions. In order to enable
> > > such transactions the memory must be registered with ZONE_DEVICE pages
> > > so it can be used by DMA interfaces in existing drivers.
>
> > Is there anything about this memory that makes it specifically
> > intended for peer-to-peer transactions? I assume the device can't
> > really tell whether a transaction is from a CPU or a peer.
>
> No there's nothing special about the memory and it can still be accessed by
> the CPU. This is just the intended purpose. You could use this PCI memory as
> regular DMA buffers or regular memory but I'm not sure why you would. It
> would probably be pretty bad performance-wise.
>
>
> > BTW, maybe there could be some kind of guide for device driver writers
> > in Documentation/PCI/?
> Makes sense we can look at writing something for the next iteration.
>
> > I think it would be clearer and sufficient to simply say that we have
> > no way to know whether peer-to-peer routing between PCIe Root Ports is
> > supported (PCIe r4.0, sec 1.3.1).
>
> Fair enough.
>
> > The fact that you use the PCIe term "switch" suggests that a PCIe
> > Switch is required, but isn't it sufficient for the peers to be below
> > the same "PCI bridge", which would include PCIe Root Ports, PCIe
> > Switch Downstream Ports, and conventional PCI bridges?
> > The comments at get_upstream_bridge_port() suggest that this isn't
> > enough, and the peers actually do have to be below the same PCIe
> > Switch, but I don't know why.
>
> I do mean Switch as we do need to keep the traffic off the root complex.
> Seeing, as stated above, we don't know if it actually support it. (While we
> can be certain any PCI switch does). So we specifically want to exclude PCIe
> Root ports and I'm not sure about the support of PCI bridges but I can't
> imagine anyone wanting to do P2P around them so I'd rather be safe than
> sorry and exclude them.
I don't think this is correct. A Root Port defines a hierarchy domain
(I'm looking at PCIe r4.0, sec 1.3.1). The capability to route
peer-to-peer transactions *between* hierarchy domains is optional. I
think this means a Root Complex is not required to route transactions
from one Root Port to another Root Port.
This doesn't say anything about routing between two different devices
below a Root Port. Those would be in the same hierarchy domain and
should follow the conventional PCI routing rules. Of course, since a
Root Port has one link that leads to one device, they would probably
be different functions of a single multi-function device, so I don't
know how practical it would be to test this.
> > This whole thing is confusing to me. Why do you want to reject peers
> > directly connected to the same root port? Why do you require the same
> > Switch Upstream Port? You don't exclude conventional PCI, but it
> > looks like you would require peers to share *two* upstream PCI-to-PCI
> > bridges? I would think a single shared upstream bridge (conventional,
> > PCIe Switch Downstream Port, or PCIe Root Port) would be sufficient?
>
> Hmm, yes, this may just be laziness on my part. Finding the shared upstream
> bridge is a bit more tricky than just showing that they are on the same
> switch. So as coded, a fabric of switches with peers on different legs of
> the fabric are not supported. But yes, maybe they just need to be two
> devices with a single shared upstream bridge that is not the root port.
> Again, we need to reject the root port because we can't know if the root
> complex can support P2P traffic.
This sounds like the same issue as above, so we just need to resolve
that.
> > Since "pci_p2pdma_add_client()" includes "pci_" in its name, it seems
> > sort of weird that callers supply a non-PCI device and then we look up
> > a PCI device here. I assume you have some reason for this; if you
> > added a writeup in Documentation/PCI, that would be a good place to
> > elaborate on that, maybe with a one-line clue here.
>
> Well yes, but this is much more convenient for callers which don't need to
> care if the device they are attempting to add (which in the NVMe target
> case, could be a random block device) is a pci device or not. Especially
> seeing find_parent_pci_dev() is non-trivial.
OK. I accept that it might be convenient, but I still think it leads
to a weird API. Maybe that's OK; I don't know enough about the
scenario in the caller to do anything more than say "hmm, strange".
> > > +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> > > +{
> > > + void *ret;
> > > +
> > > + if (unlikely(!pdev->p2pdma))
> >
> > Is this a hot path? I'm not sure it's worth cluttering
> > non-performance paths with likely/unlikely.
>
> I'd say it can be pretty hot given that we need to allocate and free buffers
> at multiple GB/s for the NVMe target case. I don't exactly have benchmarks
> or anything to show this though...
OK, I'll take your word for that. I'm fine with this as long as it is
performance-sensitive.
Bjorn