Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

From: Don Dutile
Date: Tue May 08 2018 - 18:21:27 EST


On 05/08/2018 06:03 PM, Alex Williamson wrote:
On Tue, 8 May 2018 21:42:27 +0000
"Stephen Bates" <sbates@xxxxxxxxxxxx> wrote:

Hi Alex

But it would be a much easier proposal to disable ACS when the
IOMMU is not enabled, ACS has no real purpose in that case.

I guess one issue I have with this is that it disables IOMMU groups
for all Root Ports and not just the one(s) we wish to do p2pdma on.

But as I understand this series, we're not really targeting specific
sets of devices either. It's more of a shotgun approach that we
disable ACS on downstream switch ports and hope that we get the right
set of devices, but with the indecisiveness that we might later
white-list select root ports to further increase the blast radius.
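For readers following along, here is a rough sketch of what "disabling ACS" on a port amounts to at the register level. The bit values are the ACS control register definitions from the Linux header include/uapi/linux/pci_regs.h. Treating Request Redirect, Completion Redirect and Upstream Forwarding as "the P2P flags" is an assumption made for illustration, and clear_acs_p2p_flags() is a hypothetical helper, not a function from the series.

```python
# ACS control register bits (include/uapi/linux/pci_regs.h).
PCI_ACS_SV = 0x01  # Source Validation
PCI_ACS_TB = 0x02  # Translation Blocking
PCI_ACS_RR = 0x04  # P2P Request Redirect
PCI_ACS_CR = 0x08  # P2P Completion Redirect
PCI_ACS_UF = 0x10  # Upstream Forwarding
PCI_ACS_EC = 0x20  # P2P Egress Control
PCI_ACS_DT = 0x40  # Direct Translated P2P

# Assumption for this sketch: the "P2P flags" are the bits that force
# peer-to-peer traffic upstream towards the root complex / IOMMU.
ACS_P2P_FLAGS = PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF


def clear_acs_p2p_flags(ctrl: int) -> int:
    """Return the 16-bit ACS control value with the P2P redirect bits cleared.

    Hypothetical helper: it only models the bit manipulation and does not
    touch any real config space.
    """
    return ctrl & ~ACS_P2P_FLAGS & 0xFFFF
```

With those bits cleared on a downstream port, the switch is free to route peer requests directly between its ports, which is why the devices behind it can no longer be treated as isolated from each other.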

The IOMMU and P2P are already not exclusive, we can bounce off
the IOMMU or make use of ATS as we've previously discussed. We were
previously talking about a build time config option that you
didn't expect distros to use, so I don't think intervention for the
user to disable the IOMMU if it's enabled by default is a serious
concern either.

ATS definitely makes things more interesting for the cases where the
EPs support it. However I don't really have a handle on how common
ATS support is going to be in the kinds of devices we have been
focused on (NVMe SSDs and RDMA NICs mostly).

What you're trying to do is enable direct peer-to-peer for
endpoints which do not support ATS when the IOMMU is enabled, which
is not something that necessarily makes sense to me.

As above, the advantage of leaving the IOMMU on is that it allows for
both p2pdma PCI domains and IOMMU-grouping PCI domains in the same
system. It is just that these domains will be separate from each other.

That argument makes sense if we had the ability to select specific sets
of devices, but that's not the case here, right? With the shotgun
approach, we're clearly favoring one at the expense of the other and
it's not clear why we don't simply force the needle all the way in that
direction such that the results are at least predictable.

So that leaves avoiding bounce buffers as the remaining IOMMU
feature

I agree with you here that the devices we will want to use for p2p
will probably not require a bounce buffer and will support 64 bit DMA
addressing.

I'm still not seeing why it's terribly undesirable to require
devices to support ATS if they want to do direct P2P with an IOMMU
enabled.

I think the one reason is for the use-case above. Allowing IOMMU
groupings on one domain and p2pdma on another domain....

If IOMMU grouping implies device assignment (because nobody else uses
it to the same extent as device assignment) then the build-time option
falls to pieces, we need a single kernel that can do both. I think we
need to get more clever about allowing the user to specify exactly at
which points in the topology they want to disable isolation. Thanks,

Alex
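To make the "specify exactly at which points in the topology" idea concrete, here is a minimal sketch assuming the user names ports by PCI address. The function plan_acs_updates(), the address strings and the register values are all hypothetical. The sketch only shows the selection logic; it is not an existing kernel interface.

```python
# Same ACS bit values as in include/uapi/linux/pci_regs.h.
PCI_ACS_RR = 0x04  # P2P Request Redirect
PCI_ACS_CR = 0x08  # P2P Completion Redirect
PCI_ACS_UF = 0x10  # Upstream Forwarding
ACS_P2P_FLAGS = PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF  # assumed "P2P flags"


def plan_acs_updates(ports, selected):
    """Compute new ACS control values, touching only user-selected ports.

    ports:    dict mapping a PCI address string to its current ACS control value
    selected: set of PCI addresses where the user wants isolation disabled
    Ports not in `selected` keep their value, so their isolation is unchanged.
    """
    return {
        addr: (ctrl & ~ACS_P2P_FLAGS) if addr in selected else ctrl
        for addr, ctrl in ports.items()
    }


# Hypothetical topology: two downstream ports of one switch.
current = {"0000:03:01.0": 0x1D, "0000:03:02.0": 0x1D}
planned = plan_acs_updates(current, {"0000:03:01.0"})
```

In this model only 0000:03:01.0 loses its redirect bits, so a single kernel could keep device assignment working behind the other port.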

+1/ack

RDMA VFs lend themselves to NVMe-oF with device assignment, so we need a way to
put NVMe 'resources' into an assignable/manageable object for 'IOMMU grouping',
which is really a 'DMA security domain' and less an 'IOMMU grouping domain'.

