Re: [PATCH V3] PCI: Extend ACS configurability

From: Bjorn Helgaas
Date: Wed Jun 12 2024 - 17:29:26 EST


[+cc Alex since VFIO entered the conversation; thread at
https://lore.kernel.org/r/20240523063528.199908-1-vidyas@xxxxxxxxxx]

On Mon, Jun 10, 2024 at 08:38:49AM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 07, 2024 at 02:30:55PM -0500, Bjorn Helgaas wrote:
> > "Correctly" is not quite the right word here; it's just a fact that
> > the ACS settings determined at boot time result in certain IOMMU
> > groups. If the user desires different groups, it's not that something
> > is "incorrect"; it's just that the user may have to accept less
> > isolation to get the desired IOMMU groups.
>
> That is not quite accurate. There are HW configurations where ACS
> needs to be a certain way for the HW to work with P2P at all. It isn't
> just an optimization or the user accepts something; if they want P2P
> at all they must get an ACS configuration appropriate for their system.

The current wording of "For iommu_groups to form correctly, the ACS
settings in the PCIe fabric need to be setup early" suggests that the
way we currently configure ACS is incorrect in general, regardless of
P2PDMA.

But my impression is that there's a trade-off between isolation and
the ability to do P2PDMA, and that users have different requirements.
A preference for less isolation/more P2PDMA is no more "correct" than
a preference for more isolation/less P2PDMA.

The kernel-parameters doc mentions the reduced isolation idea, but I
think we need a little more guidance for users. It's probably too
much detail for kernel-parameters, but the commit log would be a good
place.

Maybe something like this:

PCIe ACS settings determine how devices are put into iommu_groups.
The iommu_groups in turn determine which devices can be passed
through to VMs and whether P2PDMA between them is possible. The
iommu_groups are built at enumeration time and are currently static.

Add a kernel command-line option to change ACS settings for specific
devices, which allows more devices to be put in the same
iommu_group, at the cost of reduced isolation between them.

ACS applies to PCIe Downstream Ports and multi-function devices.
The default ACS settings are XXX and cause devices below an
ACS-capable port to be put in an iommu_group that is isolated from
P2PDMA originating outside the group.

Disabling ACS XXX at a port allows ... downstream devices to be
included in the same iommu_group as ...

[I don't know exactly how this works, so please make it make sense].