Re: [PATCH] iommu: Relax ACS requirement for RCiEP devices.
From: Alex Williamson
Date: Tue May 05 2020 - 10:05:27 EST
On Mon, 4 May 2020 23:11:07 -0700
"Raj, Ashok" <ashok.raj@xxxxxxxxx> wrote:
> Hi Alex
>
> + Joerg, accidently missed in the Cc.
>
> On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote:
> > On Mon, 4 May 2020 21:42:16 -0700
> > Ashok Raj <ashok.raj@xxxxxxxxx> wrote:
> >
> > > PCIe Spec recommends we can relax ACS requirement for RCIEP devices.
> > >
> > > PCIe 5.0 Specification.
> > > 6.12 Access Control Services (ACS)
> > > Implementation of ACS in RCiEPs is permitted but not required. It is
> > > explicitly permitted that, within a single Root Complex, some RCiEPs
> > > implement ACS and some do not. It is strongly recommended that Root Complex
> > > implementations ensure that all accesses originating from RCiEPs
> > > (PFs and VFs) without ACS capability are first subjected to processing by
> > > the Translation Agent (TA) in the Root Complex before further decoding and
> > > processing. The details of such Root Complex handling are outside the scope
> > > of this specification.
> > >
> > > Since Linux didn't give special treatment to allow this exception, certain
> > > RCiEP MFD devices are getting grouped in a single iommu group. This
> > > doesn't permit a single device to be assigned to a guest for instance.
> > >
> > > In one vendor system: Device 14.x were grouped in a single IOMMU group.
> > >
> > > /sys/kernel/iommu_groups/5/devices/0000:00:14.0
> > > /sys/kernel/iommu_groups/5/devices/0000:00:14.2
> > > /sys/kernel/iommu_groups/5/devices/0000:00:14.3
> > >
> > > After the patch:
> > > /sys/kernel/iommu_groups/5/devices/0000:00:14.0
> > > /sys/kernel/iommu_groups/5/devices/0000:00:14.2
> > > /sys/kernel/iommu_groups/6/devices/0000:00:14.3 <<< new group
> > >
> > > 14.0 and 14.2 are integrated devices, but legacy end points.
> > > Whereas 14.3 was a PCIe compliant RCiEP.
> > >
> > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> > > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> > >
> > > This permits assigning this device to a guest VM.
> > >
> > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> > > Signed-off-by: Ashok Raj <ashok.raj@xxxxxxxxx>
> > > To: Joerg Roedel <joro@xxxxxxxxxx>
> > > To: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > > Cc: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> > > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Cc: Darrel Goeddel <DGoeddel@xxxxxxxxxxxxxx>
> > > Cc: Mark Scott <mscott@xxxxxxxxxxxxxx>,
> > > Cc: Romil Sharma <rsharma@xxxxxxxxxxxxxx>
> > > Cc: Ashok Raj <ashok.raj@xxxxxxxxx>
> > > ---
> > > drivers/iommu/iommu.c | 15 ++++++++++++++-
> > > 1 file changed, 14 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index 2b471419e26c..5744bd65f3e2 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -1187,7 +1187,20 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
> > > struct pci_dev *tmp = NULL;
> > > struct iommu_group *group;
> > >
> > > - if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
> > > + /*
> > > + * PCI Spec 5.0, Section 6.12 Access Control Service
> > > + * Implementation of ACS in RCiEPs is permitted but not required.
> > > + * It is explicitly permitted that, within a single Root
> > > + * Complex, some RCiEPs implement ACS and some do not. It is
> > > + * strongly recommended that Root Complex implementations ensure
> > > + * that all accesses originating from RCiEPs (PFs and VFs) without
> > > + * ACS capability are first subjected to processing by the Translation
> > > + * Agent (TA) in the Root Complex before further decoding and
> > > + * processing.
> > > + */
> >
> > Is the language here really strong enough to make this change? ACS is
> > an optional feature, so being permitted but not required is rather
> > meaningless. The spec is also specifically avoiding the words "must"
> > or "shall" and even when emphasized with "strongly", we still only have
> > a recommendation that may or may not be honored. This seems like a
> > weak basis for assuming that RCiEPs universally honor this
> > recommendation. Thanks,
> >
>
> We are speaking about PCIe spec, where people write it about 5 years ahead
> and every vendor tries to massage their product behavior with vague
> words like this.. :)
>
> But honestly for any any RCiEP, or even integrated endpoints, there
> is no way to send them except up north. These aren't behind a RP.
But they are multi-function devices and the spec doesn't define routing
within multifunction packages. A single function RCiEP will already be
assumed isolated within its own group.
> I did check with couple folks who are part of the SIG, and seem to agree
> that ACS treatment for RCiEP's doesn't mean much.
>
> I understand the language isn't strong, but it doesn't seem like ACS should
> be a strong requirement for RCiEP's and reasonable to relax.
>
> What are your thoughts?
I think hardware vendors have ACS at their disposal to clarify when
isolation is provided, otherwise vendors can submit quirks, but I don't
see that the "strongly recommended" phrasing is sufficient to assume
isolation between multifunction RCiEPs. Thanks,
Alex