RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

From: Tian, Kevin
Date: Tue Apr 07 2020 - 20:27:31 EST


> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Tuesday, April 7, 2020 11:58 PM
>
> On Tue, 7 Apr 2020 04:26:23 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
> > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Sent: Saturday, April 4, 2020 1:26 AM
> > [...]
> > > > > > + if (!pasid_cap.control_reg.paside) {
> > > > > > > + pr_debug("%s: its PF's PASID capability is not enabled\n",
> > > > > > + dev_name(&vdev->pdev->dev));
> > > > > > + ret = 0;
> > > > > > + goto out;
> > > > > > + }
> > > > >
> > > > > What happens if the PF's PASID gets disabled while we're using it??
> > > >
> > > > This is actually the open I highlighted in cover letter. Per the reply
> > > > from Baolu, this seems to be an open for bare-metal all the same.
> > > > https://lkml.org/lkml/2020/3/31/95
> > >
> > > Seems that needs to get sorted out before we can expose this. Maybe
> > > some sort of registration with the PF driver that PASID is being used
> > > by a VF so it cannot be disabled?
> >
> > I guess we may do vSVA for PF first, and then add VF vSVA later,
> > given the additional need above. It's not necessary to enable both
> > in one step.
> >
> > [...]
> > > > > > @@ -1604,6 +1901,18 @@ static int vfio_ecap_init(struct
> > > vfio_pci_device *vdev)
> > > > > > if (!ecaps)
> > > > > > *(u32 *)&vdev->vconfig[PCI_CFG_SPACE_SIZE] = 0;
> > > > > >
> > > > > > +#ifdef CONFIG_PCI_ATS
> > > > > > + if (pdev->is_virtfn) {
> > > > > > + struct pci_dev *physfn = pdev->physfn;
> > > > > > +
> > > > > > + ret = vfio_pci_add_emulated_cap_for_vf(vdev,
> > > > > > + physfn, epos_max, prev);
> > > > > > + if (ret)
> > > > > > > + pr_info("%s, failed to add special caps for VF %s\n",
> > > > > > > + __func__, dev_name(&vdev->pdev->dev));
> > > > > > + }
> > > > > > +#endif
> > > > >
> > > > > I can only imagine that we should place the caps at the same location
> > > > > they exist on the PF, we don't know what hidden registers might be
> > > > > hiding in config space.
> >
> > Is there a vendor guarantee that hidden registers will be located at
> > the same offset in both PF and VF config space?
>
> I'm not sure if the spec really precludes hidden registers, but the
> fact that these registers are explicitly outside of the capability
> chain implies they're only intended for device specific use, so I'd say
> there are no guarantees about anything related to these registers.
>
> FWIW, vfio started out being more strict about restricting config space
> access to defined capabilities, until...
>
> commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
> Author: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Date: Mon Apr 1 09:04:12 2013 -0600
>
> vfio-pci: Enable raw access to unassigned config space
>
> Devices like be2net hide registers between the gaps in capabilities
> and architected regions of PCI config space. Our choices to support
> such devices is to either build an ever growing and unmanageable white
> list or rely on hardware isolation to protect us. These registers are
> really no different than MMIO or I/O port space registers, which we
> don't attempt to regulate, so treat PCI config space in the same way.
>
> > > > but we are not sure whether the same location is available on VF. In
> > > > this patch, it actually places the emulated cap physically behind the
> > > > cap which lays farthest (its offset is largest) within VF's config space
> > > > as the PCIe caps are linked in a chain.
> > >
> > > But, as we've found on Broadcom NICs (iirc), hardware developers have a
> > > nasty habit of hiding random registers in PCI config space, outside of
> > > defined capabilities. I feel like IGD might even do this too, is that
> > > true? So I don't think we can guarantee that just because a section of
> > > > config space isn't part of a defined capability that it's unused. It
> > > > only means that it's unused by common code, but it might have device
> > > > specific purposes. So if the PCIe spec indicates that VFs cannot
> > > > include these capabilities and virtualization software needs to
> > > > emulate them, we need somewhere safe to place them in config space,
> > > > and simply placing them off the end of known capabilities doesn't give
> > > > me any confidence. Also, hardware has no requirement to make compact
> > > > use of extended config space. The first capability must be at 0x100, the
> > > very next capability could consume all the way to the last byte of the
> > > 4K extended range, and the next link in the chain could be somewhere in
> > > the middle. Thanks,
> > >
> >
> > Then what would be a viable option? Vendor nasty habit implies
> > no standard, thus I don't see how VFIO can find a safe location
> > by itself. Also curious how those hidden registers are identified
> > by VFIO and employed with proper r/w policy today. If some sort of
> > quirk is used, could that quirk mechanism be extended to also carry
> > information about a vendor-specific safe location? When no
> > such quirk info is provided (the majority case), VFIO would then
> > find a free location to carry the new cap.
>
> See above commit, rather than quirks we allow raw access to any config
> space outside of the capability chain. My preference for trying to
> place virtual capabilities at the same offset as the capability exists
> on the PF is my impression that the PF config space is often a template
> for the VF config space. The PF and VF are clearly not independent
> devices, they share design aspects, and sometimes drivers. Therefore
> if I was a lazy engineer trying to find a place to hide a register in
> config space (and ignoring vendor capabilities*), I'd probably put it
> in the same place on both devices. Thus if we maintain the same

We are checking internally whether this assumption makes sense at
least for Intel devices which are PASID-capable.

> capability footprint as the PF, we have a better chance of avoiding
> them. It's a gamble and maybe we're overthinking it, but this has
> always been a concern when adding virtual capabilities to a physical
> device. We can always fail over to an approach where we simply find
> free space. Thanks,

I'm curious how you would trigger that failover. A conflict with other
PCI caps is easy to detect, but a conflict with hidden registers is
not: the latter can be identified only with device-specific knowledge.
Possibly in the end we may leverage Yan's vendor ops to find a safe
location...
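
To illustrate the "easy" half only, here is a minimal user-space sketch
(not code from this patch) of checking whether a candidate range for an
emulated capability collides with anything on the extended capability
chain. All names (range_hits_ext_cap, ext_cap_len, put_ext_cap) and the
small length table are assumptions made up for the example, and it works
on a plain byte array standing in for config space. It says nothing about
hidden registers, which is exactly the part that cannot be checked this
way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CFG_SPACE_SIZE     256   /* extended config space starts here */
#define CFG_SPACE_EXP_SIZE 4096  /* extended config space ends here */

/* Extended cap header: ID in bits 15:0, next pointer in bits 31:20. */
#define EXT_CAP_ID(hdr)    ((hdr) & 0xffff)
#define EXT_CAP_NEXT(hdr)  (((hdr) >> 20) & 0xffc)

/* Illustrative length table (assumption); unknown IDs count header only. */
static int ext_cap_len(uint16_t id)
{
	switch (id) {
	case 0x0f: return 8;   /* ATS */
	case 0x13: return 16;  /* PRI */
	case 0x1b: return 8;   /* PASID */
	default:   return 4;
	}
}

/* Read a little-endian dword from the simulated config space. */
static uint32_t read_dword(const uint8_t *cfg, int pos)
{
	return (uint32_t)cfg[pos] | (uint32_t)cfg[pos + 1] << 8 |
	       (uint32_t)cfg[pos + 2] << 16 | (uint32_t)cfg[pos + 3] << 24;
}

/* Test helper: write a version-1 extended cap header at pos. */
static void put_ext_cap(uint8_t *cfg, int pos, uint16_t id, int next)
{
	uint32_t hdr = id | (1u << 16) | ((uint32_t)next << 20);
	int i;

	for (i = 0; i < 4; i++)
		cfg[pos + i] = (hdr >> (8 * i)) & 0xff;
}

/* Return true if [start, start + len) overlaps a cap on the chain. */
static bool range_hits_ext_cap(const uint8_t *cfg, int start, int len)
{
	int pos = CFG_SPACE_SIZE;
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8; /* loop guard */

	assert(len > 0);
	while (pos >= CFG_SPACE_SIZE && ttl-- > 0) {
		uint32_t hdr = read_dword(cfg, pos);
		int end;

		if (hdr == 0)   /* no extended capabilities at all */
			break;
		end = pos + ext_cap_len(EXT_CAP_ID(hdr));
		if (start < end && pos < start + len)
			return true;
		pos = EXT_CAP_NEXT(hdr); /* 0 terminates the chain */
	}
	return false;
}
```

A "false" result here only means the range is free of architected caps;
the range may still be occupied by device-specific registers that the
chain does not describe.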

>
> Alex
>
> * ISTR the Broadcom device implemented the hidden register in standard
> config space, which was otherwise entirely packed, ie. there was no
> room for the register to be implemented as a vendor cap.

I suppose such a packed design is mostly a PF matter. Ideally the VF is
much simpler and should need far fewer hidden registers. Otherwise even
reusing the PF offset doesn't work. Long-term it would be better for the
PCI-SIG to add some recommendations, e.g. for capabilities that are
shared between PF and VF, the VF config space should still reserve a
range at the same location and size as on the PF.

Thanks
Kevin