Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection
From: Raj, Ashok
Date: Wed Nov 11 2020 - 20:45:21 EST
On Wed, Nov 11, 2020 at 11:27:28PM +0100, Thomas Gleixner wrote:
> On Wed, Nov 11 2020 at 08:09, Ashok Raj wrote:
> >> > We'd also need a way for an OS running on bare metal to *know* that
> >> > it's on bare metal and can just compose MSI messages for itself. Since
> >> > we do expect bare metal to have an IOMMU, perhaps that is just a
> >> > feature flag on the IOMMU?
> >>
> >> Have the platform firmware advertise if it needs native or virtualized
> >> IMS handling. If it advertises neither don't support IMS?
> >
> > The platform hint can be easily accomplished via DMAR table flags. We could
> > have an IMS_OPTOUT(similart to x2apic optout flag) flag, when 0 its native
> > and IMS is supported.
> >
> > When vIOMMU is presented to guest, virtual DMAR table will have this flag
> > set to 1. Indicates to GuestOS, native IMS isn't supported.
>
> These opt-out bits suck by definition. It comes all back to the fact
> that the whole virt thing didn't have a hardware defined way to tell
> that the OS runs in a VM and not on bare metal. It wouldn't have been
> rocket science to do so.
I'm sure everybody dislikes (hate being a strong word :-)).
DVSEC capability. Real hardware always sets it to 1 for the IMS capability.
By default the DVSEC is not presented to guest even when the full PF is
presented to guest. I believe VFIO only builds and presents known standard
capabilities and specific extended capabilities. I'm a bit weak but maybe
@AlexWilliamson can confirm if I'm off track.
This tells the driver in guest that IMS is not available and will not
create those new dev_msi calls.
Only if the VMM has build support to expose IMS for this device, guest SW
can even see DVSEC.SIOV.IMS=1. This also means the required plumbing, say
vIOMMU, or a hypercall has been provisioned, and adminstrator knows the
guest is compatible for these options.
There maybe better ways to do this. If this has to be done differently
we certainly can and will do.
>
> And because that does not exist, we need magic opt-out bits for every
> other piece of functionality which gets added. Can we please stop this
> and provide a well defined way to tell the OS whether it runs on bare
> metal or not?
>
> The point is that you really want opt-in bits so that decisions come
> down to
How would we opt-in when the feature is not available? You need someway to
tell the capability is available in the guest?, but then there is no reason
to opt-in though.. its ready for use isn't it?
>
> if (!virt || virt->supports_X)
The only closest thing that comes to mind is the CPUID bits, you had
mentioned they aren't reliable if the VMM didn't set those in an earlier
mail. If you want a platform level generic support.
- DMAR table optout's you had mentioned that's ugly
- We could use caching mode, but its not a platform level thing, and vendor
specific. I'm not sure if other vendors have a similar feature. If there
is a generic capabilty, we could expose via the iommu api's if we are in
virt or real platform.
>
> which is the obvious sane and safe logic. But sure, why am I asking for
> sane and safe in the context of virtualization?
We can pick how to solve this, and just waiting for you to tell, what
mechanism you prefer that's less painful and architecturally acceptible for
virtualization and linux. We are all ears!
Cheers,
Ashok