Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection

From: Thomas Gleixner
Date: Mon Nov 09 2020 - 06:21:27 EST


On Sun, Nov 08 2020 at 15:58, Ashok Raj wrote:
> On Sun, Nov 08, 2020 at 07:47:24PM +0100, Thomas Gleixner wrote:
>>
>>
>> Now if we look at the virtualization scenario and device pass through
>> then the structure in the guest view is not any different from the basic
>> case. This works with PCI-MSI[X] and the IDXD IMS variant because the
>> hypervisor can trap the access to the storage and translate the message:
>>
>>                    |
>>                    |
>>   [CPU]    -- [Bri | dge] -- Bus -- [Device]
>>                    |
>>   Alloc +
>>   Compose                    Store    Use
>>                                |
>>                                | Trap
>>                                v
>>                 Hypervisor translates and stores
>>
>
> In the above case, the VMM is responsible for writing to the message
> store, whether it is IMS or legacy MSI/MSI-X. The VMM handles
> the writes to the device interrupt region and to the IRTE tables.

Yes, but that's just how it's done today and there is no real need to do
so.

>> Now the question which I can't answer is whether this can work correctly
>> in terms of isolation. If the IMS storage is in guest memory (queue
>> storage) then the guest driver can obviously write random crap into it
>> which the device will happily send. (For MSI and IDXD style IMS it
>> still can trap the store).
>
> The isolation problem is not just the guest memory being used as interrupt
> store, right? If the store to the device region is not trapped and controlled
> by the VMM, there is no guarantee the guest OS has done the right thing?
>
> Thinking about it, guest memory might be more problematic since it's not
> trappable and the VMM can't enforce what is written. This is something that
> needs more attention. But for now, for devices that keep the message store
> in on-device memory, trap and store by the VMM seems to satisfy the security
> properties you highlight here.

That's not the problem at all. The VMM is not responsible for the
correctness of the guest OS at all. All the VMM cares about is that the
guest cannot access anything which does not belong to the guest.

If the guest OS screws up the message (by stupidity or malice), then the
MSI sent from the passed through device has to be caught by the
IOMMU/remap unit if and _only_ if it writes to something which it is not
allowed to.

If it overwrites the guest's memory then so be it. The VMM cannot prevent
the guest OS doing so by a stray pointer either. So why would it worry
about the MSI going into guest owned lala land?

>> Is the IOMMU/Interrupt remapping unit able to catch such messages which
>> go outside the space to which the guest is allowed to signal? If yes,
>> problem solved. If no, then IMS storage in guest memory can't ever work.
>
> This can probably work for SRIOV devices where the guest owns the entire
> device. Interrupt remapping does have RID checks that catch an interrupt
> arriving at an interrupt handle not allocated for that BDF.
>
> But for SIOV devices there is no PASID filtering at the remap level since
> interrupt messages don't carry PASID in the TLP.

PASID is irrelevant here.

If the device sends a message then the remap unit will see the requester
ID of the device and if the message it sends is not matching the remap
tables then it's caught and the guest is terminated. At least that's how
it should be.
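
To make that check concrete, here is a minimal sketch of the lookup in C.
It is a simplified model under stated assumptions, not the VT-d IRTE layout
and not kernel code: struct irte_model, irt_assign() and irt_check() are
made-up names, and a real remap unit does this in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRT_ENTRIES 8u	/* hypothetical table size */

/* Simplified stand-in for one interrupt remap table entry. */
struct irte_model {
	bool     present;	/* entry was set up by the host */
	uint16_t requester_id;	/* bus/device/function allowed to use it */
	uint8_t  vector;	/* vector delivered on a match */
};

enum remap_result { REMAP_DELIVER, REMAP_BLOCK };

/* Host side: bind one interrupt handle to one requester ID. */
static void irt_assign(struct irte_model *irt, uint16_t handle,
		       uint16_t requester_id, uint8_t vector)
{
	assert(handle < IRT_ENTRIES);
	irt[handle] = (struct irte_model){ true, requester_id, vector };
}

/*
 * Remap side: the message carries a handle, the transaction carries the
 * requester ID. Whatever the guest wrote into the message store, the
 * message is only delivered if both match an entry the host set up.
 */
static enum remap_result irt_check(const struct irte_model *irt,
				   uint16_t requester_id, uint16_t handle,
				   uint8_t *vector)
{
	if (handle >= IRT_ENTRIES)
		return REMAP_BLOCK;	/* outside the table */
	if (!irt[handle].present)
		return REMAP_BLOCK;	/* handle never allocated */
	if (irt[handle].requester_id != requester_id)
		return REMAP_BLOCK;	/* handle belongs to another device */

	*vector = irt[handle].vector;
	return REMAP_DELIVER;
}
```

Note that nothing in the check depends on where the message was stored or
on PASID. A bogus message written by the guest either hits an entry owned
by its own device or gets blocked.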

>> But there's a catch:
>>
>> This only works when the guest OS actually knows that it runs in a
>> VM. If the guest can't figure that out, i.e. via CPUID, this cannot be
>
> Precisely! It might work if the OS is new, but for legacy the trap-emulate
> seems both safe and works for legacy as well?

Again, trap emulate does not work for IMS when the IMS store is software
managed guest memory and not part of the device. And that's the whole
reason why we are discussing this.

Thanks,

tglx