Re: [PATCH 1/4] iommu/amd: Introduce Protection-domain flag VFIO

From: Kalra, Ashish
Date: Fri Jan 20 2023 - 12:01:36 EST


Hello Jason,

On 1/20/2023 10:13 AM, Jason Gunthorpe wrote:
> On Fri, Jan 20, 2023 at 09:12:26AM -0600, Kalra, Ashish wrote:
>> On 1/19/2023 11:44 AM, Jason Gunthorpe wrote:
>>> On Thu, Jan 19, 2023 at 02:54:43AM -0600, Kalra, Ashish wrote:
>>>> Hello Jason,
>>>>
>>>> On 1/13/2023 9:33 AM, Jason Gunthorpe wrote:
>>>>> On Tue, Jan 10, 2023 at 08:31:34AM -0600, Suravee Suthikulpanit wrote:
>>>>>> Currently, to detect if a domain is enabled with VFIO support, the driver
>>>>>> checks if the domain has devices attached and checks if the domain type is
>>>>>> IOMMU_DOMAIN_UNMANAGED.
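
(For reference, the heuristic described above amounts to roughly the following; an illustrative sketch with made-up naming, not the actual patch hunk:)

    /* Illustrative sketch only: guess that a domain is VFIO-controlled
     * when it has devices attached and is of the userspace-managed
     * (unmanaged) type. This is the heuristic the commit message
     * describes, not the exact code from the patch. */
    static bool domain_maybe_vfio(struct protection_domain *pdom)
    {
            return pdom->dev_cnt > 0 &&
                   pdom->domain.type == IOMMU_DOMAIN_UNMANAGED;
    }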
>>>>>
>>>>> NAK
>>>>>
>>>>> If you need weird HW specific stuff like this then please implement it
>>>>> properly in iommufd, not try and randomly guess what things need from
>>>>> the domain type.
>>>>>
>>>>> All this confidential computing stuff needs a comprehensive solution,
>>>>> not some piecemeal mess. How can you even use a CC guest with VFIO in
>>>>> the upstream kernel? Hmm?
>>>>
>>>> Currently, all guest devices are untrusted - whether they are emulated,
>>>> virtio, or passthrough. In the current use case of VFIO device passthrough
>>>> to an SNP guest, the passthrough device will perform DMA to unencrypted or
>>>> shared guest memory, in the same way as virtio or emulated devices.
>>>>
>>>> This fix is prompted by an issue reported by Nvidia, who are trying to do
>>>> PCIe device passthrough to an SNP guest. The memory for DMA is allocated
>>>> through dma_alloc_coherent() in the SNP guest, and during DMA I/O an
>>>> RMP_PAGE_FAULT is observed on the host.
>>>>
>>>> These dma_alloc_coherent() calls translate into page state change
>>>> hypercalls to the host, which change the guest page state from encrypted
>>>> to shared in the RMP table.
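
(For context, a minimal sketch of the guest-side allocation described above; snp_alloc_dma_buf() is a made-up wrapper, and the set_memory_decrypted() path is the usual SNP flow:)

    #include <linux/dma-mapping.h>

    /* Sketch: a guest driver allocating a DMA buffer. With SNP active,
     * the DMA layer ends up calling set_memory_decrypted() on the
     * backing pages, which issues page state change hypercalls so the
     * host flips those pages from private to shared in the RMP table. */
    static void *snp_alloc_dma_buf(struct device *dev, size_t size,
                                   dma_addr_t *handle)
    {
            return dma_alloc_coherent(dev, size, handle, GFP_KERNEL);
    }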
>>>>
>>>> Following is a link to the issue discussed above:
>>>> https://github.com/AMDESE/AMDSEV/issues/109
>>>
>>> Wow you should really write all of this in the commit message
>>>
>>>> Now, to set individual 4K entries to different shared/private
>>>> mappings in the NPT or host page tables within a large page, the
>>>> RMP and NPT/host page table large page entries are split into
>>>> 4K PTEs.
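
(To make the granularity explicit - one 2M large page covers 512 4K pages; a minimal sketch, names made up:)

    /* Illustrative arithmetic for the split described above: smashing
     * one 2M large-page entry yields 512 independent 4K PTEs, each of
     * which can then be shared or private on its own. */
    #define EX_2M_SIZE      (2UL << 20)
    #define EX_4K_SIZE      (4UL << 10)
    #define EX_PTES_PER_2M  (EX_2M_SIZE / EX_4K_SIZE)   /* == 512 */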
>>>
>>> Why are mappings to private pages even in the iommu in the first
>>> place - and how did they even get there?
>>
>> You seem to be confusing host/NPT page tables with IOMMU page tables.
>
> No, I haven't. I'm repeating what was said:
>
>>>> during DMA I/O an RMP_PAGE_FAULT is observed on the host.
>
> So, I'm interested to hear how you can get a RMP_PAGE_FAULT from the
> IOMMU if the IOMMU is only programmed with shared pages that, by (my)
> definition, are accessible to the CPU and should not generate a
> RMP_PAGE_FAULT?

Yes, sorry, I got confused by your use of the word private, as you mention below.

We basically get the RMP #PF from the IOMMU because there is a page size mismatch between the RMP table and the IOMMU page table. The RMP table's large page entry has been smashed to 4K PTEs to handle the page state change to shared on 4K mappings, so this change has to be synced up with the IOMMU page table; otherwise the resulting page size mismatch between the RMP table and the IOMMU page table causes the RMP #PF.
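
To make the rule explicit: an IOPTE must never map a range with a larger page size than the RMP currently tracks for it. A minimal sketch of that invariant (hypothetical helper, not actual driver code):

    /* Returns true if an IOPTE of the given page size is safe to walk
     * given how the RMP currently tracks the same range. A 2M IOPTE
     * sitting on top of RMP entries smashed to 4K fails this check and
     * is what produces the RMP #PF described above. */
    static bool iopte_rmp_size_ok(unsigned int iopte_page_shift,
                                  unsigned int rmp_page_shift)
    {
            return iopte_page_shift <= rmp_page_shift;
    }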

Thanks,
Ashish


> I think you are confusing my use of the word private with some AMD
> architecture details. When I say private I mean that the host CPU will
> generate a violation if it tries to access the memory.
>
> I think the conclusion is logical - if the IOMMU is experiencing a
> protection violation, it is because the IOMMU was programmed with PFNs
> it is not allowed to access - so why was that even done in the
> first place?
>
> I suppose what is going on is that you program the IOPTEs with PFNs of
> unknown state, and when the PFN changes access protections the IOMMU
> can simply use it without needing to synchronize with the access
> protection change. And your problem is that the granularity of access
> protection change does not match the IOPTE granularity in the IOMMU.
>
> But this seems very wasteful as the IOMMU will be using IOPTEs and
> also will pin the memory when the system *knows* this memory cannot
> be accessed through the IOMMU. It seems much better to dynamically
> establish IOMMU mappings only when you learn that the memory is
> actually accessible to the IOMMU.
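
(A minimal sketch of the dynamic approach suggested above, with a made-up helper; note the iommu_map() signature shown is the five-argument form of that era:)

    #include <linux/iommu.h>
    #include <linux/sizes.h>

    /* Sketch: map a page into the IOMMU only once it becomes shared,
     * and unmap it again when it is converted back to private, instead
     * of mapping all guest memory up front. */
    static int on_page_state_change(struct iommu_domain *dom,
                                    unsigned long iova, phys_addr_t paddr,
                                    bool now_shared)
    {
            if (now_shared)
                    return iommu_map(dom, iova, paddr, SZ_4K,
                                     IOMMU_READ | IOMMU_WRITE);

            return iommu_unmap(dom, iova, SZ_4K) == SZ_4K ? 0 : -EIO;
    }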
>
> Also, I thought the leading plan for CC was to use the memfd approach here:
>
> https://lore.kernel.org/kvm/20220915142913.2213336-1-chao.p.peng@xxxxxxxxxxxxxxx/
>
> Which prevents mmapping the memory to userspace - so how did it get
> into the IOMMU in the first place?
>
> Jason