Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices
From: Vasant Hegde
Date: Thu Apr 18 2024 - 07:45:21 EST
Jason,
On 4/17/2024 9:36 PM, Jason Gunthorpe wrote:
> On Wed, Apr 17, 2024 at 10:53:00AM +0530, Vasant Hegde wrote:
>>> - DART
>>> * system page size is too small, can't map IOPTEs, force identity
>>> * iommu does not support IDENTITY at all, force paging
>>> - tegra: Device quirks mean paging and DMA API doesn't work
>>> - amd: The driver can't support PAGING when in SNP mode
>>
>> Actually When SNP (Secure Nested Paging) is enabled in host, AMD driver forces
>> DMA translation mode with AMD V1 page table.
>
> Why does it return IDENTITY?
We return IDENTITY only if SNP and memory encryption is not enabled and device
is SVA capable. Upstream has below code (v6.9-rc4)
if (pdev_pasid_supported(dev_data) &&
!cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
!amd_iommu_snp_en) {
return IOMMU_DOMAIN_IDENTITY;
}
and during boot, we will enforce Paging domain when Encryption/SNP is enabled.
>
> if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT) && !amd_iommu_snp_en)
> return IOMMU_DOMAIN_IDENTITY;
>
> If the HW can't support IDENTITY then it needs to return
> IOMMU_DOMAIN_DMA..
>
>>> Since actual PASID AMD attach isn't implemented yet we could just
>>> remove that check from def_domain_type as an RC fix. Vasant can sort
>>> it out properly by disabling pasid support on untrusted devices until
>>> the DTE logic is fully completed.
>>
>> Keeping PASID support aside, largely the question is who should handle/decide
>> domain type for untrusted device? Is it core IOMMU layer -OR- HW
>> driver?
>
> def_domain_type() should return a HW limitation. It has got muddled up
> with the default domain type stuff, but in essence it's usage right
> now is to tell the core code that a specific device cannot support
> IDENTITY or PAGING. It should be combined with a matching test in
> attach_dev to block the unsupported domain combination.
>
> The core code should use that information to decice on policy. If the
> device reports it cannot support PAGING and the device is untrusted
> then, IMHO the core code should leave the device blocked and
> inoperable for security.
>
> The key thing is this API should not be used for the driver to inject
> some kind policy/optimization choice.
So far the policy was core layer to handle untrusted device which got changed
and hence we hit this bug.
I don't see any issue with enforcing these checks inside AMD driver.. If we go
with this approach then IMO core should just adhere to what driver returned
rather than failing. Having policy decision at two level is inviting trouble
like this one.
>
>> - If its individual driver, then I can try to add fix in AMD driver. But then
>> what is the expectation? Driver is expected to return IOMMU_DOMAIN_DMA -OR- core
>> IOMMU layer is expected to adhere to whatever driver returned?
>
> If driver and HW supports PAGING and IDENTITY attaches to the devices
> RID then it should return 0.
>
> If only PAGING is supported it should return
> IOMMU_DOMAIN_DMA. alloc_domain_paging(dev) should select the correct
> page table format - including a format to enable features like PASID.
>
> If only IDENTITY is supported it should return IOMMU_DOMAIN_IDENTITY.
>
> If AMD SW cannot support PASID & RID PAGING right now then it should
> return 0 and fail that attachment combination during set_dev_pasid().
>
> In the interm you should instruct users to use the command line option
> to force IDENTITY until the driver work is completed.
This is a regression hence it should be fixed. All other enhancement/fixes can
be done later.
-Vasant
>
> If there are embedded GPUs or something that have an implementation
> that bypasses the IOMMU for DMA then you need an ID list to check
> against like some of the ARM drivers do.
>
> Jason