Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device

From: Sven Peter
Date: Thu Oct 21 2021 - 04:31:45 EST




On Wed, Oct 20, 2021, at 07:21, Lu Baolu wrote:
> On 2021/10/20 0:37, Sven Peter via iommu wrote:
>> The iova allocator is capable of handling any granularity which is a power
>> of two. Remove the much stronger condition that the granularity must be
>> smaller or equal to the CPU page size from a BUG_ON there.
>> Instead, check this condition during __iommu_attach_device and fail
>> gracefully.
>>
>> Signed-off-by: Sven Peter<sven@xxxxxxxxxxxxx>
>> ---
>> drivers/iommu/iommu.c | 35 ++++++++++++++++++++++++++++++++---
>> drivers/iommu/iova.c | 7 ++++---
>> include/linux/iommu.h | 5 +++++
>> 3 files changed, 41 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index dd7863e453a5..28896739964b 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -80,6 +80,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>> unsigned type);
>> static int __iommu_attach_device(struct iommu_domain *domain,
>> struct device *dev);
>> +static void __iommu_detach_device(struct iommu_domain *domain,
>> + struct device *dev);
>> static int __iommu_attach_group(struct iommu_domain *domain,
>> struct iommu_group *group);
>> static void __iommu_detach_group(struct iommu_domain *domain,
>> @@ -1974,6 +1976,19 @@ void iommu_domain_free(struct iommu_domain *domain)
>> }
>> EXPORT_SYMBOL_GPL(iommu_domain_free);
>>
>> +static int iommu_check_page_size(struct iommu_domain *domain)
>> +{
>> + if (!iommu_is_paging_domain(domain))
>> + return 0;
>> +
>> + if (!(domain->pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1)))) {
>> + pr_warn("IOMMU pages cannot exactly represent CPU pages.\n");
>> + return -EFAULT;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int __iommu_attach_device(struct iommu_domain *domain,
>> struct device *dev)
>> {
>> @@ -1983,9 +1998,23 @@ static int __iommu_attach_device(struct iommu_domain *domain,
>> return -ENODEV;
>>
>> ret = domain->ops->attach_dev(domain, dev);
>> - if (!ret)
>> - trace_attach_device_to_domain(dev);
>> - return ret;
>> + if (ret)
>> + return ret;
>> +
>> + /*
>> + * Check that CPU pages can be represented by the IOVA granularity.
>> + * This has to be done after ops->attach_dev since many IOMMU drivers
>> + * only limit domain->pgsize_bitmap after having attached the first
>> + * device.
>> + */
>> + ret = iommu_check_page_size(domain);
>> + if (ret) {
>> + __iommu_detach_device(domain, dev);
>> + return ret;
>> + }
>
> It looks odd. __iommu_attach_device() attaches an I/O page table for a
> device. How does it relate to CPU pages? Why is it a failure case if CPU
> page size is not covered?

Ideally, I'd only allow allocating DMA domains (which are going to be able
to handle larger IOMMU page sizes) while disallowing UNMANAGED domains
(which can theoretically read the granule but I doubt any client right now
considers this situation and will just run into odd issues) when the I/O
page size is bigger than the CPU page size. There was a brief previous
discussion about this [1,2,3].

Unfortunately, Apple's DART IOMMU is hardwired to either support 4K or
16K pages but never both. And to make things worse there was at least one
SoC used in the iPhones that mixed 4K and 16K DARTs on the same bus. Ugh.
That's why this awkward check is here because this is the earliest
place where I know which I/O page size will be used.


But I guess I could just limit the DART driver to 16K pages for now
(since every instance on the M1 is hard wired for that anyway) and then
just disallow allocating UNMANAGED domains when granule > PAGE_SIZE.

I'd still need a similar check here (at least for now) to prevent attaching
untrusted devices since I haven't changed swiotlb yet to support aligning
buffers correctly to more than PAGE_SIZE. That's possible but the interaction
with min_align_mask is a bit tricky to get right.
If there really shouldn't be any check here I can also do that for the next
version but I'd really like to keep that as a separate series - especially
since Thunderbolt support is still far away.


Thanks,


Sven

[1] https://lore.kernel.org/linux-iommu/7261df01-34a9-4e53-37cd-ae1aa15b1fb4@xxxxxxx/
[2] https://lore.kernel.org/linux-iommu/CAK8P3a18XK2mfMGbZ+M32Mbabhbkd+=DNrnzampOah_j=rWozw@xxxxxxxxxxxxxx/
[3] https://lore.kernel.org/linux-iommu/YO%2FbMUAOLRgOJOvl@xxxxxxxxxx/