Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper

From: Eric Auger
Date: Fri Mar 10 2023 - 03:44:30 EST


Hi,

On 3/9/23 13:51, Robin Murphy wrote:
> On 2023-03-09 10:53, Nicolin Chen wrote:
>> On ARM, ITS virtualization is done via hypercalls, so the kernel
>> handles all IOVA mappings for the MSI doorbell in
>> iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg(). The current
>> virtualization solution with a 2-stage nested translation setup is to
>> do 1:1 IOVA mappings at the stage-1 guest-level IO page table via an
>> RMR region in the guest-level IORT, aligned with an IOVA region that
>> is predefined and mapped in the host kernel:
>>
>>    [stage-2 host level]
>>    #define MSI_IOVA_BASE        0x8000000
>>    #define MSI_IOVA_LENGTH    0x100000
>>    ...
>>    iommu_get_msi_cookie():
>>     cookie->msi_iova = MSI_IOVA_BASE;
>>    ...
>>    iommu_dma_prepare_msi(its_pa):
>>     domain = iommu_get_domain_for_dev(dev);
>>     iommu_dma_get_msi_page(its_pa, domain):
>>         cookie = domain->iova_cookie;
>>         iova = iommu_dma_alloc_iova():
>>             return cookie->msi_iova - size;
>>         iommu_map(iova, its_pa, ...);
>>
>>    [stage-1 guest level]
>> // Define in IORT an RMR [MSI_IOVA_BASE, MSI_IOVA_LENGTH]
>>    ...
>>    iommu_create_device_direct_mappings():
>>     iommu_map(iova=MSI_IOVA_BASE, pa=MSI_IOVA_BASE, len=MSI_IOVA_LENGTH);
>>
>> Since this solution calls iommu_get_domain_for_dev(), it requires the
>> device to be attached to a host-level iommu_domain that holds the
>> msi_cookie.
>>
>> On the other hand, IOMMUFD uses two iommu_domain objects to represent
>> the two stages: a stage-1 domain (IOMMU_DOMAIN_NESTED type) and a
>> stage-2 domain (IOMMU_DOMAIN_UNMANAGED type). In this design, the
>> device is attached to the stage-1 domain, which represents a
>> guest-level IO page table, i.e. a Context Descriptor table in SMMU
>> terms.
>>
>> This is obviously a mismatch, as iommu_get_domain_for_dev() does not
>> return the correct domain pointer in iommu_dma_prepare_msi().
>>
>> Add an iommu_get_unmanaged_domain helper to allow drivers to return
>> the correct IOMMU_DOMAIN_UNMANAGED iommu_domain that holds the IOVA
>> mappings for the msi_cookie. Keep it in the iommu-priv header for
>> internal use only.
>>
>> Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
>> Signed-off-by: Nicolin Chen <nicolinc@xxxxxxxxxx>
>> ---
>>   drivers/iommu/dma-iommu.c  |  5 +++--
>>   drivers/iommu/iommu-priv.h | 15 +++++++++++++++
>>   include/linux/iommu.h      |  2 ++
>>   3 files changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 99b2646cb5c7..6b0409d0ff85 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -31,6 +31,7 @@
>>   #include <linux/vmalloc.h>
>>
>>   #include "dma-iommu.h"
>> +#include "iommu-priv.h"
>>
>>   struct iommu_dma_msi_page {
>>       struct list_head    list;
>> @@ -1652,7 +1653,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>>   int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>>   {
>>       struct device *dev = msi_desc_to_dev(desc);
>> -    struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +    struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
>
> This still doesn't make sense - most of the time this will be expected
> to return the default DMA/identity domain if that's what the device is
> currently using. We can't know whether the current domain is managed
> or not until we look at it.

I tend to agree with Robin here. This was first introduced by

[PATCH v7 21/22] iommu/dma: Add support for mapping MSIs
https://lore.kernel.org/all/2273af20d844bd618c6a90b57e639700328ebf7f.1473695704.git.robin.murphy@xxxxxxx/

even before support for the VFIO use case, which came later on. So using
the "unmanaged" terminology sounds improper to me, at least.
Couldn't we use a parent/child terminology instead, as used in the past in
the following?

[RFC v2] /dev/iommu uAPI proposal
https://lore.kernel.org/all/BN9PR11MB5433B1E4AE5B0480369F97178C189@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

This would still hold for the former use case.
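
For instance, a rough sketch (the ->parent field and the helper name
below are invented for illustration, they are not part of this series):

    /* hypothetical: a nested (child) domain keeps a pointer to its
     * parent stage-2 domain, and the MSI code simply walks up to it
     */
    static struct iommu_domain *iommu_get_msi_mapping_domain(struct device *dev)
    {
            struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

            /* non-nested attachments keep today's behaviour */
            if (domain && domain->type == IOMMU_DOMAIN_NESTED)
                    return domain->parent;

            return domain;
    }

That would keep iommu_dma_prepare_msi() working unchanged for the
existing DMA-domain case, while avoiding the "unmanaged" naming.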

Thanks

Eric

>
> Just like every other caller of iommu_get_domain_for_dev(), what we
> want here is the current kernel-owned domain that we can inspect and
> maybe do standard IOMMU API things with. Why can't
> iommu_get_domain_for_dev() simply maintain that established usage
> model and return the kernel-owned s2_domain from a nested domain
> automatically? No IOMMU API user expects or needs it to return
> anything else (and IOMMUFD should certainly not be losing track of a
> nested domain within its own higher-level abstractions and needing to
> fall back on iommu_get_domain_for_dev()), so I really don't see a
> valid reason to overcomplicate things.
>
> Please note I stress "valid" since I'm not buying arbitrarily made-up
> conceptual purity arguments. A nested domain cannot be the "one true
> domain" that is an opaque combination of S1+S2; the IOMMU API view has
> to be more like the device is attached to both the nested domain and
> the parent stage 2 domain somewhat in parallel. Even when nesting is
> active, the S2 domain still exists as a domain in its own right, and
> still needs to be visible and operated on as such, for instance if
> memory is hotplugged in or out of the VM.
>
> TBH I'd also move the s2_domain pointer into the iommu_domain itself,
> since it's going to be a common feature for all nesting
> implementations, thus there seems little need to indirect lookups
> through the drivers at all.
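
FWIW, if I understand this suggestion correctly, it would look roughly
like the sketch below (the s2_domain field in the generic struct
iommu_domain is illustrative only):

    /* sketch: unwrap a nested domain inside the core lookup itself,
     * so every caller transparently sees the kernel-owned S2 parent
     */
    struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
    {
            struct iommu_group *group = dev->iommu_group;

            if (!group || !group->domain)
                    return NULL;

            if (group->domain->type == IOMMU_DOMAIN_NESTED)
                    return group->domain->s2_domain;

            return group->domain;
    }

No new op would be needed, and existing callers would keep getting a
domain they can actually inspect and operate on.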
>
> Thanks,
> Robin.
>
>>       struct iommu_dma_msi_page *msi_page;
>>       static DEFINE_MUTEX(msi_prepare_lock); /* see below */
>>
>> @@ -1685,7 +1686,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>>   void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
>>   {
>>       struct device *dev = msi_desc_to_dev(desc);
>> -    const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +    const struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
>>       const struct iommu_dma_msi_page *msi_page;
>>
>>       msi_page = msi_desc_get_iommu_cookie(desc);
>> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
>> index a6e694f59f64..da8044da9ad8 100644
>> --- a/drivers/iommu/iommu-priv.h
>> +++ b/drivers/iommu/iommu-priv.h
>> @@ -15,6 +15,21 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
>>       return dev->iommu->iommu_dev->ops;
>>   }
>>
>> +static inline struct iommu_domain *iommu_get_unmanaged_domain(struct device *dev)
>> +{
>> +    const struct iommu_ops *ops;
>> +
>> +    if (!dev->iommu || !dev->iommu->iommu_dev)
>> +        goto attached_domain;
>> +
>> +    ops = dev_iommu_ops(dev);
>> +    if (ops->get_unmanaged_domain)
>> +        return ops->get_unmanaged_domain(dev);
>> +
>> +attached_domain:
>> +    return iommu_get_domain_for_dev(dev);
>> +}
>> +
>>   int iommu_group_replace_domain(struct iommu_group *group,
>>                      struct iommu_domain *new_domain);
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 080278c8154d..76c65cc4fc15 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -275,6 +275,8 @@ struct iommu_ops {
>>                             struct iommu_domain *parent,
>>                             const void *user_data);
>>
>> +    struct iommu_domain *(*get_unmanaged_domain)(struct device *dev);
>> +
>>       struct iommu_device *(*probe_device)(struct device *dev);
>>       void (*release_device)(struct device *dev);
>>       void (*probe_finalize)(struct device *dev);
>