Re: [PATCH v3 4/6] iommu: Move lock from iommu_change_dev_def_domain() to its caller

From: Jason Gunthorpe
Date: Thu Mar 09 2023 - 20:17:01 EST


On Mon, Mar 06, 2023 at 10:58:02AM +0800, Lu Baolu wrote:
> The intention is to make it possible to put group ownership check and
> default domain change in the same critical region protected by the group's
> mutex lock. No intentional functional change.
>
> Signed-off-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> ---
> drivers/iommu/iommu.c | 29 ++++++++++++++---------------
> 1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 0bcd9625090d..f8f400548a10 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2945,7 +2945,7 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
> int ret, dev_def_dom;
> struct device *dev;
>
> - mutex_lock(&group->mutex);
> + lockdep_assert_held(&group->mutex);
>
> if (group->default_domain != group->domain) {
> dev_err_ratelimited(prev_dev, "Group not assigned to default domain\n");
> @@ -3033,28 +3033,15 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
> goto free_new_domain;
>
> group->domain = group->default_domain;
> -
> - /*
> - * Release the mutex here because ops->probe_finalize() call-back of
> - * some vendor IOMMU drivers calls arm_iommu_attach_device() which
> - * in-turn might call back into IOMMU core code, where it tries to take
> - * group->mutex, resulting in a deadlock.
> - */
> - mutex_unlock(&group->mutex);
> -
> - /* Make sure dma_ops is appropriatley set */
> - iommu_group_do_probe_finalize(dev, group->default_domain);
> iommu_domain_free(prev_dom);
> +
> return 0;
>
> free_new_domain:
> iommu_domain_free(group->default_domain);
> group->default_domain = prev_dom;
> group->domain = prev_dom;
> -
> out:
> - mutex_unlock(&group->mutex);
> -
> return ret;
> }
>
> @@ -3142,7 +3129,19 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
> goto out;
> }
>
> + mutex_lock(&group->mutex);
> ret = iommu_change_dev_def_domain(group, dev, req_type);
> + /*
> + * Release the mutex here because ops->probe_finalize() call-back of
> + * some vendor IOMMU drivers calls arm_iommu_attach_device() which
> + * in-turn might call back into IOMMU core code, where it tries to take
> + * group->mutex, resulting in a deadlock.
> + */
> + mutex_unlock(&group->mutex);
> +
> + /* Make sure dma_ops is appropriately set */
> + if (!ret)
> + iommu_group_do_probe_finalize(dev, group->default_domain);

Everything about iommu_group_do_probe_finalize() is still unsafe
against races with release. :(

It's a pre-existing bug, so maybe leave it alone for this series :\

To fix it, I'd suggest splitting the probe_finalize op into probe_finalize
and probe_finalize_unlocked.

Only have the "bad" deadlocky drivers use the unlocked variant, and fix
Intel and virtio to use the safe variant.

We can decide which variant to use under the mutex and then at least
"good" drivers don't have this race.
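A rough userspace model of the split suggested above, to make the control flow concrete. All names here (`model_iommu_ops`, `probe_finalize_unlocked`, `store_type()`, `group_lock()`, etc.) are hypothetical; this is not the kernel's `struct iommu_ops`, and a pthread mutex plus `assert()` stand in for `group->mutex` and `lockdep_assert_held()`. The variant is chosen while the lock is held, and only a driver that supplies the unlocked callback has it run after the mutex is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names are real kernel API. */
struct model_dev;

struct model_iommu_ops {
	/* Safe variant: invoked with the group mutex held. */
	void (*probe_finalize)(struct model_dev *dev);
	/* Legacy variant: invoked after the mutex is dropped, because the
	 * driver may re-enter core code that takes the group mutex. */
	void (*probe_finalize_unlocked)(struct model_dev *dev);
};

struct model_group {
	pthread_mutex_t mutex;
	bool locked;            /* stand-in for lockdep's held-lock tracking */
	int default_domain;
};

struct model_dev {
	struct model_group *group;
	const struct model_iommu_ops *ops;
	int finalized_domain;   /* which domain the finalize callback saw */
};

static void group_lock(struct model_group *g)
{
	pthread_mutex_lock(&g->mutex);
	g->locked = true;
}

static void group_unlock(struct model_group *g)
{
	g->locked = false;
	pthread_mutex_unlock(&g->mutex);
}

/* Mirrors the patched helper: the caller must already hold the lock. */
static int change_def_domain(struct model_group *g, int new_domain)
{
	assert(g->locked);      /* emulates lockdep_assert_held() */
	g->default_domain = new_domain;
	return 0;
}

static int store_type(struct model_dev *dev, int new_domain)
{
	struct model_group *g = dev->group;
	void (*deferred)(struct model_dev *) = NULL;
	int ret;

	group_lock(g);
	ret = change_def_domain(g, new_domain);
	if (!ret) {
		/* Pick the variant under the lock. */
		if (dev->ops->probe_finalize)
			dev->ops->probe_finalize(dev);  /* no unlock window */
		else
			deferred = dev->ops->probe_finalize_unlocked;
	}
	group_unlock(g);

	/* Only legacy drivers get here; they keep the pre-existing race. */
	if (deferred)
		deferred(dev);
	return ret;
}

static void safe_finalize(struct model_dev *dev)
{
	assert(dev->group->locked);   /* runs inside the critical section */
	dev->finalized_domain = dev->group->default_domain;
}

static void legacy_finalize(struct model_dev *dev)
{
	/* Re-takes the group lock, so calling this with the (non-recursive)
	 * mutex already held would self-deadlock. */
	group_lock(dev->group);
	dev->finalized_domain = dev->group->default_domain;
	group_unlock(dev->group);
}

static const struct model_iommu_ops safe_ops = {
	.probe_finalize = safe_finalize,
};
static const struct model_iommu_ops legacy_ops = {
	.probe_finalize_unlocked = legacy_finalize,
};
```

With this shape, a driver providing `probe_finalize` never sees the lock dropped between the default-domain change and its finalize call; only the legacy path still has the window Jason describes.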

Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>

Jason