RE: [PATCH 1/2] iommu: Fix race condition during default domain allocation
From: Krishna Reddy
Date: Fri Jun 11 2021 - 14:30:29 EST
> > + mutex_lock(&group->mutex);
> > iommu_alloc_default_domain(group, dev);
> > + mutex_unlock(&group->mutex);
>
> It feels wrong to serialise this for everybody just to cater for systems with
> aliasing SIDs between devices.
Serialization is limited to devices in the same group. Unless devices share an SID, they wouldn't be in the same group.
> Can you provide some more information about exactly what the h/w
> configuration is, and the callstack which exhibits the race, please?
The failure is an after-effect and shows up as a page fault. I don't have a failure call stack here. Ashish has traced it through print messages and can provide them.
From the print messages, the following was observed in the page fault case:
Device1: iommu_probe_device() --> iommu_alloc_default_domain() --> iommu_group_alloc_default_domain() --> __iommu_attach_device(group->default_domain)
Device2: iommu_probe_device() --> iommu_alloc_default_domain() --> iommu_group_alloc_default_domain() --> __iommu_attach_device(group->default_domain)
Both devices (with the same SID) enter iommu_group_alloc_default_domain(), and each one gets attached to a different group->default_domain,
as the second one overwrites group->default_domain after the first one has attached to the group->default_domain it created.
The SMMU would be set up to use the first domain for the context page table, whereas all the DMA map/unmap requests from the second device would
be performed on a domain that is not used by the SMMU for context translations. IOVA accesses from the second device (not mapped in the first domain) then lead to page faults.
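
To make the interleaving concrete, here is a minimal user-space sketch of it. This is not kernel code: the structs and the helpers (alloc_domain, needs_default_domain, alloc_and_attach, probe_racy, probe_one_locked) are hypothetical stand-ins, and the bad ordering is forced sequentially instead of with real threads so the outcome is deterministic. It only assumes that the "is there a default domain yet?" check and the allocation are not atomic with respect to each other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct domain { int id; };
struct group  { struct domain *default_domain; };
struct device {
	struct group  *group;
	struct domain *attached;	/* domain this device ended up attached to */
};

#define POOL_SIZE 4
struct domain domain_pool[POOL_SIZE];
int domains_allocated;

/* Hand out a fresh domain from a small static pool. */
struct domain *alloc_domain(void)
{
	assert(domains_allocated < POOL_SIZE);
	struct domain *d = &domain_pool[domains_allocated];
	d->id = domains_allocated++;
	return d;
}

/* Step 1: check whether the group still lacks a default domain. */
int needs_default_domain(const struct device *dev)
{
	return dev->group->default_domain == NULL;
}

/* Step 2: allocate, publish into the group, attach the device to it. */
void alloc_and_attach(struct device *dev)
{
	dev->group->default_domain = alloc_domain();
	dev->attached = dev->group->default_domain;
}

/*
 * Unserialised probe: both devices pass the check before either one
 * publishes a domain, so the second allocation overwrites the first.
 * Returns 1 if both devices share a domain, 0 otherwise.
 */
int probe_racy(struct device *d1, struct device *d2)
{
	int need1 = needs_default_domain(d1);
	int need2 = needs_default_domain(d2);

	if (need1)
		alloc_and_attach(d1);
	if (need2)
		alloc_and_attach(d2);
	return d1->attached == d2->attached;
}

/*
 * Serialised probe: check and allocation happen as one unit per device.
 * In the kernel this is where the group mutex would be held; here the
 * calls are simply made one after another.
 */
void probe_one_locked(struct device *dev)
{
	if (needs_default_domain(dev))
		alloc_and_attach(dev);
	else
		dev->attached = dev->group->default_domain;
}
```

With two devices in one group, probe_racy() leaves them attached to different domains, while calling probe_one_locked() on each in turn leaves both attached to the single group->default_domain.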
-KR