Re: arm64 iommu groups issue
From: John Garry
Date:  Mon Feb 17 2020 - 07:08:53 EST
Right, and even worse is that it relies on the port driver even 
existing at all.
All this iommu group assignment should be taken outside device driver 
probe paths.
However we could still consider device links for sync'ing the SMMU and 
each device probing.
Yes, we should get that for DT now thanks to the of_devlink stuff, but 
cooking up some equivalent for IORT might be worthwhile.
It doesn't solve this problem, but at least we could remove the 
iommu_ops check in iort_iommu_xlate().
We would need to carve out a path from pci_device_add() or even 
device_add() to solve all cases.
Another thought that crosses my mind is that when pci_device_group()
walks up to the point of ACS isolation and doesn't find an existing
group, it can still infer that everything it walked past *should* be put
in the same group it's then eventually going to return. Unfortunately I
can't see an obvious way for it to act on that knowledge, though, since
recursive iommu_probe_device() is unlikely to end well.
[...]
And this looks to be the reason for which current 
iommu_bus_init()->bus_for_each_device(..., add_iommu_group) fails also.
Of course, just adding a 'correct' add_device replay without the 
of_xlate process doesn't help at all. No wonder this looked suspiciously 
simpler than where the first idea left off...
(on reflection, the core of this idea seems to be recycling the existing 
iommu_bus_init walk rather than building up a separate "waiting list", 
while forgetting that that wasn't the difficult part of the original 
idea anyway)
We could still use a bus walk to add the group per iommu, but we would 
need an additional check to ensure the device is associated with the IOMMU.
On this current code mentioned, the principle of this seems wrong to 
me - we call bus_for_each_device(..., add_iommu_group) for the first 
SMMU in the system which probes, but we attempt to add_iommu_group() 
for all devices on the bus, even though the SMMU for that device may 
yet to have probed.
Yes, iommu_bus_init() is one of the places still holding a 
deeply-ingrained assumption that the ops go live for all IOMMU instances 
at once, which is what warranted the further replay in 
of_iommu_configure() originally. Moving that out of 
of_platform_device_create() to support probe deferral is where the 
trouble really started.
I'm not too familiar with the history here, but could this be reverted 
now with the introduction of of_devlink stuff?
Cheers,
John