Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

From: Jacob Pan
Date: Thu Jan 13 2022 - 22:06:48 EST


Hi BaoLu,

On Fri, 14 Jan 2022 08:58:53 +0800, Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
wrote:

> Hi Jacob,
>
> On 1/13/22 9:23 PM, Jacob Pan wrote:
> > During PCI bus rescan, adding new devices involves two notifiers:
> > 1. dmar_pci_bus_notifier()
> > 2. iommu_bus_notifier()
> > The current code gives #1 the lowest priority (INT_MIN), so #2 is
> > invoked first. As a result, the new device's struct device pointer
> > cannot be found in the DRHD search for its DMAR/IOMMU, and the device
> > is put under the "catch-all" IOMMU instead of the correct one.
> >
> > This can cause a system hang when a device TLB invalidation is sent
> > to the wrong IOMMU; invalidation timeout errors or hard lockups can be
> > observed.
> >
> > This patch fixes the issue by giving dmar_pci_bus_notifier the highest
> > priority (INT_MAX), so the DRHD search for a new device finds the
> > correct IOMMU.
> >
> > Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
> > Reported-by: Zhang, Bernice <bernice.zhang@xxxxxxxxx>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > ---
> > drivers/iommu/intel/dmar.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> > index 915bff76fe96..5d07e5b89c2e 100644
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -385,7 +385,7 @@ static int dmar_pci_bus_notifier(struct notifier_block *nb,
> > static struct notifier_block dmar_pci_bus_nb = {
> > .notifier_call = dmar_pci_bus_notifier,
> > - .priority = INT_MIN,
> > + .priority = INT_MAX,
> > };
> >
> > static struct dmar_drhd_unit *
> >
>
> Nice catch! dmar_pci_bus_add_dev() should take place *before*
> iommu_probe_device(). This change enforces this with a higher notifier
> priority for dmar callback.
>
> Conversely, dmar_pci_bus_del_dev() should take place *after*
> iommu_release_device(). Perhaps we can use two notifiers, one for
> ADD_DEVICE (with .priority=INT_MAX) and the other for REMOVE_DEVICE
> (with .priority=INT_MIN)?
>

Since the device_to_iommu() lookup in intel_iommu_release_device() only
checks whether the device is under "an" IOMMU, not "the" IOMMU, the
remove-path ordering is not needed, right?

I know this is not robust, but having so many notifiers with implicit
priorities is not clean either.

Perhaps we should define an explicit priority for the iommu_bus notifier
and set the DMAR notifier priorities relative to it, e.g.:

@@ -1841,6 +1841,7 @@ static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops)
return -ENOMEM;
nb->notifier_call = iommu_bus_notifier;

+ nb->priority = IOMMU_BUS_NOTIFY_PRIORITY;


static struct notifier_block dmar_pci_bus_add_nb = {
.notifier_call = dmar_pci_bus_notifier,
- .priority = INT_MIN,
+ .priority = IOMMU_BUS_NOTIFY_PRIORITY + 1,
};

static struct notifier_block dmar_pci_bus_remove_nb = {
.notifier_call = dmar_pci_bus_notifier,
- .priority = INT_MIN,
+ .priority = IOMMU_BUS_NOTIFY_PRIORITY - 1,
};


> Best regards,
> baolu


Thanks,

Jacob