Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f

From: Ethan Zhao
Date: Wed Feb 26 2025 - 00:55:44 EST



On 2025/2/26 13:18, Baolu Lu wrote:
On 2/26/25 11:50, Ethan Zhao wrote:

If the scheduler doesn't run, how did we get from 4 -> 5?

Maybe the issue is that the shutdown handler here is running at the wrong
time, and it should not be running after the scheduler has been shut
down.

I don't think removing the lock is a great idea without more
explanation.

It seems it is not a simple job to explain why there is no race window between
this iommu_shutdown() and the following dmar_global_lock holders:

1. PCIe hotplug dmar_pci_bus_notifier()

2. mm_core_init detect_intel_iommu()

3. late_initcall dmar_free_unused_resources()

4. acpi attach dmar_device_hotplug()

5. pci_iommu_init intel_iommu_init() init_dmars()

6. rootfs_initcall ir_dev_scope_init()

Though this is the last stage of reboot. Then how about we go back to v1?

Just replace down_write() with down_write_trylock().

I don't think trylock is a reasonable solution. intel_iommu_shutdown()
should not become a no-op simply because it cannot acquire a lock
immediately.

No other CPU is holding the lock after they have been brought down by the
synchronous call to native_stop_other_cpus(1).

So it actually wouldn't fail to acquire the lock. This is also the reason why
we don't need to down_write() the dmar_global_lock.


The lock here is to protect the drhd (representation of iommu hardware)
list. It needs protection because this driver supports iommu hot-add and
remove, which is triggered by an ACPI event for I/O board hotplug.

Yup, the lock is used to protect the global list dmar_drhd_units.

But here all IOAPICs/LAPICs have been brought down, so hotplug interrupts can't happen either (only legacy and NMI are alive).

Provided the system does not respond to those events when this function
is called, it's fine to remove the lock.
I agree.

Thanks,
baolu

--
"firm, enduring, strong, and long-lived"