Hi All,
On Fri, Feb 28, 2025 at 8:51 AM Ethan Zhao <etzhao1900@xxxxxxxxx> wrote:
patch v1:
On 2/28/2025 4:38 AM, Jason Gunthorpe wrote:
On Thu, Feb 27, 2025 at 08:40:31AM +0800, Ethan Zhao wrote:The -f option to reboot command, suggests data corruption possibility.although
On 2/26/2025 9:04 PM, Jason Gunthorpe wrote:There is no guarentee device dma is halted anyhow at this point either.
On Wed, Feb 26, 2025 at 01:55:28PM +0800, Ethan Zhao wrote:Hmm... so far it is fine, the iommu_shutdown only has a little work to
I think it is running the destruction of the iommu far too late in theProvided the system does not respond to those events when this functionI agree.
is called, it's fine to remove the lock.
process. IMHO it should be done after all the drivers have been
shutdown, before the CPUs go single threaded.
do, disable the translation, the PMR disabling is just backward compatible,
was deprecated already. if we move it to one position where all CPUs are
cycling, we don't know what kind of user-land tasks left there (i.e. reboot -f
case), it would be hard to full-fill the requirement of Intel VT-d, no ongoing
transaction there on hardware when issue the translation disabling command.
the IOMMU strives not to cross the transaction boundaries of its address
translation.
over all, we should make the reboot -f function works. not to hang
there. meanwhile
it doesn't break anything else so far.
if (!down_write_trylock(&dmar_global_lock))
return;
patch v3:
/* All other CPUs were brought down, hotplug interrupts were disabled,
no lock and RCU checking needed anymore*/
list_for_each_entry(drhd, &dmar_drhd_units, list) {
iommu = drhd->iommu;
/* Disable PMRs explicitly here. */
iommu_disable_protect_mem_regions(iommu);
iommu_disable_translation(iommu);
}
Since we can remove down_write/up_write, it indicates that during the
IOMMU shutdown process, we are not particularly concerned about
whether others are accessing the critical section. Therefore, it can
be understood that Patch v1 can successfully acquire the lock and
proceed with subsequent operations. From this perspective, Patch v1 is
reasonable and can prevent situations where there is an actual owner
of dmar_global_lock, even though it does not perform a real IOMMU
shutdown.
However, if the IOMMU shutdown truly fails, IOMMU_WAIT_OP will trigger
a panic(). Removing down_write/up_write offers better maintainability
than Patch v1 (as we can retrieve the cause from the vmcore). But this
might not be significant, since the reboot could have been completed
successfully, and the occurrence of panic() might instead cause
confusion.
In summary, it is essential to complete the reboot action here rather
than causing the system to hang. So, which do you prefer, v1 or v3?
Thanks,Thanks,
Ethan
Jason
Yunhui