Re: [PATCH v3 08/18] iommu/vt-d: clear unpreserved context entries during shutdown

From: Samiullah Khawaja

Date: Mon Jun 22 2026 - 18:56:58 EST


On Mon, Jun 22, 2026 at 10:47:05AM +0800, Baolu Lu wrote:
On 6/15/26 07:37, Samiullah Khawaja wrote:
During normal shutdown the IOMMU translation is disabled. During live
update the root table is preserved, so it must be cleaned up instead:
clear the context entries of the unpreserved devices and the root
entries of the unpreserved context tables.

Signed-off-by: Samiullah Khawaja <skhawaja@xxxxxxxxxx>
---
drivers/iommu/intel/iommu.c | 9 ++--
drivers/iommu/intel/iommu.h | 6 +++
drivers/iommu/intel/liveupdate.c | 74 ++++++++++++++++++++++++++++++++
3 files changed, 86 insertions(+), 3 deletions(-)

Please tweak the commit title to: "iommu/vt-d: Clear unpreserved..."
(capitalize "Clear") to match the driver's commit history style.

Agreed. Will do.

A high-level question: have you looked at how the suspend/resume path
behaves with the iommu and device preservation? DMA translation is
disabled and re-enabled there. I don't see that any immediate changes
are needed there, but it would be good to call this out explicitly in
case it was overlooked.

I looked into this during early implementation. The preserved state data
structures are not affected by suspending the IOMMU, because resume
reuses those exact same data structures in RAM. I have not tested this,
though, since suspend/resume is unlikely to be used in the environments
where liveupdate is used.

That said, I plan to simply return -EBUSY from iommu_suspend() if the
IOMMU is currently preserved. This explicitly prevents the two states
from overlapping and avoids any complex edge cases. I'll add that check
in v4.
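
For illustration only, the guard described above could look roughly like
the following userspace model. This is a sketch under stated assumptions,
not the v4 code: struct model_iommu, model_iommu_preserved() and
model_iommu_suspend() are hypothetical stand-ins for the kernel's
per-IOMMU state, iommu_preserved_state() and iommu_suspend(), and the
choice to check every IOMMU before touching any of them is an assumption
made here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-IOMMU state; not the kernel struct. */
struct model_iommu {
	bool preserved;           /* preserved for live update */
	bool translation_enabled; /* DMA translation currently on */
};

/* Hypothetical stand-in for iommu_preserved_state(). */
static bool model_iommu_preserved(const struct model_iommu *iommu)
{
	return iommu->preserved;
}

/*
 * Model of the suspend path: refuse to suspend with -EBUSY if any IOMMU
 * is preserved, so that suspend and preservation never overlap.
 * Otherwise disable translation on every IOMMU.
 */
static int model_iommu_suspend(struct model_iommu *iommus, size_t count)
{
	size_t i;

	assert(iommus != NULL || count == 0);

	/* Check first, so a refusal leaves every IOMMU untouched. */
	for (i = 0; i < count; i++)
		if (model_iommu_preserved(&iommus[i]))
			return -EBUSY;

	for (i = 0; i < count; i++)
		iommus[i].translation_enabled = false;

	return 0;
}
```

Checking before disabling means a refused suspend has no side effects,
which keeps the preserved state exactly as it was.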


diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 715b538e7efe..26258861e3bf 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2374,8 +2374,11 @@ void intel_iommu_shutdown(void)
/* Disable PMRs explicitly here. */
iommu_disable_protect_mem_regions(iommu);
- /* Make sure the IOMMUs are switched off */
- iommu_disable_translation(iommu);
+ /* Make sure the IOMMUs are switched off if not preserved. */
+ if (iommu_preserved_state(&iommu->iommu))
+ clear_unpreserved_context_entries(iommu);

How are PCI devices handled during a live update kexec? Do they go
through the standard iommu_release_device() path?

I assume they do not, because if they did, the context entries for
preserved devices could be updated after preservation. If they do bypass
the release_device path, why not just explicitly invoke
iommu_release_device() for all devices that are not preserved?

Using iommu_release_device() for the unpreserved devices would naturally
erase their context entries and securely park those devices in a DMA
blocking state.

I considered reusing iommu_release_device(), but it leaves the hardware
in a state that is unsafe to carry across a kexec boundary when global
translation remains enabled.

- In legacy mode, iommu_release_device() does not clear the context
  entries. It assumes device_block_translation() was called during a
  prior domain detach to clear context entries.
- In scalable mode, while it does clear the context entries, it skips
  the PASID cache invalidations (per VT-d spec 6.5.3.3 Table 25, second
  row). It assumes intel_pasid_tear_down_entry() was already called
  during domain detach to handle the invalidations.

During a normal shutdown, this is safe because the IOMMU driver disables
translation globally. However, since live update keeps global
translation enabled across the kexec, relying on iommu_release_device()
would lead to DMAR faults from unpreserved devices.

Manually clearing the entries and issuing global invalidations is the
safest way to guarantee the unpreserved devices are securely parked.
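
As a rough illustration of "clear the entries, then invalidate globally",
here is a small userspace model. It is a sketch, not the patch's
clear_unpreserved_context_entries(): all model_* names are hypothetical,
the layout is a simplified legacy-mode one (256 root entries, each
pointing to a 256-entry context table), a context table is assumed to be
"unpreserved" when no preserved device lives in it, and the global
invalidation is only modelled as a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NR_BUS   256
#define MODEL_NR_DEVFN 256

/* Simplified 128-bit context entry; all-zero means "not present". */
struct model_context_entry {
	unsigned long long lo, hi;
};

struct model_context_table {
	struct model_context_entry entries[MODEL_NR_DEVFN];
};

/* Simplified root table: one context-table pointer per bus. */
struct model_root_table {
	struct model_context_table *ctx[MODEL_NR_BUS]; /* NULL: not present */
	unsigned int global_invalidations; /* models global cache flushes */
};

/*
 * Clear the context entry of every unpreserved device, drop root entries
 * whose context table holds no preserved device, then record one global
 * invalidation so that no stale cached translation survives.
 */
static void model_clear_unpreserved(struct model_root_table *root,
				    bool preserved[MODEL_NR_BUS][MODEL_NR_DEVFN])
{
	unsigned int bus, devfn;

	assert(root != NULL && preserved != NULL);

	for (bus = 0; bus < MODEL_NR_BUS; bus++) {
		struct model_context_table *ctx = root->ctx[bus];
		bool keep_table = false;

		if (!ctx)
			continue;

		for (devfn = 0; devfn < MODEL_NR_DEVFN; devfn++) {
			if (preserved[bus][devfn]) {
				keep_table = true;
				continue;
			}
			memset(&ctx->entries[devfn], 0,
			       sizeof(ctx->entries[devfn]));
		}

		/* No preserved device on this bus: clear the root entry. */
		if (!keep_table)
			root->ctx[bus] = NULL;
	}

	root->global_invalidations++;
}
```

The model does the invalidation once, after all entries are cleared,
which mirrors the idea that the hardware must not keep using cached
copies of entries that memory no longer contains.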

+ else
+ iommu_disable_translation(iommu);
}
}


[...]

Thanks,
baolu

Thanks,
Sami