Re: [RFC PATCH v2 21/32] iommu/vt-d: Clean the context entries of unpreserved devices

From: Baolu Lu

Date: Thu Dec 04 2025 - 01:32:58 EST


On 12/3/25 07:02, Samiullah Khawaja wrote:
During normal shutdown the iommu translation is disabled. Since the root
table is preserved during live update, it needs to be cleaned up and the
context entries of the unpreserved devices need to be cleared.

Signed-off-by: Samiullah Khawaja <skhawaja@xxxxxxxxxx>
---
drivers/iommu/intel/iommu.c | 33 ++++++++++++++++++++++++++++++--
drivers/iommu/intel/iommu.h | 1 +
drivers/iommu/intel/liveupdate.c | 1 +
3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 3f69a073b2d8..84fef81ecf4d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -16,6 +16,7 @@
#include <linux/crash_dump.h>
#include <linux/dma-direct.h>
#include <linux/dmi.h>
+#include <linux/iommu-lu.h>
#include <linux/memory.h>
#include <linux/pci.h>
#include <linux/pci-ats.h>
@@ -52,6 +53,10 @@ static int rwbf_quirk;
#define rwbf_required(iommu) (rwbf_quirk || cap_rwbf((iommu)->cap))
+#ifdef CONFIG_LIVEUPDATE
+static void __clean_unpreserved_context_entries(struct intel_iommu *iommu);
+#endif
+
/*
* set to 1 to panic kernel if can't successfully enable VT-d
* (used when kernel is launched w/ TXT)
@@ -2376,8 +2381,12 @@ void intel_iommu_shutdown(void)
/* Disable PMRs explicitly here. */
iommu_disable_protect_mem_regions(iommu);
- /* Make sure the IOMMUs are switched off */
- iommu_disable_translation(iommu);
+ if (iommu->iommu.outgoing_preserved_state) {
+ __clean_unpreserved_context_entries(iommu);
+ } else {
+ /* Make sure the IOMMUs are switched off */
+ iommu_disable_translation(iommu);
+ }
}
}
@@ -2884,6 +2893,26 @@ static const struct iommu_dirty_ops intel_second_stage_dirty_ops = {
.set_dirty_tracking = intel_iommu_set_dirty_tracking,
};
+static void __clean_unpreserved_context_entries(struct intel_iommu *iommu)
+{
+ struct device_domain_info *info;
+ struct pci_dev *pdev = NULL;
+
+ for_each_pci_dev(pdev) {
+ info = dev_iommu_priv_get(&pdev->dev);
+ if (!info)
+ continue;

I assume the per-device iommu private data is freed in the
release_device path, which runs before intel_iommu_shutdown(). If that
is the case, "info" would always be NULL here, making the subsequent
code dead code. Or am I missing something?
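
To make the concern concrete, here is a minimal userspace sketch, not
kernel code. All mock_* names are hypothetical stand-ins, and the
ordering (release before shutdown) is exactly the assumption in
question, not something I have verified:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_domain_info. */
struct mock_info {
	int iommu_id;		/* which IOMMU owns the device */
	bool context_present;	/* models a live context entry */
};

/* Hypothetical stand-in for a PCI device. */
struct mock_dev {
	struct mock_info *priv;	/* models dev_iommu_priv_get() */
	bool preserved;		/* models dev_iommu_preserved_state() */
};

/* Models the release_device path dropping the private data. */
static void mock_release_device(struct mock_dev *dev)
{
	dev->priv = NULL;
}

/*
 * Models the loop in __clean_unpreserved_context_entries().
 * Returns the number of context entries actually cleared.
 */
static int mock_clean_unpreserved(struct mock_dev *devs, size_t n,
				  int iommu_id)
{
	int cleared = 0;

	for (size_t i = 0; i < n; i++) {
		struct mock_info *info = devs[i].priv;

		if (!info)
			continue;	/* always taken if release ran first */
		if (info->iommu_id != iommu_id)
			continue;
		if (devs[i].preserved)
			continue;

		info->context_present = false;
		cleared++;
	}
	return cleared;
}

/* One unpreserved and one preserved device behind IOMMU 0. */
static int mock_scenario(bool release_first)
{
	struct mock_info infos[2] = { { 0, true }, { 0, true } };
	struct mock_dev devs[2] = {
		{ &infos[0], false },
		{ &infos[1], true },
	};

	if (release_first)
		for (size_t i = 0; i < 2; i++)
			mock_release_device(&devs[i]);

	return mock_clean_unpreserved(devs, 2, 0);
}
```

In this model, mock_scenario(true) clears nothing because every device
hits the NULL check, while mock_scenario(false) clears only the
unpreserved device. If the real ordering matches the first case, the
loop body never runs.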

+
+ if (info->iommu != iommu)
+ continue;
+
+ if (dev_iommu_preserved_state(&pdev->dev))
+ continue;
+
+ domain_context_clear(info);
+ }
+}

Thanks,
baolu