Re: [PATCH v2 07/16] iommu/vt-d: Implement device and iommu preserve/unpreserve ops

From: Samiullah Khawaja

Date: Mon May 18 2026 - 16:39:04 EST


On Fri, May 08, 2026 at 02:36:56AM +0000, Samiullah Khawaja wrote:
On Thu, May 07, 2026 at 02:25:14PM +0800, Baolu Lu wrote:
On 4/28/26 01:56, Samiullah Khawaja wrote:
Add implementation of the device and iommu preservation in a separate
file. Also set the device and iommu preserve/unpreserve ops in the
struct iommu_ops.

During normal shutdown the iommu translation is disabled. Since the root
table is preserved during live update, it needs to be cleaned up and the
context entries of the unpreserved devices need to be cleared.

This is not related to the preserve/unpreserve ops. Could it be moved to
a separate patch?

Agreed. I will move this stuff to a separate patch.


Signed-off-by: Samiullah Khawaja <skhawaja@xxxxxxxxxx>
---
MAINTAINERS | 1 +
drivers/iommu/intel/Makefile | 1 +
drivers/iommu/intel/iommu.c | 52 +++++++++++-
drivers/iommu/intel/iommu.h | 28 +++++++
drivers/iommu/intel/liveupdate.c | 139 +++++++++++++++++++++++++++++++
drivers/iommu/iommu.c | 18 ++++
include/linux/iommu-liveupdate.h | 10 +++
include/linux/iommu.h | 14 ++++
include/linux/kho/abi/iommu.h | 18 ++++
9 files changed, 277 insertions(+), 4 deletions(-)
create mode 100644 drivers/iommu/intel/liveupdate.c


[snip]

+{
+ struct context_entry *context;
+ int ret;
+ int i;
+
+ for (i = 0; i < ROOT_ENTRY_NR; i++) {
+ /*
+ * Alloc the context tables now to make sure the iommu unit is
+ * properly preserved. These might stay unused and waste around
+ * 32MB max in scalable mode.
+ */

Instead of allocating and preserving context tables for all root entries
(as noted, can waste up to 32MB), could we restrict this only to the
entries possibly in use by active PCI devices?

I think hotplugged devices or VFs created through SR-IOV will be missed
that way. Let's say device A is preserved and the associated iommu is
also preserved. If a new device B is then hotplugged and preserved,
the context table for B will be missed.

Ok, I thought about it a little more, and basically we have the following
things to consider when we preserve context tables:

- Devices can be hotplugged and preserved, so their context tables need
to be preserved if we don't allocate all of them the first time we
preserve the iommu, as done here.
- New context tables can be added (after hotplug) for unpreserved
devices. If we don't get another iommu preserve call after these are
added, they remain unpreserved, so during shutdown those entries need
to be removed from the root table, or preserved for simplicity.

To solve this we can,

1. Either preserve the new context table when it is added for a preserved
iommu. This can be done in iommu_context_addr(). This is simpler and
no tracking needed.

2. Or track the preserved context tables using a bitmap and then preserve
them incrementally whenever a device is preserved. On shutdown during
cleanup, we can clear the entries for unpreserved context tables from
root table.

I am inclined towards the second option. WDYT?
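
To make the second option concrete, here is a rough userspace sketch of the
bookkeeping, not the actual patch code. Everything prefixed with sketch_ is
hypothetical, the root table is modelled as a plain array of "present" flags,
and the KHO preserve step is reduced to a comment. It only illustrates the
bitmap tracking and the shutdown cleanup described above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define ROOT_ENTRY_NR	256
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS	((ROOT_ENTRY_NR + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the per-iommu state; not struct intel_iommu. */
struct sketch_iommu {
	/* One bit per root entry whose context table has been preserved. */
	unsigned long preserved[BITMAP_WORDS];
	/* Models "root entry points at an allocated context table". */
	bool root_present[ROOT_ENTRY_NR];
};

static bool sketch_test_bit(const unsigned long *map, unsigned int bit)
{
	return (map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}

/* Set the bit and report whether it was already set. */
static bool sketch_test_and_set(unsigned long *map, unsigned int bit)
{
	unsigned long mask = 1UL << (bit % BITS_PER_WORD);
	unsigned long *word = &map[bit / BITS_PER_WORD];
	bool was_set = (*word & mask) != 0;

	*word |= mask;
	return was_set;
}

/*
 * Called when a device on @bus is preserved. Returns true only if this call
 * newly marked the context table as preserved, so repeated calls for devices
 * sharing a bus do not preserve the same table twice.
 */
static bool sketch_preserve_context_table(struct sketch_iommu *iommu,
					  unsigned int bus)
{
	if (bus >= ROOT_ENTRY_NR || !iommu->root_present[bus])
		return false;
	if (sketch_test_and_set(iommu->preserved, bus))
		return false;
	/* Assumption: the real code would preserve the table page here. */
	return true;
}

/*
 * Shutdown cleanup: drop root entries whose context table was never
 * preserved. Returns the number of entries cleared.
 */
static unsigned int sketch_cleanup_unpreserved(struct sketch_iommu *iommu)
{
	unsigned int bus, cleared = 0;

	for (bus = 0; bus < ROOT_ENTRY_NR; bus++) {
		if (!iommu->root_present[bus])
			continue;
		if (sketch_test_bit(iommu->preserved, bus))
			continue;
		iommu->root_present[bus] = false;
		cleared++;
	}
	return cleared;
}
```

In the real driver the bitmap updates would presumably need to happen under
iommu->lock, like the iommu_context_addr() call in the patch, and the bitmap
itself would have to be part of the preserved state if the next kernel needs
to know which tables are valid. Both of those are assumptions on my side.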

I think we will have to do similar stuff for PASID also down the road to
preserve pasid_tables in PASID directory.

Since we don't track the context tables that are preserved, there is no
way to incrementally preserve the new ones. Let me look into the behaviour
of KHO; maybe we can make the preserve call idempotent and do these
incrementally.

+ spin_lock(&iommu->lock);
+ context = iommu_context_addr(iommu, i, 0, 1);
+ spin_unlock(&iommu->lock);
+ if (!context) {
+ ret = -ENOMEM;
+ goto error;
+ }

[snip]

Thanks,
baolu


Thanks,
Sami

Sami