Re: [PATCH v2 07/16] iommu/vt-d: Implement device and iommu preserve/unpreserve ops
From: Pranjal Shrivastava
Date: Tue May 19 2026 - 10:45:50 EST
On Mon, May 18, 2026 at 08:32:42PM +0000, Samiullah Khawaja wrote:
> On Fri, May 08, 2026 at 02:36:56AM +0000, Samiullah Khawaja wrote:
> > On Thu, May 07, 2026 at 02:25:14PM +0800, Baolu Lu wrote:
> > > On 4/28/26 01:56, Samiullah Khawaja wrote:
> > > > Add an implementation of device and IOMMU preservation in a separate
> > > > file. Also set the device and IOMMU preserve/unpreserve ops in
> > > > struct iommu_ops.
> > > >
> > > > During a normal shutdown, IOMMU translation is disabled. Since the root
> > > > table is preserved during a live update, it needs to be cleaned up, and
> > > > the context entries of unpreserved devices need to be cleared.
> > >
> > > This is not related to the preserve/unpreserve ops; could it be moved
> > > to a separate patch?
> >
> > Agreed. I will move this stuff to a separate patch.
> > >
> > > >
> > > > Signed-off-by: Samiullah Khawaja <skhawaja@xxxxxxxxxx>
> > > > ---
> > > > MAINTAINERS | 1 +
> > > > drivers/iommu/intel/Makefile | 1 +
> > > > drivers/iommu/intel/iommu.c | 52 +++++++++++-
> > > > drivers/iommu/intel/iommu.h | 28 +++++++
> > > > drivers/iommu/intel/liveupdate.c | 139 +++++++++++++++++++++++++++++++
> > > > drivers/iommu/iommu.c | 18 ++++
> > > > include/linux/iommu-liveupdate.h | 10 +++
> > > > include/linux/iommu.h | 14 ++++
> > > > include/linux/kho/abi/iommu.h | 18 ++++
> > > > 9 files changed, 277 insertions(+), 4 deletions(-)
> > > > create mode 100644 drivers/iommu/intel/liveupdate.c
> > > >
>
> [snip]
> > >
> > > > +{
> > > > + struct context_entry *context;
> > > > + int ret;
> > > > + int i;
> > > > +
> > > > + for (i = 0; i < ROOT_ENTRY_NR; i++) {
> > > > + /*
> > > > + * Alloc the context tables now to make sure the iommu unit is
> > > > + * properly preserved. These might stay unused and waste up to
> > > > + * 32MB in scalable mode.
> > > > + */
> > >
> > > Instead of allocating and preserving context tables for all root entries
> > > (as noted, can waste up to 32MB), could we restrict this only to the
> > > entries possibly in use by active PCI devices?
> >
> > I think hotplugged devices or VFs created through SR-IOV would be missed
> > that way. Let's say device A is preserved and its IOMMU is also
> > preserved. If a new device B is then hotplugged and preserved, the
> > context table for it will be missed.
>
> OK, I thought about it a little more; basically, we have the following
> things to consider when we preserve context tables:
>
> - Devices can be hotplugged and preserved, so their context tables
> need to be preserved if we don't allocate all of them the first time
> we preserve the IOMMU, as done here.
> - New context tables can be added (after hotplug) for unpreserved
> devices. If we don't get another IOMMU preserve call after these are
> added, they remain unpreserved, so during shutdown those entries need
> to be removed from the root table (or, for simplicity, preserved).
>
> To solve this, we can either:
>
> 1. Preserve the new context table when it is added for a preserved
> IOMMU. This can be done in iommu_context_addr(). This is simpler and
> needs no tracking.
>
> 2. Or track the preserved context tables using a bitmap and preserve
> them incrementally whenever a device is preserved. During shutdown
> cleanup, we can clear the root-table entries that point to unpreserved
> context tables.
>
> I am inclined towards the second option. WDYT?
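
For concreteness, option 2 could be modelled roughly as below. This is a minimal userspace sketch, not driver code: it assumes one tracking bit per context table (lower and upper per bus in scalable mode), and every name in it (struct model_iommu, model_preserve_device(), model_shutdown_cleanup(), etc.) is hypothetical. A real implementation would preserve the actual table page where noted and clear the real root entries during cleanup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of option 2: one tracking bit per context table.
 * Table index = bus * 2 + upper, where "upper" is only set in scalable
 * mode (devfn >= 0x80 lives in the table referenced by root->hi).
 */
#define NR_BUSES	256
#define NR_TABLES	(2 * NR_BUSES)
#define WORD_BITS	64
#define NR_WORDS	(NR_TABLES / WORD_BITS)

struct model_iommu {
	bool scalable;
	uint64_t present[NR_WORDS];	/* root entry points at an allocated table */
	uint64_t preserved[NR_WORDS];	/* table page has been preserved */
};

static void bit_set(uint64_t *map, unsigned int nr)
{
	map[nr / WORD_BITS] |= UINT64_C(1) << (nr % WORD_BITS);
}

static void bit_clear(uint64_t *map, unsigned int nr)
{
	map[nr / WORD_BITS] &= ~(UINT64_C(1) << (nr % WORD_BITS));
}

static bool bit_test(const uint64_t *map, unsigned int nr)
{
	return (map[nr / WORD_BITS] >> (nr % WORD_BITS)) & 1;
}

static unsigned int table_index(const struct model_iommu *iommu,
				uint8_t bus, uint8_t devfn)
{
	unsigned int upper = iommu->scalable && devfn >= 0x80;

	return bus * 2u + upper;
}

/* A context table allocated on demand (e.g. after a hotplug). */
static void model_alloc_table(struct model_iommu *iommu, uint8_t bus,
			      uint8_t devfn)
{
	bit_set(iommu->present, table_index(iommu, bus, devfn));
}

/* Device preserve: preserve only the table backing this device, once. */
static void model_preserve_device(struct model_iommu *iommu, uint8_t bus,
				  uint8_t devfn)
{
	unsigned int idx = table_index(iommu, bus, devfn);

	if (bit_test(iommu->preserved, idx))
		return;		/* already preserved for another device */
	/* The real driver would preserve the table page here. */
	bit_set(iommu->preserved, idx);
}

/* Shutdown cleanup: drop root-table links to tables that were not preserved. */
static void model_shutdown_cleanup(struct model_iommu *iommu)
{
	for (unsigned int idx = 0; idx < NR_TABLES; idx++)
		if (bit_test(iommu->present, idx) &&
		    !bit_test(iommu->preserved, idx))
			bit_clear(iommu->present, idx);
}
```

In this model, tables are marked present as devices appear, model_preserve_device() is called per preserved device, and model_shutdown_cleanup() runs before kexec; only tables backing preserved devices keep their root-table link.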

Thinking out loud here: I agree that moving away from the 32MB
pre-allocation is the right direction. I'm wondering whether we can avoid
the overhead of introducing a new tracking bitmap (option 2) altogether.
Since IOMMU serialization is a strict dependency for device tracking,
could we move the context table preservation directly into the
device-level op, intel_iommu_preserve_device()?

Whenever a specific device is preserved on demand, it would:

1. Query the parent IOMMU for the allocated context table backing its
info->bus.
2. Call iommu_preserve_page(context) for that table. Because KHO's
tracking handles duplicates, this should be fine if multiple devices
reside on the same bus.

Regarding scalable mode, we would just need a simple check in that path:
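
	/* intel_iommu_preserve_device() */
	struct intel_iommu *iommu = info->iommu;
	struct context_entry *lower, *upper = NULL;

	/* Look up (never allocate) the tables under iommu->lock, as the patch does */
	spin_lock(&iommu->lock);
	/* Primary/lower context table backing this bus */
	lower = iommu_context_addr(iommu, info->bus, 0, 0);
	/* In scalable mode, also look up the upper context table */
	if (sm_supported(iommu))
		upper = iommu_context_addr(iommu, info->bus, 0x80, 0);
	spin_unlock(&iommu->lock);

	/* Preserve outside the spinlock in case the preserve path can sleep */
	if (lower)
		iommu_preserve_page(lower);
	if (upper)
		iommu_preserve_page(upper);
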
WDYT?
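
For reference, the devfn values 0 and 0x80 above work because of how
iommu_context_addr() indexes the tables: in legacy mode there is one table
per bus (root->lo) indexed by devfn, while in scalable mode devfn < 0x80
uses root->lo and devfn >= 0x80 uses root->hi, with a stride of two
struct context_entry slots per devfn. The standalone sketch below only
models that mapping (ctx_slot and ctx_slot_for() are hypothetical names),
and shows that any given devfn lands in exactly one of the two tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Models the indexing in iommu_context_addr(): legacy mode uses a single
 * table (root->lo) indexed by devfn; scalable mode splits devfns across a
 * lower (root->lo, devfn < 0x80) and an upper (root->hi, devfn >= 0x80)
 * table whose entries are twice the size of a legacy context_entry.
 */
struct ctx_slot {
	bool upper;		/* true: table referenced by root->hi */
	unsigned int index;	/* index into a struct context_entry array */
};

static struct ctx_slot ctx_slot_for(bool scalable, uint8_t devfn)
{
	struct ctx_slot slot = { .upper = false, .index = devfn };

	if (scalable) {
		if (devfn >= 0x80) {
			slot.upper = true;
			devfn -= 0x80;
		}
		/* 256-bit scalable entries span two 128-bit slots */
		slot.index = devfn * 2u;
	}
	return slot;
}
```

So in scalable mode devfn 0 and devfn 0x80 both resolve to index 0, i.e.
the base of the lower and upper table pages respectively.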
>
> I think we will have to do something similar for PASID down the road,
> to preserve the PASID tables in the PASID directory.
> >
> > Since we don't track which context tables are preserved, there is no
> > way to incrementally preserve the new ones. Let me look into the
> > behaviour of KHO; maybe we can make the preserve call idempotent and do
> > this incrementally.
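
If the underlying preserve call cannot be relied on to be idempotent, one
possible approach (assumed here purely for illustration) is a per-table
reference count: the first device on a bus preserves the table page, later
devices only take a reference, and the page is unpreserved only when the
last preserved device using it goes away. The names below are hypothetical
and the counters stand in for the real preserve/unpreserve calls.

```c
#include <assert.h>

#define NR_CTX_TABLES 512	/* 256 buses x {lower, upper} */

/* Hypothetical per-IOMMU bookkeeping: number of users of each table. */
struct ctx_preserve_state {
	unsigned int refs[NR_CTX_TABLES];
	unsigned int preserve_calls;	/* stands in for the real preserve op */
	unsigned int unpreserve_calls;	/* stands in for the real unpreserve op */
};

/* Take a reference; only the first one actually preserves the page. */
static void ctx_table_get(struct ctx_preserve_state *st, unsigned int idx)
{
	if (st->refs[idx]++ == 0)
		st->preserve_calls++;
}

/* Drop a reference; only the last one actually unpreserves the page. */
static void ctx_table_put(struct ctx_preserve_state *st, unsigned int idx)
{
	assert(st->refs[idx] > 0);
	if (--st->refs[idx] == 0)
		st->unpreserve_calls++;
}
```

With this, unpreserving one device on a shared bus does not drop a context
table that another preserved device still depends on.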
> > >
> > > > + spin_lock(&iommu->lock);
> > > > + context = iommu_context_addr(iommu, i, 0, 1);
> > > > + spin_unlock(&iommu->lock);
> > > > + if (!context) {
> > > > + ret = -ENOMEM;
> > > > + goto error;
> > > > + }
>
[snip]
Thanks,
Praan