Re: [RFC PATCH 05/13] iommufd: Serialise persisted iommufds and ioas

From: Gowans, James
Date: Wed Oct 09 2024 - 07:44:58 EST


On Mon, 2024-10-07 at 12:01 -0300, Jason Gunthorpe wrote:
> On Mon, Oct 07, 2024 at 08:57:07AM +0000, Gowans, James wrote:
> > With the ARM SMMUv3 for example I think there are break-before-make
> > requirement, so is it possible to do an atomic switch of the SMMUv3 page
> > table PGD in a hitless way?
>
> The BBM rules are only about cached translations. If all your IOPTEs
> result in the same translation *size* then you are safe. You can
> change the radix memory storing the IOPTEs freely, AFAIK.

Okay, but in general this still means that the page tables must have
exactly the same translations if we try to switch from one set to
another. If it is possible to change translations then translation table
entries could be created at a different granularity (PTE, PMD or PUD
level), which would violate this requirement.

It's also possible for different IOMMU driver versions to set up the
same translations, but at different page table levels. Perhaps an older
version did not coalesce some PTEs, but a newer version does coalesce.
Would the same translations but at a different size violate BBM?
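
To make the question concrete, here is a toy sketch of the case I have
in mind. It is not real IOMMU driver code: the table model, the helper
names and the addresses are all made up for illustration. It only shows
that two tables can resolve every IOVA identically while still differing
in leaf size; whether that difference matters for BBM is exactly what I
am asking.

```python
# Toy model, not real IOMMU driver code: a "page table" is just a list
# of leaf entries (iova, phys, size). All names here are hypothetical.
PTE_SIZE = 4 << 10   # 4 KiB leaf
PMD_SIZE = 2 << 20   # 2 MiB leaf


def translate(table, iova):
    """Return the physical address for iova, or None if unmapped."""
    for base, phys, size in table:
        if base <= iova < base + size:
            return phys + (iova - base)
    return None


def same_translations(a, b, span, step=PTE_SIZE):
    """True if both tables map every step-aligned iova in [0, span) identically."""
    return all(translate(a, i) == translate(b, i) for i in range(0, span, step))


def same_leaf_sizes(a, b):
    """True if both tables consist of identical leaf entries, sizes included."""
    return sorted(a) == sorted(b)


# "Older driver": 512 separate 4 KiB PTEs covering one 2 MiB region.
old = [(i * PTE_SIZE, 0x40000000 + i * PTE_SIZE, PTE_SIZE) for i in range(512)]
# "Newer driver": the same region coalesced into a single 2 MiB PMD entry.
new = [(0, 0x40000000, PMD_SIZE)]

print(same_translations(old, new, PMD_SIZE))  # True
print(same_leaf_sizes(old, new))              # False
```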

If we say that, to be safe/correct in the general case, it is necessary
for the translations to be *exactly* the same before and after kexec,
is there any benefit to building new translation tables and switching
to them? We may as well continue to use the exact same page tables and
construct iommufd objects (IOAS, etc) to match.

There is also a performance consideration here: when doing live update
every millisecond of down time matters. I'm not sure if this iommufd
re-initialisation will end up being in the hot path of things that need
to be done before the VM can start running again. If it is in the hot
path then it would be useful to avoid rebuilding identical tables. Maybe
it ends up being in the "warm" path - the VM can start running but will
sleep if taking a page fault before IOMMUFD is re-initialised...

So overall my point is that I think we end up with a requirement that
the pgtables are identical before and after kexec, so is there any
benefit in rebuilding them?

JG