Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

From: Jason Gunthorpe
Date: Tue Mar 19 2024 - 13:54:32 EST


On Tue, Mar 19, 2024 at 03:47:56PM +0000, Will Deacon wrote:

> Right, it's hard to win if DMA-active devices weren't quiesced properly
> by the outgoing kernel. Either the SMMU was left in abort (leading to the
> problems you list above) or the SMMU is left in bypass (leading to possible
> data corruption). Which is better?

For whatever reason (and I really don't like this design) a lot of work
was done on x86 so that devices continue to work as they were right up
until the crash kernel does its first DMA operation. That includes
having the crash kernel non-disruptively inherit and retain the IOMMU
configuration (e.g. see the translation_pre_enabled() stuff in the
Intel driver).

I think the idea was that the crash kernel driver will recover control
of the device prior to trying to do DMA. Devices without a driver or
devices that are not operated by the crash kernel just keep going as
they were.

In general practice this is unworkable, as some devices can't be
recovered without doing DMA in the first place, creating a catch-22.

So now lots of devices use their shutdown handler to quiet the device
before handing over to the crash kernel.

I think this emerged as some 'small work' to try and make crash
kernels functional at all. Implementing every shutdown handler would
be pretty hard, but many (?) devices seem to work OK if the crash
kernel driver runs for a bit before destroying their DMA setup. We
don't trigger weird platform crashes or anything due to failing DMA
operations either.

Now we have all kinds of infrastructure and deployed crash kernels
that have this assumption baked in. :( It sure would be nice to not
spread this full complexity to ARM.

Perhaps the original kernel could signal to the crash kernel which
specific devices are quieted. Then the crash kernel could simply ignore
unquieted devices, set the IOMMU to abort them, and not allow any
crash drivers to attach (or maybe FLR them?). If someone wants a
device to be usable in the crash kernel then the original kernel
needs to implement the shutdown handler.
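
As a rough sketch of that policy in plain C: everything here is
hypothetical. struct handover_dev, enum crash_ste_cfg and
crash_policy_for() are made-up names, not existing kernel interfaces,
and how the "quieted" information would actually be passed across kexec
is left open. Devices that are missing from the list are treated the
same as unquieted ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device record the original kernel would hand over. */
struct handover_dev {
	uint32_t sid;     /* stream ID of the device */
	bool     quieted; /* true if its shutdown handler completed */
};

/* Hypothetical crash-kernel decision for one stream ID. */
enum crash_ste_cfg {
	CRASH_STE_ABORT,      /* abort all DMA, no crash driver may attach */
	CRASH_STE_ATTACHABLE, /* device is quiet, a crash driver may claim it */
};

/*
 * Look up @sid in the handover list. A device that is not listed, or is
 * listed but not marked quieted, is assumed to still be doing DMA and
 * is therefore aborted.
 */
enum crash_ste_cfg crash_policy_for(const struct handover_dev *devs,
				    size_t ndevs, uint32_t sid)
{
	assert(devs || ndevs == 0);

	for (size_t i = 0; i < ndevs; i++) {
		if (devs[i].sid == sid)
			return devs[i].quieted ? CRASH_STE_ATTACHABLE
					       : CRASH_STE_ABORT;
	}
	return CRASH_STE_ABORT;
}
```

Defaulting to abort means a device only becomes usable in the crash
kernel when the original kernel explicitly vouched for it.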

Regardless, I think if your goal is to support crash kernels then you
have to do at least a bit of the x86 'keep the iommu unchanged'. The
iommu shutdown should do less, like x86 does, and the iommu startup
should detect the special case and try to atomically switch to a new
STE table that aborts unquieted devices.
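
A toy model of that startup path, again in plain C with made-up names
(struct toy_smmu, toy_crash_takeover(), the TOY_STE_* values). Real
SMMUv3 programming (registers, invalidation commands, syncs) is
deliberately not modelled, and keeping the old entry for quieted
devices is an assumption of the sketch. The point is only the
ordering: fully populate the new table first, then publish it with a
single store, so an observer sees either the old table or the new one
and never a mix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_SIDS 8

/* Hypothetical one-byte stand-ins for real stream table entries. */
enum toy_ste { TOY_STE_ABORT = 0, TOY_STE_INHERITED = 1 };

struct toy_smmu {
	/* Stands in for the stream table base the hardware walks. */
	_Atomic(const uint8_t *) strtab;
	/* Set when the previous kernel left translation enabled. */
	bool translation_pre_enabled;
};

/*
 * If the IOMMU was left live by the previous kernel, build a
 * replacement table in @new_tab and switch to it in one step.
 * Returns true if the switch happened, false on a normal boot.
 */
bool toy_crash_takeover(struct toy_smmu *smmu, uint8_t *new_tab,
			const bool *quieted)
{
	assert(smmu && new_tab && quieted);

	if (!smmu->translation_pre_enabled)
		return false; /* nothing to inherit */

	const uint8_t *old = atomic_load_explicit(&smmu->strtab,
						  memory_order_acquire);
	assert(old);

	/* Fill the new table completely before anyone can see it. */
	for (size_t sid = 0; sid < TOY_NUM_SIDS; sid++)
		new_tab[sid] = quieted[sid] ? old[sid] : TOY_STE_ABORT;

	/* Single release store: the table contents are visible first. */
	atomic_store_explicit(&smmu->strtab, (const uint8_t *)new_tab,
			      memory_order_release);
	return true;
}
```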

Booting a non-crash OS is a different matter, and in that case you
really want every bit of HW put back to a clean "just booted" state,
and arguably it can't work unless the original kernel implements all
the shutdown handlers... I don't know if x86 kexec actually supports
this; it looks like it only works with a Linux OS, and things like the
Linux iommu driver have code to support the crash-focused handover
even in non-crash cases.

Jason