Re: [PATCH] iommu/arm-smmu: Demote error messages to debug in shutdown callback

From: Robin Murphy
Date: Fri Mar 27 2020 - 15:03:03 EST


On 2020-03-27 3:09 pm, Sai Prakash Ranjan wrote:
Hi Robin,

Thanks for taking a look at this.

On 2020-03-27 19:42, Robin Murphy wrote:
On 2020-03-27 1:28 pm, Sai Prakash Ranjan wrote:
Currently on reboot/shutdown, the following messages are
displayed on the console as error messages before the
system reboots or shuts down.

On SC7180:

  arm-smmu 15000000.iommu: removing device with active domains!
  arm-smmu 5040000.iommu: removing device with active domains!

Demote the log level to debug, since the message does not offer
much help in identifying or fixing any issue when the system is
going down anyway, and doing so reduces kernel log spam.

I've gone back and forth on this pretty much ever since we added the
shutdown hook - on the other hand, if any devices *are* still running
in those domains at this point, then once we turn off the SMMU and let
those IOVAs go out on the bus as physical addresses, all manner of
weirdness may ensue. Thus there is an argument for *some* indication
that this may happen, although IMO it could be downgraded to at least
dev_warn().


Any pointers to the weirdness here after SMMU is turned off?
Because if we look at the call sites, device_shutdown is called
from kernel_restart_prepare or kernel_shutdown_prepare which would
mean system is going down anyways, so do we really care about these
error messages or warnings from SMMU?

 arm_smmu_device_shutdown
  platform_drv_shutdown
   device_shutdown
    kernel_restart_prepare
     kernel_restart

Imagine your network driver doesn't implement a .shutdown method (so the hardware is still active regardless of device links), happens to have an Rx buffer or descriptor ring DMA-mapped at an IOVA that looks like the physical address of the memory containing some part of the kernel text lower down that call stack, and the MAC receives a broadcast IP packet at about the point arm_smmu_device_shutdown() is returning. Enjoy debugging that ;)

And if coincidental memory corruption seems too far-fetched for your liking, other fun alternatives might include "display tries to scan out from powered-off device, deadlocks interconnect and prevents anything else making progress", or "access to TZC-protected physical address triggers interrupt and over-eager Secure firmware resets system before orderly poweroff has a chance to finish".
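
The memory-corruption scenario above can be modelled in a few lines of userspace C. This is only a toy sketch: struct toy_smmu, toy_dma_target() and toy_dma_write() are hypothetical names invented for illustration, the "page table" is a single flat array, and none of it reflects the arm-smmu driver's real data structures. It only shows that once translation is off, a write to the same IOVA lands on a different physical page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12
#define PAGE_SZ    (1u << PAGE_SHIFT)
#define NPAGES     8                      /* toy "physical memory": 8 pages */

/* Hypothetical toy model, not the arm-smmu driver's data structures. */
struct toy_smmu {
    bool     enabled;                     /* false: IOVAs go out on the bus untranslated */
    uint32_t iova_to_pa_page[NPAGES];     /* flat table: IOVA page -> physical page */
};

static uint8_t phys_mem[NPAGES * PAGE_SZ];

/* Physical address that a device access to 'iova' actually hits. */
static uint32_t toy_dma_target(const struct toy_smmu *smmu, uint32_t iova)
{
    if (!smmu->enabled)
        return iova;                      /* bypass: the IOVA is treated as a PA */

    uint32_t page = iova >> PAGE_SHIFT;
    return (smmu->iova_to_pa_page[page] << PAGE_SHIFT) | (iova & (PAGE_SZ - 1));
}

/* A device that was never quiesced writes a received packet to its Rx buffer. */
static void toy_dma_write(const struct toy_smmu *smmu, uint32_t iova,
                          const void *buf, size_t len)
{
    memcpy(&phys_mem[toy_dma_target(smmu, iova)], buf, len);
}
```

With IOVA page 1 mapped to physical page 5, a write to IOVA 0x1000 hits physical page 5 while the model is enabled. After clearing enabled, the same write hits physical page 1, whatever happens to live there.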

Of course the fact that in practice we'll *always* see the warning because there's no way to tear down the default DMA domains, and even if all devices *have* been nicely quiesced there's no way to tell, is certainly less than ideal. Like I say, it's not entirely clear-cut either way...
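
To illustrate why the message fires unconditionally, here is a minimal sketch assuming the check amounts to "is any translation context still allocated". The names toy_smmu_state, toy_attach_default_domain() and toy_has_active_domains() are hypothetical and not the driver's API; the point is only that the check's input carries no information about whether the devices behind those contexts have stopped doing DMA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per allocated translation context (ctx < 32). */
struct toy_smmu_state {
    uint32_t context_map;
};

/*
 * Attaching a device's default DMA domain claims a context. The model has
 * no matching release, mirroring the point that default DMA domains cannot
 * be torn down.
 */
static void toy_attach_default_domain(struct toy_smmu_state *s, unsigned int ctx)
{
    s->context_map |= UINT32_C(1) << ctx;
}

/*
 * All the shutdown path can observe: whether any context is still
 * allocated. Whether the attached devices were quiesced is not an input.
 */
static bool toy_has_active_domains(const struct toy_smmu_state *s)
{
    return s->context_map != 0;
}
```

Once any device has been attached, toy_has_active_domains() stays true until the end, regardless of how cleanly each device driver shut down.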

Robin.