Re: [PATCH v2 1/6] hwrng: core: Freeze khwrng thread during suspend

From: Alexander Steffen
Date: Fri Aug 16 2019 - 11:56:23 EST

On 03.08.2019 00:50, Stephen Boyd wrote:
Quoting Stephen Boyd (2019-07-17 10:03:22)
Quoting Jason Gunthorpe (2019-07-17 09:50:11)
On Wed, Jul 17, 2019 at 09:42:32AM -0700, Stephen Boyd wrote:

Yes. That's exactly my point. A hwrng that's suspended will fail here
and it's better to just not try until it's guaranteed to have resumed.

It just seems weird to do this, what about all the other tpm API
users? Do they have a racy problem with suspend too?

I haven't looked at them. Are they being called from suspend/resume
paths? I don't think anything for the userspace API can be a problem
because those tasks are all frozen. The only problem would be some
kernel internal API that the TPM API exposes. I did a quick grep and I see
things like IMA or the trusted keys APIs that might need a closer look.

Either way, trying to hold off a TPM operation from the TPM API when
we're suspended isn't really possible. If something like IMA needs to
get TPM data from a deep suspend path and it fails because the device is
suspended, all we can do is return an error from TPM APIs and hope the
caller can recover. The fix is probably going to be to change the code
to not call into the TPM API until the hardware has resumed by avoiding
doing anything with the TPM until resume is over. So we're at best able
to make the same sort of band-aid here in the TPM API where all we can do is
say -EAGAIN but we can't tell the caller when to try again.

Andrey talked to me a little about this today. Andrey would prefer we
don't just let the TPM go into a wonky state if it's used during
suspend/resume so that it can stay resilient to errors. Sounds OK to me,
but my point still stands that we need to fix the callers.

I'll resurrect the IS_SUSPENDED flag and make it set generically by the
tpm_pm_suspend() and tpm_pm_resume() functions and then spit out a big
WARN_ON() and return an error value like -EAGAIN if the TPM functions
are called when the TPM is suspended. I hope we don't hit the warning
message, but if we do then at least we can track it down rather quickly
and figure out how to fix the caller instead of just silently returning
-EAGAIN and hoping for that to be visible to the user.
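
For illustration only, the plan above could be modelled roughly as follows. This is a user-space sketch, not the actual patch: struct model_chip, model_pm_suspend(), model_pm_resume() and model_tpm_transmit() are hypothetical stand-ins for the chip structure, tpm_pm_suspend(), tpm_pm_resume() and a TPM API entry point, and the fprintf() call stands in for the kernel's WARN_ON().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the chip structure; only the flag matters here. */
struct model_chip {
	bool suspended;		/* models the IS_SUSPENDED flag */
	unsigned int warnings;	/* how often a caller hit a suspended chip */
};

/* Models the generic suspend path setting the flag. */
static void model_pm_suspend(struct model_chip *chip)
{
	chip->suspended = true;
}

/* Models the generic resume path clearing the flag. */
static void model_pm_resume(struct model_chip *chip)
{
	chip->suspended = false;
}

/*
 * Models a TPM API entry point. If the chip is suspended, complain loudly
 * (the kernel version would use WARN_ON()) and return -EAGAIN so the
 * offending caller can be tracked down and fixed.
 */
static int model_tpm_transmit(struct model_chip *chip)
{
	if (chip->suspended) {
		chip->warnings++;
		fprintf(stderr, "WARNING: TPM used while suspended\n");
		return -EAGAIN;
	}
	return 0;	/* the command would be sent to the hardware here */
}
```

In this model a call between model_pm_suspend() and model_pm_resume() fails with -EAGAIN and leaves a visible warning, while calls before suspend and after resume go through.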

There is another use case I see for this functionality: There are ways for user space to upgrade the TPM's firmware via /dev/tpm0 (using e.g. TPM2_FieldUpgradeStart/TPM2_FieldUpgradeData). While upgrading, the normal TPM functionality might not be available (commands return TPM_RC_UPGRADE or other error codes). Even after the upgrade is finished, the TPM might continue to refuse command execution (e.g. with TPM_RC_REBOOT).

I'm not sure whether all in-kernel users are prepared to deal correctly with those error codes. But even if they are, it might be better to block them from sending commands in the first place, to not interfere with the upgrade process.

What would you think about a way for a user space upgrade tool to also set this flag, to make the TPM unavailable for everything but the upgrade process?
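
To make the question a bit more concrete, here is a rough user-space model of the semantics I have in mind. None of this exists today: struct upgrade_chip, upgrade_begin(), upgrade_end() and chip_send() are made-up names, the session pointer is just an opaque token identifying the upgrade tool, and the error codes are only a suggestion.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a chip that can be reserved for a firmware upgrade. */
struct upgrade_chip {
	bool unavailable;	/* set while an upgrade is in progress */
	const void *owner;	/* opaque token of the upgrading session */
};

/* The upgrade tool claims the chip; only one upgrade at a time. */
static int upgrade_begin(struct upgrade_chip *chip, const void *session)
{
	if (chip->unavailable)
		return -EBUSY;
	chip->unavailable = true;
	chip->owner = session;
	return 0;
}

/* Only the session that started the upgrade may release the chip again. */
static int upgrade_end(struct upgrade_chip *chip, const void *session)
{
	if (!chip->unavailable || chip->owner != session)
		return -EPERM;
	chip->unavailable = false;
	chip->owner = NULL;
	return 0;
}

/*
 * Every command goes through this check. While an upgrade is running,
 * everyone except the upgrading session is turned away before anything
 * reaches the TPM, so the upgrade sequence is not disturbed.
 */
static int chip_send(struct upgrade_chip *chip, const void *session)
{
	if (chip->unavailable && chip->owner != session)
		return -EAGAIN;
	return 0;	/* the command would be sent to the hardware here */
}
```

With this, an in-kernel user calling chip_send() during the upgrade gets -EAGAIN, while the upgrade tool's own commands still pass. How the flag would actually be exposed to user space, and whether it should survive a TPM_RC_REBOOT state, is left open here.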