Re: [PATCH] tpm: Disable hwrng for TPM 1 if PM_SLEEP is enabled
From: Jason A. Donenfeld
Date: Thu Jan 05 2023 - 09:53:57 EST
On Thu, Jan 05, 2023 at 03:47:42PM +0100, Jason A. Donenfeld wrote:
> TPM 1's support for its hardware RNG is broken across system suspends,
> due to races or locking issues or something else that haven't been
> diagnosed or fixed yet. These issues prevent the system from actually
> suspending. So disable the driver in this case. Later, when this is
> fixed properly, we can remove this.
>
> Current breakage amounts to something like:
>
> tpm tpm0: A TPM error (28) occurred continue selftest
> ...
> tpm tpm0: A TPM error (28) occurred attempting get random
> ...
> tpm tpm0: Error (28) sending savestate before suspend
> tpm_tis 00:08: PM: __pnp_bus_suspend(): tpm_pm_suspend+0x0/0x80 returns 28
> tpm_tis 00:08: PM: dpm_run_callback(): pnp_bus_suspend+0x0/0x10 returns 28
> tpm_tis 00:08: PM: failed to suspend: error 28
> PM: Some devices failed to suspend, or early wake event detected
>
> This issue was partially fixed by 23393c646142 ("char: tpm: Protect
> tpm_pm_suspend with locks"), in a last minute 6.1 commit that Linus took
> directly because the TPM maintainers weren't available. However, it
> seems like this just addresses the most common cases of the bug, rather
> than addressing it entirely. So there are more things to fix still,
> apparently.
>
> The hwrng driver appears already to be occasionally disabled due to
> other conditions, so this shouldn't be too large of a surprise.
>
> Link: https://lore.kernel.org/lkml/7cbe96cf-e0b5-ba63-d1b4-f63d2e826efa@xxxxxxx/
> Cc: stable@xxxxxxxxxxxxxxx # 6.1+
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
Quoting from my previous email:
| I spent a long time working through the TPM code when this came up
| during 6.1. I set up the TPM emulator with QEMU and reproduced this and
| had a whole test setup and S3 fuzzer. It took a long time, and when I was
| done, I paged it all out of my brain. When I found that patch from Jan
| that fixed the problem most of the time (but not all the time), I wasted
| tons of time trying to get the TPM maintainers to take the patch and
| send it to Linus as part of rc7 or rc8. But they all ignored me, and
| eventually Linus just took that patch directly.
|
| So I don't think I want to go down another rabbit hole here, having
| experienced the TPM maintainers not really caring much, and that sucking
| away the remaining energy I had before to keep looking at the issue and
| its edge cases not handled by Jan's patch.
|
| On the contrary, it'd make a big difference if the TPM maintainers could
| actually help analyze the code that they're most familiar with, so that
| we can get to the bottom of this. That's a lot better than some random
| drive-by patches from a non-TPM person like me; before the 6.1 bug, I'd
| never even looked at these drivers.
|
| My plan is to therefore be available to help and analyze and test and
| maybe even write some code, if the TPM maintainers take the lead on
| getting to the bottom of this. But if this hits neglect again like last
| time, I'll just send a `depends on BROKEN if PM` patch to the TPM
| hw_random driver and see what happens... That's a really awful solution
| though, so I hope the maintainers will wake up this cycle.
Seeing as there's still no life from the TPM maintainers, here's the
patch to make the problem go away until they wake up. When they do wake
up, though, I will be available to start looking into this again in
whatever capacity I might be useful.
Jason