Re: [REGRESSION] suspend to ram fails in 6.2-rc1 due to tpm errors
From: Jarkko Sakkinen
Date: Tue Mar 14 2023 - 10:24:39 EST
On Tue, Mar 14, 2023 at 02:53:11PM +0100, Jason A. Donenfeld wrote:
> On Tue, Mar 14, 2023 at 03:08:21PM +0200, Jarkko Sakkinen wrote:
> > On Tue, Mar 14, 2023 at 03:06:47PM +0200, Jarkko Sakkinen wrote:
> > > On Tue, Mar 14, 2023 at 01:47:38PM +0100, Jason A. Donenfeld wrote:
> > > > On 3/14/23, Jarkko Sakkinen <jarkko@xxxxxxxxxx> wrote:
> > > > > On Tue, Mar 14, 2023 at 10:35:33AM +0100, Thorsten Leemhuis wrote:
> > > > >> On 09.01.23 17:08, Jason A. Donenfeld wrote:
> > > > >> > On Thu, Jan 05, 2023 at 02:59:15PM +0100, Thorsten Leemhuis wrote:
> > > > >> >> On 29.12.22 05:03, Jason A. Donenfeld wrote:
> > > > >> >>> On Wed, Dec 28, 2022 at 06:07:25PM -0500, James Bottomley wrote:
> > > > >> >>>> On Wed, 2022-12-28 at 21:22 +0100, Vlastimil Babka wrote:
> > > > >> >>>>> Ugh, while the problem [1] was fixed in 6.1, it's now happening
> > > > >> >>>>> again
> > > > >> >>>>> on the T460 with 6.2-rc1. Except I didn't see any oops message or
> > > > >> >>>>> "tpm_try_transmit" error this time. The first indication of a
> > > > >> >>>>> problem
> > > > >> >>>>> is this during a resume from suspend to ram:
> > > > >> >>>>> tpm tpm0: A TPM error (28) occurred continue selftest
> > > > >> >>>>> and then periodically
> > > > >> >>>>> tpm tpm0: A TPM error (28) occurred attempting get random
> > > > >> >>>>
> > > > >> >>>> That's a TPM 1.2 error which means the TPM failed the selftest. The
> > > > >> >>>> original problem was reported against TPM 2.0 because of a missing
> > > > >> >>>> try_get_ops().
> > > > >> >>>
> > > > >> >>> No, I'm pretty sure the original bug, which was fixed by "char: tpm:
> > > > >> >>> Protect tpm_pm_suspend with locks" regards 1.2 as well, especially
> > > > >> >>> considering it's the same hardware from Vlastimil causing this. I
> > > > >> >>> also
> > > > >> >>> recall seeing this in 1.2 when I ran this with the TPM emulator. So
> > > > >> >>> that's not correct.
> > > > >> > [...]
> > > > >> > So, this is now in rc3:
> > > > >> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1382999aa0548a171a272ca817f6c38e797c458c
> > > > >> >
> > > > >> > That should help avoid the worst of the issue -- laptop not sleeping.
> > > > >> > But the race or whatever it is still does exist. So you might want to
> > > > >> > keep this in your tracker to periodically nudge the TPM folks about it.
> > > > >>
> > > > >> I did, and with -rc2 out now is a good time to remind everybody about
> > > > >> it. Jarkko even looked into it, but no real fix emerged afaics. Or did
> > > > >> it?
> > > > >
> > > > > Jason's workaround was picked. I asked some questions in the thread but
> > > > > have not received any responses.
> > > >
> > > > As I've written several times now, that patch doesn't fix the issue.
> > > > It makes it less common but it still exists and needs to be addressed.
> > > > Please re-read my various messages describing this. I have nothing new
> > > > at all to add; you just need to review my prior comments. There's a
> > > > bug that probably needs to be fixed here by somebody who understands
> > > > the tpm1 code.
> > >
> > > I'll try qemu path to see if I can reproduce it with/without the already
> > > merged workaround.
> >
> > BTW, what sort of environment you had for your qemu run? I'm creating a
> > simple initramfs with buildroot for this.
>
> Nothing special at all in the userspace.
>
> I think details of my test bed might be in some other thread from when
> that original patch went in or when the original bug report came, but
> from memory, I believe what I did to reliably reproduce various issues
> was comment out the sleep in random.c so that it keeps asking the TPM
> for more bytes from the kthread, like this:
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index ce3ccd172cc8..708110c780aa 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -934,20 +934,20 @@ EXPORT_SYMBOL(add_device_randomness);
> void add_hwgenerator_randomness(const void *buf, size_t len, size_t entropy, bool sleep_after)
> {
> mix_pool_bytes(buf, len);
> credit_init_bits(entropy);
>
> /*
> * Throttle writing to once every reseed interval, unless we're not yet
> * initialized or no entropy is credited.
> */
> - if (sleep_after && !kthread_should_stop() && (crng_ready() || !entropy))
> - schedule_timeout_interruptible(crng_reseed_interval());
> +// if (sleep_after && !kthread_should_stop() && (crng_ready() || !entropy))
> +// schedule_timeout_interruptible(crng_reseed_interval());
> }
> EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
>
> /*
> * Handle random seed passed by bootloader, and credit it depending
> * on the command line option 'random.trust_bootloader'.
> */
> void __init add_bootloader_randomness(const void *buf, size_t len)
> {
>
> Then I hooked the tpm emulator up to qemu and put it in tpm1 mode. I had
> userspace `echo mem > /sys/power/state` every couple of seconds (or
> continuously maybe?), and then I used the qemu monitor interface to wake
> the system from sleep. And kaboom.
Thanks for the tips! I'll try various patch combinations and see what
happens.
BR, Jarkko