Re: [REGRESSION] suspend to ram fails in 6.2-rc1 due to tpm errors

From: Jason A. Donenfeld
Date: Tue Mar 14 2023 - 09:00:10 EST


On 3/14/23, Jarkko Sakkinen <jarkko@xxxxxxxxxx> wrote:
> On Tue, Mar 14, 2023 at 10:35:33AM +0100, Thorsten Leemhuis wrote:
>> On 09.01.23 17:08, Jason A. Donenfeld wrote:
>> > On Thu, Jan 05, 2023 at 02:59:15PM +0100, Thorsten Leemhuis wrote:
>> >> On 29.12.22 05:03, Jason A. Donenfeld wrote:
>> >>> On Wed, Dec 28, 2022 at 06:07:25PM -0500, James Bottomley wrote:
>> >>>> On Wed, 2022-12-28 at 21:22 +0100, Vlastimil Babka wrote:
>> >>>>> Ugh, while the problem [1] was fixed in 6.1, it's now happening again
>> >>>>> on the T460 with 6.2-rc1. Except I didn't see any oops message or
>> >>>>> "tpm_try_transmit" error this time. The first indication of a problem
>> >>>>> is this during a resume from suspend to ram:
>> >>>>> tpm tpm0: A TPM error (28) occurred continue selftest
>> >>>>> and then periodically
>> >>>>> tpm tpm0: A TPM error (28) occurred attempting get random
>> >>>>
>> >>>> That's a TPM 1.2 error which means the TPM failed the selftest. The
>> >>>> original problem was reported against TPM 2.0 because of a missing
>> >>>> try_get_ops().
>> >>>
>> >>> No, I'm pretty sure the original bug, which was fixed by "char: tpm:
>> >>> Protect tpm_pm_suspend with locks" regards 1.2 as well, especially
>> >>> considering it's the same hardware from Vlastimil causing this. I also
>> >>> recall seeing this in 1.2 when I ran this with the TPM emulator. So
>> >>> that's not correct.
>> > [...]
>> > So, this is now in rc3:
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1382999aa0548a171a272ca817f6c38e797c458c
>> >
>> > That should help avoid the worst of the issue -- laptop not sleeping.
>> > But the race or whatever it is still does exist. So you might want to
>> > keep this in your tracker to periodically nudge the TPM folks about it.
>>
>> I did, and with -rc2 out now is a good time to remind everybody about
>> it. Jarkko even looked into it, but no real fix emerged afaics. Or did
>> it?
>
> Jason's workaround was picked. I asked some questions in the thread but
> have not received any responses.

As I've written several times now, that patch doesn't fix the issue.
It makes it less common but it still exists and needs to be addressed.
Please re-read my various messages describing this. I have nothing new
at all to add; you just need to review my prior comments. There's a
bug that probably needs to be fixed here by somebody who understands
the tpm1 code.