Re: [PATCH] tpm: Make timeout logic simpler and more robust

From: James Bottomley
Date: Mon Mar 11 2019 - 20:27:49 EST


On Mon, 2019-03-11 at 16:54 -0700, Calvin Owens wrote:
> We're having lots of problems with TPM commands timing out, and we're
> seeing these problems across lots of different hardware (both v1/v2).
>
> I instrumented the driver to collect latency data, but I wasn't able
> to find any specific timeout to fix: it seems like many of them are
> too aggressive. So I tried replacing all the timeout logic with a
> single universal long timeout, and found that makes our TPMs 100%
> reliable.
>
> Given that this timeout logic is very complex, problematic, and
> appears to serve no real purpose, I propose simply deleting all of
> it.

"no real purpose" is a bit strong given that all these timeouts are
mandated by the standards. The purpose the standards give is that there
needs to be a way of telling "the TPM has crashed" apart from "the TPM
is taking a very long time to respond". For a normally functioning TPM
the logic looks complex and unnecessary, but for a malfunctioning one
it's a lifesaver.
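
To make the distinction concrete, here is a minimal sketch of a polling
wait with a deadline. It is not the driver's code: struct fake_tpm,
fake_tpm_ready() and wait_ready() are hypothetical names, time is
simulated rather than read from a clock, and -ETIMEDOUT is just an
illustrative choice of error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device: it reports ready once the
 * simulated clock reaches ready_at_ms. A negative value models a
 * crashed device that never answers. */
struct fake_tpm {
	long ready_at_ms;
};

static bool fake_tpm_ready(const struct fake_tpm *t, long now_ms)
{
	return t->ready_at_ms >= 0 && now_ms >= t->ready_at_ms;
}

/* Poll every poll_ms of simulated time until the device is ready or
 * timeout_ms has elapsed. Returns the elapsed time in ms on success,
 * or -ETIMEDOUT once the deadline passes without a ready status. */
static long wait_ready(const struct fake_tpm *t, long timeout_ms,
		       long poll_ms)
{
	long now_ms;

	for (now_ms = 0; now_ms <= timeout_ms; now_ms += poll_ms) {
		if (fake_tpm_ready(t, now_ms))
			return now_ms;
	}
	return -ETIMEDOUT;
}
```

With a 100 ms deadline, a device that needs 250 ms is indistinguishable
from a crashed one: both return -ETIMEDOUT. Raising the deadline to
1000 ms lets the slow device succeed, but a crashed one is then only
detected after the full 1000 ms. That trade-off is what the per-command
timeouts are meant to tune.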

Could you first check it's not a problem we introduced with our polling
changes? My nuvoton still doesn't work properly with the default poll
timings but it works flawlessly if I use the patch below. I think my
nuvoton is a bit out of spec (it's a very early model that was software
upgraded from 1.2 to 2.0) because no-one else on the list seems to see
the problems I see, but perhaps you are.

James

---