Re: [PATCH] tpm: Make timeout logic simpler and more robust

From: Jarkko Sakkinen
Date: Tue Mar 12 2019 - 08:50:39 EST


On Mon, Mar 11, 2019 at 05:27:43PM -0700, James Bottomley wrote:
> On Mon, 2019-03-11 at 16:54 -0700, Calvin Owens wrote:
> > We're having lots of problems with TPM commands timing out, and we're
> > seeing these problems across lots of different hardware (both v1/v2).
> >
> > I instrumented the driver to collect latency data, but I wasn't able
> > to find any specific timeout to fix: it seems like many of them are
> > too aggressive. So I tried replacing all the timeout logic with a
> > single universal long timeout, and found that makes our TPMs 100%
> > reliable.
> >
> > Given that this timeout logic is very complex, problematic, and
> > appears to serve no real purpose, I propose simply deleting all of
> > it.
>
> "no real purpose" is a bit strong given that all these timeouts are
> standards mandated. The purpose stated by the standards is that there
> needs to be a way of differentiating a crashed TPM from one that is
> taking a very long time to respond. For a normally functioning TPM it
> looks complex and unnecessary, but for a malfunctioning one it's a
> lifesaver.

Standards should only be followed when they make practical sense, and
ignored when they do not. The range is only up to 2s anyway.

/Jarkko