Re: 2.6.37.1 s2disk regression (TPM)

From: Stefan Berger
Date: Tue Feb 22 2011 - 06:57:57 EST


On 02/22/2011 03:41 AM, Jiri Slaby wrote:
On 02/22/2011 01:42 AM, Stefan Berger wrote:
On 02/21/2011 05:10 PM, Jiri Slaby wrote:
On 02/21/2011 11:07 PM, Rajiv Andrade wrote:
On 02/21/2011 06:44 PM, Jiri Slaby wrote:
On 02/21/2011 10:29 PM, Stefan Berger wrote:
On 02/21/2011 03:39 PM, Jiri Slaby wrote:
On 02/21/2011 06:12 PM, Rajiv Andrade wrote:
On 02/21/2011 01:34 PM, Jiri Slaby wrote:
There has to be another problem which caused my regression. And since
it reports "Operation Timed out", the former default timeout values
worked for me, the ones read from TPM do not.
Yes, it's highly likely due to inconsistent timeout values reported by
the TPM, as I mentioned. My working timeouts are:
3020000 4510000 181000000
1000000 2000 150000

Actually the first one from HW is 1. This one is HZ after correction
in get_timeouts. So perhaps it is in ms, yes.
Following the specs, the timeouts are supposed to be in microseconds
and in ascending order for short, medium and long duration. Of course,
if the device returns wrong timeouts, the command isn't going to
succeed, failing the suspend in this case. Nevertheless, I think we
need the patch I put in, but at the same time we'll need a work-around
for devices like this.
Yes, the patch is correct per se. But as it breaks a bunch of machines,
it cannot go in now. The rule is no regressions.

After you have the workaround, it should go into the next rc1 after
that.
Do you plan to add a dmi-based quirk? Or, IOW, do you want me to attach
dmidecode output? Or are you going to base it solely on TPM
manufacturer/version?
It's more reliable to base the workaround on the values themselves,
instead of the TPM's ID, since we don't know whether other models will
behave similarly.
As I wrote, you may base it on dmi data.

It should be fine then to extend the existing workaround for short
timeouts to the medium and long ones.
OK, but how will you guess the values?
One way of doing it would be to at least make sure that the timeouts are

short < medium < long

and if that's not true, as in the case of your TPM, set the timeouts to
0 and have Rajiv's work-around kick in, OR explicitly assign to the
timeouts the same high values that Rajiv's work-around is using right
now. Of course there could be another type of bad TPM firmware out there
where all values are in ascending order but given in ms and cause
time-outs -- but I would wait for someone to point that out since I am
not aware of such a device.
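
The ordering check proposed above could be sketched roughly as follows. This is a standalone illustration, not the driver's code: the struct tpm_timeouts_us and the functions timeouts_are_ascending() and sanitize_timeouts() are hypothetical names, and the idea that zeroed values trigger the default-timeout fallback is taken from the description above rather than verified against the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the three values reported by the TPM,
 * nominally in microseconds. */
struct tpm_timeouts_us {
    unsigned long short_us;
    unsigned long medium_us;
    unsigned long long_us;
};

/* Returns true if short < medium < long holds. */
static bool timeouts_are_ascending(const struct tpm_timeouts_us *t)
{
    assert(t != NULL);
    return t->short_us < t->medium_us && t->medium_us < t->long_us;
}

/* If the ordering is violated, zero all three values so that a
 * default-value fallback (assumed to trigger on zero) can take over.
 * Returns true if the values were reset. */
static bool sanitize_timeouts(struct tpm_timeouts_us *t)
{
    if (timeouts_are_ascending(t))
        return false;
    t->short_us = 0;
    t->medium_us = 0;
    t->long_us = 0;
    return true;
}
```

Raw values of 1, 2000 and 150000 would pass this check; the ordering only breaks once the first value has been corrected to one second.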
Note that it is in ascending order (1 2000 150000). As I wrote the first
timeout (1) is replaced by one HZ in get_timeouts.
The forthcoming patch will simply also adapt the other 2 values and multiply them by 1000. The reason for the suspend failure is the 2nd timeout: the TPM_SaveState command is of medium duration.
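
As a rough illustration of the scaling idea (not the patch itself), the sketch below multiplies the values by 1000 when the short one looks implausibly small for microseconds. The threshold SHORT_US_PLAUSIBLE_MIN, the struct tpm_durations and the function scale_ms_to_us() are assumptions made for this example. It also scales all three values uniformly, whereas the existing work-around replaces the short value with one HZ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: a "short" value below 10 ms (expressed in
 * microseconds) is taken as a sign that the TPM reported milliseconds. */
#define SHORT_US_PLAUSIBLE_MIN 10000UL

/* Hypothetical container for the three reported values. */
struct tpm_durations {
    unsigned long short_d;
    unsigned long medium_d;
    unsigned long long_d;
};

/* Scales all three values from milliseconds to microseconds when the
 * short value is below the threshold. Returns true if scaling was
 * applied. Overflow is not handled in this sketch. */
static bool scale_ms_to_us(struct tpm_durations *d)
{
    assert(d != NULL);
    if (d->short_d >= SHORT_US_PLAUSIBLE_MIN)
        return false;
    d->short_d *= 1000;
    d->medium_d *= 1000;
    d->long_d *= 1000;
    return true;
}
```

With the raw values 1, 2000 and 150000 from this thread, the medium value becomes 2000000, i.e. two seconds instead of two milliseconds for a medium-duration command such as TPM_SaveState.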

There will be a 2nd patch for re-enabling the TPM's interrupts. Upon resume, the BIOS may (this may be BIOS-dependent) send a command (TPM_Startup) to the TPM in polling mode, disabling the interrupts to do so and leaving them disabled afterwards.

I'd appreciate it if you tested both of them.

Stefan
