Re: regression: 2.6.32-rc8 shuts down after reaching critical temperature

From: Thomas Renninger
Date: Wed Dec 02 2009 - 10:08:01 EST


On Wednesday 02 December 2009 14:30:32 Christoph Hellwig wrote:
> On Wed, Dec 02, 2009 at 12:56:20PM +0100, Thomas Renninger wrote:
...
> > 2.6.31 works?
>
> Yes, perfectly. Have been running it for a couple of days now again
> after I had all these reproducible .32-rc shutdowns when testing it.
>
> > Also the latest stable one?
>
> Haven't tried that yet, will do if it helps you.
No need. It looks unrelated: the one system seems to overheat because there
is no fan activity at all, while yours seems to have a "passive cooling does
not work or kicks in too late" problem (and possibly a fan problem as well).

It would be best to open a bug on bugzilla.kernel.org and assign it to the
ACPI component (and add Rui, Henrique and myself to CC; I won't be that
active, at least not for the next few days, I just wanted to make sure
this isn't a duplicate).
The most important info to attach there is:
dmesg, acpidump, grep . /proc/acpi/thermal_zone/*/*
and the shutdown messages.
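
The collection step above can be sketched as a small shell function. This is
only a sketch under assumptions: the function name collect_acpi_report, the
output directory and the file names are made up for illustration, acpidump
usually needs root, and the shutdown messages still have to be copied by hand.

```shell
#!/bin/sh
# Collect the info requested for the bug report into one directory.
# $1: output directory (hypothetical default: ./acpi-report).
collect_acpi_report() {
    out=${1:-acpi-report}
    mkdir -p "$out" || return 1

    # Kernel log; may be restricted for non-root users on some systems.
    dmesg > "$out/dmesg.txt" 2>&1 || echo "dmesg failed" >&2

    # ACPI tables; only attempted if acpidump is installed.
    if command -v acpidump >/dev/null 2>&1; then
        acpidump > "$out/acpidump.txt" 2>&1 \
            || echo "acpidump failed (needs root?)" >&2
    else
        echo "acpidump not installed" >&2
    fi

    # Thermal zone state; the file stays empty if the /proc interface is absent.
    grep . /proc/acpi/thermal_zone/*/* > "$out/thermal_zone.txt" 2>/dev/null \
        || true

    # Print the directory so the caller knows what to attach.
    echo "$out"
}
```

Calling collect_acpi_report /tmp/acpi-report as root would leave three text
files in that directory, ready to attach to the bug.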

Some more hints you may want to try:

- Does cpufreq work at all?
Does this dir exist: /sys/devices/system/cpu/cpu*/cpufreq
If the temperature shown by:
watch -n1 cat /proc/acpi/thermal_zone/THM1/temperature
goes beyond 96 C,
an ACPI processor event must get thrown and this:
/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
will get limited (lower than ../cpufreq/cpuinfo_max_freq).
echo xy >/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
may be a bad workaround.
These boot params: thermal.psv=90 thermal.tzp=10,
which lower all passive trip points to 90 C and enable polling,
might be a better one (and should make it easier to test
passive cooling). This really should be a per-thermal_zone runtime
sysfs parameter, but that is another story...
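
To see whether passive cooling actually limited the frequency, the two sysfs
files named above can be compared per CPU. A minimal sketch, assuming both
files hold plain integers in kHz; the function name report_cpufreq_limit and
its optional root-directory argument are only there for illustration.

```shell
#!/bin/sh
# Report, per CPU, whether scaling_max_freq is below cpuinfo_max_freq.
# $1: CPU sysfs root (defaults to /sys/devices/system/cpu).
report_cpufreq_limit() {
    root=${1:-/sys/devices/system/cpu}
    found=0
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq support (or an unmatched glob).
        [ -r "$dir/scaling_max_freq" ] || continue
        [ -r "$dir/cpuinfo_max_freq" ] || continue
        found=1
        cpu=$(basename "$(dirname "$dir")")
        cur=$(cat "$dir/scaling_max_freq")
        max=$(cat "$dir/cpuinfo_max_freq")
        if [ "$cur" -lt "$max" ]; then
            echo "$cpu: limited ($cur < $max kHz)"
        else
            echo "$cpu: not limited ($cur kHz)"
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no cpufreq directories under $root"
    fi
    return 0
}
```

Running report_cpufreq_limit while the temperature is above the passive trip
point should show "limited" if the processor event arrived; "no cpufreq
directories" answers the first question of this item.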

- Is the ACPI event thrown at all?
SUSE has acpi_listen; I'm not sure whether it's part of the acpid
mainline project, but I think it is. Do you see an ACPI event when
96 C is passed?
If not, this might work around your issue:
echo 10 >/proc/acpi/thermal_zone/THM1/polling_frequency (or similar)
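
To know when to look for the event, the temperature file can be checked
against the 96 C mark from a script. A rough sketch, assuming the file
contains a line of the form "temperature:   NN C"; the helper names
read_zone_temp and zone_above are made up here.

```shell
#!/bin/sh
# Print the integer temperature from a thermal zone "temperature" file.
# Assumes a line like "temperature:             45 C".
read_zone_temp() {
    awk '$1 == "temperature:" { print $2; exit }' "$1"
}

# Succeed (exit status 0) if the zone temperature is above the threshold.
# $1: temperature file, $2: threshold in degrees C.
zone_above() {
    t=$(read_zone_temp "$1")
    [ -n "$t" ] && [ "$t" -gt "$2" ]
}
```

With acpi_listen running in a second terminal, a loop such as
while sleep 1; do zone_above /proc/acpi/thermal_zone/THM1/temperature 96 && echo "above 96 C"; done
shows when an event should have appeared there.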

- T500 sounds pretty new. Still, make sure your fans are clean.
E.g. the air coming out must get really hot at some point.

- Also listen a bit to the fans. With the thinkpad-acpi driver you might
be able to monitor the fans (T500 is rather new/untested):
cat /proc/acpi/ibm/fan # path from memory
You might also be able to alter the fan behavior there.
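
For logging the fan next to the temperature, single fields can be pulled out
of that file. A small sketch, assuming the usual thinkpad-acpi layout of
"key: value" lines (status, speed, level); the helper name fan_field is
hypothetical, and the path is the one quoted above from memory.

```shell
#!/bin/sh
# Print one field ("status", "speed" or "level") from a thinkpad-acpi fan file.
# $1: field name; $2: file to read (defaults to /proc/acpi/ibm/fan).
fan_field() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/acpi/ibm/fan}"
}
```

For example, fan_field speed prints the reported fan speed, and a speed that
never changes while the temperature climbs would point at the fan rather than
at passive cooling.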

Good luck,

Thomas