Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages

From: Cesar Eduardo Barros
Date: Sat Aug 28 2010 - 15:07:29 EST


On 28-08-2010 12:23, Henrique de Moraes Holschuh wrote:
On Sat, 28 Aug 2010, Cesar Eduardo Barros wrote:
The solution here probably is not less logging. The best solution
IMO would be to do some sanity checking when loading the module, and
if the values do not make sense, print something to the log and
return -ENODEV.

As long as your sanity checking won't make the module fail to load in the
following scenario:

1. environment temperature control fails, room starts to heat up
2. things go south, server reboots due to exceeded temperature limits
3. OS boots in an overheat situation
4. module refuses to load because it expects to never start in an overheating
situation.

If the sanity checks will cause (4), then don't add them. rate-limit the
thermal alarms (issue them only once every T, and only if temperature has
increased more than, say, 5°C from the last alarm).
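
As a rough illustration of the rate-limiting rule described above (at most one alarm every T, and only if the temperature has risen by more than a threshold since the last alarm), here is a minimal user-space sketch. The struct, the function name and the field names are all hypothetical and do not come from the intel_ips driver; time is passed in as plain seconds so the logic can be exercised outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-sensor state; none of these names exist in intel_ips. */
struct alarm_limiter {
	bool     armed;          /* false until the first alarm has been issued */
	uint64_t last_time_s;    /* time of the last issued alarm, in seconds */
	int      last_temp_c;    /* temperature reported by the last issued alarm */
	uint64_t min_interval_s; /* "T": minimum spacing between two alarms */
	int      min_rise_c;     /* required rise since the last alarm, e.g. 5 */
};

/*
 * Decide whether an over-limit sample should be logged.
 * The first over-limit sample is always logged; later ones only if at
 * least min_interval_s has passed AND the temperature has risen by more
 * than min_rise_c since the last logged alarm.
 */
bool alarm_should_log(struct alarm_limiter *l, uint64_t now_s, int temp_c)
{
	if (l->armed) {
		if (now_s - l->last_time_s < l->min_interval_s)
			return false;	/* too soon after the last alarm */
		if (temp_c - l->last_temp_c <= l->min_rise_c)
			return false;	/* not enough of a rise to be news */
	}

	l->armed = true;
	l->last_time_s = now_s;
	l->last_temp_c = temp_c;
	return true;
}
```

Note that this sketch never resets the reference temperature when things cool down, so a second heat-up to the same level would stay silent; a real implementation would probably want to re-arm once the temperature drops back below the limit.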

I have not read the datasheet (I do not even know if it is available to the public; I have not looked), but I would not expect to see a power limit of 0 even if the CPU is on fire. Of course, you have to be more cautious when validating the current temperature (and even then, if it says the CPU is encased in a block of ice, something odd is going on).
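
To make the kind of check I have in mind concrete, here is a minimal user-space sketch. Everything in it is an assumption for illustration: the struct, the field names, the units and the lower temperature bound are hypothetical and are not taken from the driver or from any datasheet. The point is only that a power limit of 0 is rejected, a hot reading is accepted, and an implausibly cold reading is treated as a sign that the values are garbage.

```c
#include <errno.h>

/* Hypothetical lower bound for a believable CPU temperature (placeholder). */
#define IPS_MIN_PLAUSIBLE_TEMP_C 0

/* Hypothetical snapshot of the first values read at module load. */
struct ips_first_read {
	int power_limit_mw;	/* reported power limit, in milliwatts */
	int cur_temp_c;		/* reported current temperature, in Celsius */
};

/* Returns 0 if the readings look plausible, -ENODEV otherwise. */
int ips_sanity_check(const struct ips_first_read *r)
{
	/* A power limit of 0 makes no sense, however hot the CPU is. */
	if (r->power_limit_mw <= 0)
		return -ENODEV;

	/*
	 * Be lenient with the current temperature: "too hot" is a valid
	 * state and must not prevent loading. Only a "block of ice"
	 * reading is rejected.
	 */
	if (r->cur_temp_c < IPS_MIN_PLAUSIBLE_TEMP_C)
		return -ENODEV;

	return 0;
}
```

With this shape, a machine that boots while overheating still passes the check; only readings that cannot be real cause the load to fail.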

If a given platform is buggy crap (or just el-cheapo trash that overheats
all the time) to the point that the module is useless, blacklist it by DMI
and inform the user.
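
For illustration, the blacklist idea above could look roughly like the following user-space sketch. In the kernel this would normally be a `struct dmi_system_id` table checked with `dmi_check_system()`; the version below only imitates that with a hypothetical table and exact string comparison, and the vendor and product strings are placeholders, not real machines known to misbehave.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical blacklist entry, loosely modelled on a DMI match. */
struct quirk_entry {
	const char *vendor;
	const char *product;
};

/* Placeholder entries only; no real platform is implied. */
static const struct quirk_entry ips_blacklist[] = {
	{ "Example Vendor", "Example Model 1" },
};

/* Returns true if the given vendor/product pair is on the blacklist. */
bool platform_is_blacklisted(const char *vendor, const char *product)
{
	size_t n = sizeof(ips_blacklist) / sizeof(ips_blacklist[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(vendor, ips_blacklist[i].vendor) == 0 &&
		    strcmp(product, ips_blacklist[i].product) == 0)
			return true;
	}
	return false;
}
```

A caller would check this at load time, print a message telling the user why the platform is being skipped, and then refuse to load.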

I expect that, when it works as it should, the first read while
loading the module already returns sane values, so a sanity check

well, as long as "sane" does include server-is-too-hot situations...

Of course. (But you most probably will want to s/server/laptop/ here.)

there should not have many false positives. OTOH, it is best to not
load the module when you think things are strange.

What good is an alarm module that refuses to load when there is an alarm
condition happening already?

This is not an alarm module; AFAIK it is a module for the feature in recent Intel CPU/GPU chips which allows you to overclock them a bit as long as the thermal and power limits have not been exceeded:

config INTEL_IPS
	tristate "Intel Intelligent Power Sharing"
	depends on ACPI
	---help---
	  Intel Calpella platforms support dynamic power sharing between the
	  CPU and GPU, maximizing performance in a given TDP. This driver,
	  along with the CPU frequency and i915 drivers, provides that
	  functionality. If in doubt, say Y here; it will only load on
	  supported platforms.

If the module is not loaded, the chip simply will not be able to go above its nominal clock, so refusing to load it is not that much of a problem.

--
Cesar Eduardo Barros
cesarb@xxxxxxxxxx
cesar.barros@xxxxxxxxx