On Fri, 2010-08-27 at 20:12 -0300, Cesar Eduardo Barros wrote:
> On 27-08-2010 04:39, Joe Perches wrote:
> > On Thu, 2010-08-26 at 22:38 -0300, Cesar Eduardo Barros wrote:
> > > - The first "MCP power limit exceeded" seems very bogus.
> > > - What do you mean, core_power_limit is zero?
> >
> > I added a logging message whenever the turbo limits change
> > and logging messages for power/temp on MCH for completeness.
> > Maybe this will show something useful like when/how
> > CPU power limit gets set to 0.
>
> Running with it right now, did not help much:
>
> $ dmesg | fgrep 'intel ips'
> intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value (found 25, expected 35)
> intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18
> intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535
> intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:8058 + mch:23392829
> intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5675
> intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6369

I believe all these limits should always have non-zero values.
So I still think you have hardware problems, but I suppose it
could be the driver not reading the right registers or some
such. It seems odd that the driver never printed a logging
message for either of the polling or irq methods to read the
device cpu and thermal limits.
Jesse or any Intel folk, can you verify or suggest anything
better?

If cpu_power_limit, or any _limit, is not set, perhaps change
the test style to verify the limit first and add a print-once
alert for each zero-valued limit. That would shut up the
continuous logging but still give a notification message.

	if (limit) {
		if (measured_val > limit)
			dev_info(foo);
	} else {
		dev_alert_once();
	}
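
For illustration only, here is a minimal user-space sketch of that test style. It is not the driver's code: struct limit_check, check_limit() and the CHECK_* values are hypothetical names, and printf()/fprintf() stand in for the kernel's dev_info() and a print-once alert.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-limit state; not the driver's real structure. */
struct limit_check {
	const char *name;	/* label used in messages */
	unsigned int limit;	/* 0 means "never set" */
	bool warned_unset;	/* true once the unset alert was printed */
};

enum check_result { CHECK_OK, CHECK_EXCEEDED, CHECK_UNSET };

/*
 * Compare a measurement against a limit only when the limit is set.
 * An unset (zero) limit is reported once instead of on every sample.
 */
static enum check_result check_limit(struct limit_check *lc,
				     unsigned int measured)
{
	if (lc->limit) {
		if (measured > lc->limit) {
			/* stand-in for dev_info() */
			printf("%s limit %u exceeded: %u\n",
			       lc->name, lc->limit, measured);
			return CHECK_EXCEEDED;
		}
		return CHECK_OK;
	}

	if (!lc->warned_unset) {
		/* stand-in for a print-once alert */
		lc->warned_unset = true;
		fprintf(stderr, "%s limit is not set\n", lc->name);
	}
	return CHECK_UNSET;
}
```

With the limit at 0, as in the log above, repeated samples such as 5675 and 6369 would produce a single "not set" message rather than one "exceeded" line per sample.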