Re: [PATCH] Intel thermal monitor for x86_64

From: Andi Kleen
Date: Sun Nov 14 2004 - 03:18:20 EST


On Sat, Nov 13, 2004 at 11:36:47AM -0700, Zwane Mwaikambo wrote:
> Patch adds support for notification of overheating conditions on intel
> x86_64 processors. Tested on EM64T, test booted on AMD64.
>
> Hardware courtesy of Intel Corporation

Did you actually execute the code by faking/forcing such a event?

>
> +#if defined(CONFIG_X86_MCE_INTEL)
> +ENTRY(thermal_interrupt)
> + apicinterrupt THERMAL_APIC_VECTOR,smp_thermal_interrupt
> +#endif

Cleaner would be probably to add a weak dummy smp_thermal_interrupt
in traps.c and drop all the ifdefs.

> +
> +asmlinkage void smp_thermal_interrupt(void)
> +{
> + u64 status;
> +
> + ack_APIC_irq();
> +
> + irq_enter();
> + rdmsrl(MSR_IA32_THERM_STATUS, status);
> + if (status & 0x1) {
> + cpu_set(smp_processor_id(), cpu_thermal_status);
> + add_taint(TAINT_MACHINE_CHECK);

Maybe this should be made a different taint bit?

> + } else {
> + cpu_clear(smp_processor_id(), cpu_thermal_status);
> + }
> +
> + if (time_after(jiffies, next_thermal_check))
> + tasklet_schedule(&thermal_tasklet);

I think there is actually a better way to do this (sorry for telling
you late, but I also only realized it later). Can you just make
the thermal APIC interrupt non NMI? Then the normal locking rules
apply and printk should work directly.

Also can you at least additionally log an synthetic event using mce_log() ?
This way someone collecting these log entries centrally get its it
all in the same log file.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/