Re: [PATCH] Intel thermal monitor for x86_64 (updated)

From: Andi Kleen
Date: Thu Nov 18 2004 - 16:20:50 EST


On Thu, Nov 18, 2004 at 10:58:46AM -0700, Zwane Mwaikambo wrote:
> > > + static unsigned long next_check[NR_CPUS];
> >
> > Use per cpu data for this.
>
> I can do that, doesn't per_cpu do cacheline alignment? I was trying to
> avoid allocating more space.

Not for individual data items, no. It would be completely useless because
only the same CPUs are sharing data there.

> > > + if (!cpu_isset(m.cpu, logged_cpus)) {
> > > + cpu_set(m.cpu, logged_cpus);
> > > + mce_log(&m);
> > > + }
> >
> > Does the comment match the code? I guess you mean the following
> > events after the first can be bogus.
> >
> > It seems a bit bogus to printk but not log for known spurious
> > conditions.
>
> Well by spurious i mean that there will be a _lot_ of these events whilst
> the TCC attempts to reduce the processor temperature, i don't want to
> flood the mce log with the subsequent events.


Flooding the kmsg is as bad.

I think a better strategy is to just increase the minimum check interval
to avoid this. And then treat printk and mce_log the same.

>
> > Also the next_check logic should already handle this I guess,
> > becaumse I assume the temperature dropping won't take
> > that long. So I guess it would be best to drop that
> > and if it's still a problem use a longer next_check timeout
> > of several seconds.
>
> The temperature drop can take a while, i've observed 2-3 minutes if the
> processor is also loaded and the ambient temperature is low (20C). So you
> could lose 12 or so slots in the mce log due to the temperature ping
> ponging.

Ok then perhaps a extremly long check timeout of 5 minutes?

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/