Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

From: Srinivas Pandruvada
Date: Fri Oct 18 2019 - 11:55:19 EST


On Fri, 2019-10-18 at 15:23 +0200, Borislav Petkov wrote:
> On Fri, Oct 18, 2019 at 05:26:36AM -0700, Srinivas Pandruvada wrote:
> > Servers/desktops generally rely on the embedded controller for fan
> > control, over which the kernel has no control. For them this warning
> > helps to either bring in additional cooling or fix existing cooling.
>
> How exactly does this warning help? A detailed example please.
I assume that someone who is having performance issues or occasional
reboots will look at the logs. Is that a fair assumption? If not, the
logging has no value.

In the current code, this logging is misleading: it reports all
normal throttling at PROCHOT.

But if a system is running at up to 87.5% of duty cycle on top of the
lowest possible frequency of around 800MHz, someone will notice.
If logs are not the starting point, someone has to run tools like
turbostat to understand the cause of the performance issues. Then
someone will probably clean up the air vents on the dusty desktop
sitting under the desk.

Anyway, we can provide better documentation for the sysfs counters this
code is dumping and how to interpret them with or without logging
support. I can add a document under the kernel documentation.
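
As a rough illustration of reading those counters without relying on
the log messages, here is a minimal sketch. It assumes the counters are
exposed as core_throttle_count and package_throttle_count under
/sys/devices/system/cpu/cpuN/thermal_throttle/; the helper name
read_throttle_counts and its cpu_root parameter are hypothetical and
only exist for this example.

```python
from pathlib import Path

# Counter files assumed to exist under each cpuN/thermal_throttle directory.
COUNTERS = ("core_throttle_count", "package_throttle_count")


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {counter_name: value}} for CPUs that expose
    a thermal_throttle directory under cpu_root (hypothetical helper)."""
    counts = {}
    # cpu[0-9]* matches cpu0, cpu1, ... but not cpufreq or cpuidle.
    for tdir in sorted(Path(cpu_root).glob("cpu[0-9]*/thermal_throttle")):
        per_cpu = {}
        for name in COUNTERS:
            path = tdir / name
            if path.is_file():
                per_cpu[name] = int(path.read_text().strip())
        counts[tdir.parent.name] = per_cpu
    return counts


if __name__ == "__main__":
    for cpu, values in read_throttle_counts().items():
        print(cpu, values)
```

Sampling these values twice and comparing them shows whether throttle
events are still accumulating, which is the kind of interpretation the
documentation would need to spell out.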

Thanks,
Srinivas




>
> > If something needs to force throttle from kernel, then we should use
> > some offset from the max temperature (aka TJMax), instead of this
> > warning threshold. Then we can use idle injection or change duty
> > cycle of CPU clocks.
>
> Yes, as I said, all this needs to be properly defined first. That is,
> *if* there's even need for reacting to thermal interrupts in the
> kernel.
>