Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

From: Borislav Petkov
Date: Tue Oct 15 2019 - 04:46:21 EST


On Mon, Oct 14, 2019 at 03:41:38PM -0700, Srinivas Pandruvada wrote:
> So some users who had issues in their systems can try with this patch.
> We can get rid of this, till it becomes real issue.

We don't add command line parameters which we might want to get rid of
later.

> The temperature is a function of load, time and heat dissipation capacity
> of the system. I have to think more about this to come up with some
> heuristics where we still warn users about real thermal issues.
> Since the value is not persistent, the next boot will again start from
> the default.

Yes, and the fact that each machine's temperature is influenced by the
specific *individual* environment and the load the machine runs shows
that you need to adjust this timeout automatically and dynamically.

With the command line parameter you're basically putting the onus on the
user to do that, which is just silly. And then she'd need to do it during
runtime too, if the ambient temperature or machine load, etc., changes.

The whole thing is crying "dynamic".

For a simple example, see mce_timer_fn() where we switch to polling
during CMCI storms.
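
To make "dynamic" a bit more concrete, here is a rough userspace sketch of
that kind of back-off. It is not the kernel code and not a proposal for
the thermal driver: next_interval_ms(), clamp_ul() and both INTERVAL_*
bounds are made-up names and values, purely for illustration. The idea is
the one the timer function follows as far as I recall it: shorten the
interval while events keep coming in, lengthen it again when things are
quiet, and clamp to sane bounds.

```c
/* Hypothetical bounds, chosen only for this example. */
#define INTERVAL_MIN_MS      10UL	/* shortest allowed interval */
#define INTERVAL_MAX_MS  300000UL	/* longest allowed interval (5 min) */

/* Clamp v into the inclusive range [lo, hi]. */
static unsigned long clamp_ul(unsigned long v, unsigned long lo,
			      unsigned long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Compute the next interval from the current one:
 *  - an event was seen during the last period: halve the interval
 *  - nothing was seen: double the interval
 * The result always stays within [INTERVAL_MIN_MS, INTERVAL_MAX_MS].
 */
static unsigned long next_interval_ms(unsigned long cur_ms, int event_seen)
{
	unsigned long next;

	if (event_seen)
		next = cur_ms / 2;
	else if (cur_ms > INTERVAL_MAX_MS / 2)
		next = INTERVAL_MAX_MS;	/* avoid overflow when doubling */
	else
		next = cur_ms * 2;

	return clamp_ul(next, INTERVAL_MIN_MS, INTERVAL_MAX_MS);
}
```

A caller would feed the result back in on every period, e.g.
iv = next_interval_ms(iv, saw_event), so the value tracks what the machine
is actually doing instead of what someone typed on the command line. How
exactly such a scheme maps onto the thermal throttle logging timeout is a
separate question which the sketch doesn't answer.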

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette