Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

From: Borislav Petkov
Date: Tue Oct 15 2019 - 04:37:10 EST

On Mon, Oct 14, 2019 at 03:27:35PM -0700, Luck, Tony wrote:
> You need a plausible start point for the "when to worry the user"
> message. Maybe that is your "max value"?

Yes, that would be a good start.

You need that anyway because the experiments you guys did to get
your numbers were done at some ambient temperature X. I betcha
when the ambient temperature differs considerably from yours, the
numbers don't mean a whole lot.

Which makes a dynamic adjustment even more important.

> So if the system has a couple of excursions above temperature lasting
> 1 second and then 2 seconds ... would you like to see those ignored
> (because they are below the initial max)? But now we have a couple
> of data points pick some new value to be the threshold for reporting?
> What value should we pick (based on 1 sec, then 2 sec)?
> I would be worried that it would self tune to the point where it
> does report something that it really didn't need to (e.g. as a result
> of a few consecutive very short excursions).

You select a history feedback formula with which sudden changes
influence the timeout value relatively slowly and keep the current
timeout value rather inert. They would take effect only when such spikes
hold on for a longer time, i.e., take up a longer chunk of the sampling
interval.
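
To make the "inert" part concrete, here is a minimal sketch of one possible
history feedback formula: an exponentially weighted moving average over the
excursion durations. This is only an illustration, not what the patch does.
All names (throttle_history, throttle_update(), THROTTLE_EWMA_SHIFT) and the
1/8 weight are assumptions made up for the example.

```c
/*
 * Illustrative only: hypothetical names, not taken from the patch
 * under discussion.
 */

/* Each new sample contributes 1/8 of its distance to the average. */
#define THROTTLE_EWMA_SHIFT	3

struct throttle_history {
	unsigned long avg_ms;	/* smoothed excursion duration, in ms */
};

/*
 * Fold one observed excursion duration into the running average:
 *
 *	avg += (sample - avg) / 8
 *
 * A single spike moves the average only a little; a spike which keeps
 * coming back pulls it up sample by sample. Integer division truncates,
 * so the average settles within a few ms of a constant input instead
 * of reaching it exactly.
 */
static unsigned long throttle_update(struct throttle_history *h,
				     unsigned long sample_ms)
{
	long delta = (long)sample_ms - (long)h->avg_ms;

	h->avg_ms = (unsigned long)((long)h->avg_ms +
				    delta / (1L << THROTTLE_EWMA_SHIFT));
	return h->avg_ms;
}
```

Seeded with the initial "max value" as avg_ms, one long excursion barely
moves the threshold, while a run of them drags it towards the new level.
The shift is the knob for how inert you want it to be.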

> We also need to take into account the "typical sampling interval"
> for user space thermal control software.

Yes to the sampling interval, not so sure about doing anything in
luserspace. This should all be done in the kernel automatically.

> My fault ... during review process I pretty much re-wrote the
> whole commit message to follow the form of:
> "What is the problem?"
> "How are we fixing it"


> But I didn't want Srinivas to take the heat for any mistakes
> that were my fault. "Co-developed-by" really didn't explain
> what happened (since I didn't write any code, just made suggestions
> on things that needed to be changed/improved).

Yeah, so stuff like that is usually added as free text at the end of
the commit message, where you have more room than the couple of words
in a tag to explain what happened.