Re: [PATCH V2 0/3] Introduce Thermal Pressure

From: Thara Gopinath
Date: Wed Apr 17 2019 - 20:07:31 EST


On 04/17/2019 02:29 PM, Ingo Molnar wrote:
>
> * Thara Gopinath <thara.gopinath@xxxxxxxxxx> wrote:
>
>>
>> On 04/17/2019 01:36 AM, Ingo Molnar wrote:
>>>
>>> * Thara Gopinath <thara.gopinath@xxxxxxxxxx> wrote:
>>>
>>>> The test results below show a 3-5% improvement in performance when
>>>> using the third solution compared to the default system today, where
>>>> the scheduler is unaware of cpu capacity limitations due to thermal events.
>>>
>>> The numbers look very promising!
>>
>> Hello Ingo,
>> Thank you for the review.
>>>
>>> I've rearranged the results to make the performance properties of the
>>> various approaches and parameters easier to see:
>>>
>>> (seconds, lower is better)
>>>
>>>                                         Hackbench   Aobench   Dhrystone
>>>                                         =========   =======   =========
>>> Vanilla kernel (No Thermal Pressure)      10.21     141.58      1.14
>>> Instantaneous thermal pressure            10.16     141.63      1.15
>>> Thermal Pressure Averaging:
>>>    - PELT fmwk                             9.88     134.48      1.19
>>>    - non-PELT Algo. Decay : 500 ms         9.94     133.62      1.09
>>>    - non-PELT Algo. Decay : 250 ms         7.52     137.22      1.012
>>>    - non-PELT Algo. Decay : 125 ms         9.87     137.55      1.12
>>>
>>>
>>> Firstly, a couple of questions about the numbers:
>>>
>>> 1)
>>>
>>> Is the 1.012 result for "non-PELT 250 msecs Dhrystone" really 1.012?
>>> You reported it as:
>>>
>>> non-PELT Algo. Decay : 250 ms 1.012 7.02%
>>
>> It is indeed 1.012. So, I ran the "non-PELT Algo 250 ms" benchmarks
>> multiple times because of the anomalies noticed. 1.012 is a formatting
>> error on my part when I copy-pasted the results into a Google sheet I am
>> maintaining to capture the test results. Sorry about the confusion.
>
> That's actually pretty good, because it suggests a 35% and 15%
> improvement over the vanilla kernel - which is very good for such
> CPU-bound workloads.
>
> Not that 5% is bad in itself - but 15% is better ;-)
>
>> Regarding the decay period, I agree that more testing can be done. I
>> like your suggestions below and I am going to try implementing them
>> sometime next week. Once I have some solid results, I will send them
>> out.
>
> Thanks!
>
>> My concern regarding getting hung up too much on decay period is that I
>> think it could vary from SoC to SoC depending on the type and number of
>> cores and thermal characteristics. So I was thinking eventually the
>> decay period should be configurable via a config option or by any other
>> means. Testing on different systems will definitely help and maybe I am
>> wrong and there is not much variation between systems.
>
> Absolutely, so I'd not be against keeping it a SCHED_DEBUG tunable or so,
> until there's a better understanding of how the physical properties of
> the SoC map to an ideal decay period.
>
> Assuming PeterZ & Rafael & Quentin don't hate the whole thermal load
> tracking approach. I suppose there's some connection of this to Energy
> Aware Scheduling? Or not ...

Mmm.. Not so much. This does not have much to do with EAS. The feature
itself will be really useful if there are asymmetric cpus in the system
rather than symmetric cpus. In the case of SMP, since all cores have the
same capacity and assuming any thermal mitigation will be implemented
across all the cpus, there won't be any difference in scheduler behavior.
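
To illustrate the point with a rough sketch: the capacity values, the 0.75
cap, the choice of throttling only the big cpus, and the helper names below
are all made up for illustration and are not taken from the patches. It
only shows that a uniform cap leaves each cpu's share of total capacity
unchanged, while a cap on a subset of cpus shifts the shares.

```python
# Illustrative only: made-up capacities and cap fraction.
# relative_share() and apply_cap() are hypothetical helpers,
# not kernel functions.

def relative_share(capacities):
    """Return each cpu's share of the total available capacity."""
    total = sum(capacities)
    return [c / total for c in capacities]

def apply_cap(capacities, cap_fraction, capped_cpus):
    """Scale the capacity of the cpus listed in capped_cpus."""
    return [c * cap_fraction if i in capped_cpus else c
            for i, c in enumerate(capacities)]

# SMP: four identical cpus, all throttled by the same amount.
smp = [1024, 1024, 1024, 1024]
smp_capped = apply_cap(smp, 0.75, {0, 1, 2, 3})

# Asymmetric: two little and two big cpus, only the big ones throttled
# (an assumption made for this example).
asym = [512, 512, 1024, 1024]
asym_capped = apply_cap(asym, 0.75, {2, 3})

print("SMP  before/after:", relative_share(smp), relative_share(smp_capped))
print("Asym before/after:", relative_share(asym), relative_share(asym_capped))
```

In the SMP case the shares are identical before and after capping, so
relative placement decisions have nothing new to act on. In the asymmetric
case the little cpus end up with a larger share than before.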

Regards
Thara
>
> Thanks,
>
> Ingo
>


--
Regards
Thara