Re: [RFC v4 0/8] TurboSched: A scheduler for sustaining Turbo Frequencies for longer durations
From: Parth Shah
Date: Fri Aug 02 2019 - 07:12:15 EST
On 7/31/19 11:02 PM, Pavel Machek wrote:
> Hi!
>
>>>> Abstract
>>>> ========
>>>>
>>>> The modern servers allows multiple cores to run at range of frequencies
>>>> higher than rated range of frequencies. But the power budget of the system
>>>> inhibits sustaining these higher frequencies for longer durations.
>>>
>>> Thermal budget?
>>
>> Right, it is a good point, and there can be possibility of Thermal throttling
>> which is not covered here.
>> But the thermal throttling is less often seen in the servers than the throttling
>> due to the Power budget constraints. Also one can change the power cap which leads
>> to increase in the throttling and task packing can handle in such
>> cases.
>
> Ok. I thought you are doing this due to thermals. If I understand
> things correctly, you can go over thermal limits for a few seconds
> before the silicon heats up. What is the timescale for power budget?
>
I guess it varies across architectures.
AFAIK, in the POWER systems, the frequency is throttled down instantaneously as we exceed
the power budget.
If an idle core is woken up and the power budget is exceeded then the system throttles
down to the frequency value that is know to be sustainable with that many busy cores.
>> BTW, Task packing allows few more cores to remain idle for longer time, so
>> shouldn't this decrease thermal throttles upto certain extent?
>
> I guess so, yes.
>
>>>>> These numbers are w.r.t. `turbo_bench.c` multi-threaded test benchmark
>>>> which can create two kinds of tasks: CPU bound (High Utilization) and
>>>> Jitters (Low Utilization). N in X-axis represents N-CPU bound and N-Jitter
>>>> tasks spawned.
>>>
>>> Ok, so you have description how it causes 13% improvements. Do you also have metrics how
>>> it harms performance.. how much delay is added to unimportant tasks etc...?
>>>
>>
>> Yes, if we try to pack the tasks despite of no frequency throttling, we see a regression
>> around 5%. For instance, in the synthetic benchmark I used to show performance benefit,
>> for lower count of CPU intensive threads (N=2) there is -5% performance drop.
>>
>> Talking about the delay added to an unimportant tasks, the result can be lower throughput
>> or higher latency for such tasks.
>
> Thanks. I believe it would be good to mention disadvantages in the
> documentation, too.
Sure, I will add the mentioned possible regression on jitter tasks in the documentation
somewhere.
Thanks,
Parth