Re: Very high scheduling delay with plenty of idle CPUs
From: Christian Loehle
Date: Mon Nov 11 2024 - 04:09:02 EST
On 11/11/24 09:02, Saravana Kannan wrote:
> On Mon, Nov 11, 2024 at 12:25 AM Christian Loehle
> <christian.loehle@xxxxxxx> wrote:
>>
>> On 11/11/24 06:15, Saravana Kannan wrote:
>> [...]
>>>>> Can we tell the scheduler to just spread out all the tasks during
>>>>> suspend/resume? Doesn't make a lot of sense to try and save power
>>>>> during a suspend/resume. It's almost always cheaper/better to do those
>>>>> quickly.
>>>>
>>>> That would increase the resume latency, right, since each runnable task
>>>> needs to go through a full idle CPU selection cycle? Isn't time a
>>>> consideration / concern in the resume path? Unless we go through the
>>>> slow path, it is very likely we'll end up making the same task
>>>> placement decisions again?
>>>
>>> I actually quickly hacked up the cpu_overutilized() function to return
>>> true during suspend/resume and the threads are nicely spread out and
>>> running in parallel. That actually reduces the total of the
>>> dpm_resume*() phases from 90ms to 75ms on my Pixel 6.
>>>
>>> Also, this whole email thread started because I'm optimizing the
>>> suspend/resume code to reduce a lot of sleeps/wakeups and the number
>>> of kworker threads. And with that + over utilization hack, resume time
>>> has dropped to 60ms.
>>>
>>> Peter,
>>>
>>> Would you be open to the scheduler being aware of
>>> dpm_suspend*()/dpm_resume*() phases and triggering the CPU
>>> overutilized behavior during these phases? I know it's a very use case
>>> specific behavior but how often do we NOT want to speed up
>>> suspend/resume? We can make this a CONFIG or a kernel command line
>>> option -- say, fast_suspend or something like that.
>>>
>>
>> Just to confirm, you essentially want to disable EAS during
>> suspend/resume, or does sis also not give you an acceptable
>> placement?
>
> If I effectively disable EAS during the dpm_resume/no_irq/early()
> phases (the part of the resume where devices are resumed and can run
> in parallel), that gives the best results. It shaves 15ms off.
>
> More important than disabling EAS, I think the main need is to not
> preempt a runnable thread or delay scheduling a runnable thread. But
> yes, effectively, all CPUs end up getting used because there's enough
> work to keep all the CPUs busy for 5ms. With the current behavior (is
> it solely because of EAS?), some of the 5ms runs get stacked on one
> CPU and it ends up taking 5ms longer. And this happens in multiple
> phases and bumps it up by 15ms today. And this is all data averaged
> over 100+ samples. So it's very clear cut data and not just noise.
"Is it only EAS?"
I would hope so, EAS should be responsible for all placement in your
case.
Right, but potential latency costs are a side-effect of co-scheduling,
so I'm not sure I understand why you'd rather make EAS work for this
specific use-case instead of just disabling it for phases we know
it can't do the best job?
The entire post-EEVDF discussions are all about "some workloads like
preemption, others don't", but as long as we have plenty of idle
CPUs all that seems like unnecessary effort, am I missing something?
Regards,
Christian