Re: [PATCH v7] sched: Consolidate cpufreq updates
From: Christian Loehle
Date: Mon Sep 02 2024 - 08:35:49 EST
On 9/2/24 13:30, Vincent Guittot wrote:
> On Sun, 1 Sept 2024 at 19:51, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>>
>> On 08/13/24 10:27, Vincent Guittot wrote:
>>> On Tue, 13 Aug 2024 at 10:25, Vincent Guittot
>>> <vincent.guittot@xxxxxxxxxx> wrote:
>>>>
>>>> On Mon, 5 Aug 2024 at 17:35, Christian Loehle <christian.loehle@xxxxxxx> wrote:
>>>>> Hi Qais,
>>>>> the idea of SCHED_CPUFREQ_FORCE_UPDATE and the possiblity of spamming
>>>>> freq updates still bothered me so let me share my thoughts even though
>>>>> it might be niche enough for us not to care.
>>>>>
>>>>> 1. On fast_switch systems, assuming they are fine with handling the
>>>>> actual updates, we have a bit more work on each context_switch() and
>>>>> some synchronisation, too. That should be fine, if anything there's
>>>>> some performance regression in a couple of niche cases.
>>>>>
>>>>> 2. On !fast_switch systems this gets more interesting IMO. So we have
>>>>> a sugov DEADLINE task wakeup for every (in a freq-diff resulting)
>>>>> update request. This task will preempt whatever and currently will
>>>>> pretty much always be running on the CPU it ran last on (so first CPU
>>>>> of the PD).
>>>>
>>>> The !fast_switch is a bit of concern for me too but not for the same
>>>> reason and maybe the opposite of yours IIUC your proposal below:
>>>>
>>>> With fast_switch we have the following sequence:
>>>>
>>>> sched_switch() to task A
>>>> cpufreq_driver_fast_switch -> write new freq target
>>>> run task A
>>>>
>>>> This is pretty straight forward but we have the following sequence
>>>> with !fast_switch
>>>>
>>>> sched_switch() to task A
>>>> queue_irq_work -> raise an IPI on local CPU
>>>> Handle IPI -> wakeup and queue sugov dl worker on local CPU (always
>>>> with 1 CPU per PD)
>>>> sched_switch() to sugov dl task
>>>> __cpufreq_driver_target() which can possibly block on a lock
>>>> sched_switch() to task A
>>>> run task A
>>>>
>>>
>>> sent a bit too early
>>>
>>>> We can possibly have 2 context switch and one IPi for each "normal"
>>>> context switch which is not really optimal
>>>
>>> It would be good to find a way to skip the spurious back and forth
>>> between the normal task and sugov
>>
>> Hmm I think we use affinity to keep the sugov running on policy->related_cpus.
>> Relaxing this will make it less of a problem, but won't eliminate it.
>
> yes, but it's not a problem of relaxing affinity here
>
> The problem is that the 1st switch to task A will be preempted by
> sugov so the 1st switch is useless. You should call cpufreq_update
> before switching to A so that we skip the useless switch to task A and
> directly switch to sugov 1st then task A
Not necessarily, if you relax them to all CPUs the sugov tasks for all PDs
can run on CPU0, no matter which CPU task A is on.
There is some benefit on having all sugov threads on the littles for many
platforms (i.e. restricting them), making a good mainline policy out of
it isn't quite so obvious to me.