Re: [PATCH v7] sched: Consolidate cpufreq updates
From: Vincent Guittot
Date: Tue Sep 03 2024 - 02:54:37 EST
On Mon, 2 Sept 2024 at 22:43, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>
> On 09/02/24 15:36, Vincent Guittot wrote:
>
> > > If we have 1 CPU per PD, then relaxing affinity will allow it to run anywhere.
> > > I am just this will be safe on all platforms of course.
> > >
> > > But yeah, I don't think this is a solution anyway but the simplest thing to
> > > make it harder to hit.
> > >
> > > > The problem is that the 1st switch to task A will be preempted by
> > > > sugov so the 1st switch is useless. You should call cpufreq_update
> > > > before switching to A so that we skip the useless switch to task A and
> > > > directly switch to sugov 1st then task A
> > >
> > > Can we do this safely after we pick task A, but before we do the actual context
> > > switch? One of the reasons I put this too late is because there's a late call
> > > to balance_calbacks() that can impact the state of the rq and important to take
> > > into account based on my previous testing and analysis.
> >
> > I don't have all cases in mind and it would need more thinking but
> > this should be doable
> >
> > >
> > > Any reason we need to run the sugov worker as DL instead for example being
> > > a softirq?
> >
> > sugov can sleep
>
> Hmm. I thought the biggest worry is about this operation requires synchronous
> operation to talk to hw/fw to change the frequency which can be slow and we
> don't want this to happen from the scheduler fast path with all the critical
> locks held.
>
> If we sleep, then the sugov DL task will experience multiple context switches
> to perform its work. So it is very slow anyway.
A good reason to not add more
>
> IMO refactoring the code so we allow drivers that don't sleep to run from
> softirq context to speed things up and avoid any context switches is the
> sensible optimization to do.
AFAICT, only cpufreq fast_switch is known to be atomic, others can
take a lock and as a result sleep so it's not possible.
Please use fast_switch in this case but not softirq which can end up
in a daemon anyway.
>
> Drivers that sleep will experience significant delays and I'm not seeing the
> point of optimizing an additional context switch. I don't see we need to get
No, They don't have spurious wakeups now, your patch is adding one
more. I don't see why they should accept spurious context switch
> out of our way to accommodate these slow platforms. But give them the option to
> move to something better if they manage to write their code to avoid sleeps.
>
> Would this be a suitable option to move forward?
>
> FWIW I did test this on pixel 6 which !fast_switch and benchmark scores are
> either the same or better (less frame drops in UI particularly).