Re: [PATCH v2 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0

From: Qais Yousef
Date: Wed May 31 2023 - 14:23:00 EST


Hi Lukasz!

Sorry for the late response.

On 05/22/23 09:30, Lukasz Luba wrote:
> Hi Qais,
>
> I have a question regarding the 'soft cpu affinity'.

[...]

> > IIUC I don't see this being a problem. The goal of capping with uclamp_max
> > is twofold:
> >
> > 1. Prevent tasks from consuming energy.
> > 2. Keep them away from expensive CPUs.
> >
> > 2 is actually very important for 2 reasons:
> >
> > a. Because of max aggregation, any uncapped task that wakes up will
> > cause a frequency spike on this 'expensive' cpu. We don't have
> > a mechanism to downmigrate it, which is another thing I'm working
> > on.
> > b. It is desirable to keep these bigger CPUs idle, ready for more important
> > work.
> >
> > For 2, generally we don't want these tasks to steal bandwidth from these CPUs
> > that we'd like to preserve for other types of work.
>
> I'm a bit worried about such a 'strong force'. It means the task would
> not go via EAS if we set uclamp_max to e.g. 90 while the little capacity
> is 125. Or am I missing something?

We should go via EAS, actually that's the whole point.

Why do you think we won't go via EAS? The logic should be that we give a hint to
prefer the little core, but we can still pick something else if it's more
energy efficient.

What uclamp_max enables is to still consider that little core even if its
utilization says it doesn't fit there. We need to merge these patches first,
though, as it's broken at the moment. If the little capacity is 125 and the
utilization of the task is 125, then even if uclamp_max is 0, EAS will skip the
whole little cluster as a potential candidate because there's no spare capacity
there, even if the whole little cluster is idle.

>
> This might effectively use more energy for those tasks which can run on
> any CPU and for which EAS would figure out a good energy placement. I'm worried
> about this, since we have L3+littles in one DVFS domain and the L3
> will only get bigger in the future.

It's a bias that enables the search algorithm in EAS to still consider the
little core for big tasks. How strong the bias is depends on the uclamp_max
value chosen by userspace (so they have some control over how hard to cap the
task), and on what else is happening in the system at the time the task wakes up.

>
> IMO, to keep the big CPUs idle more often, we should give them a big energy
> wake-up cost. That's the 3rd feature for the EM that I presented at OSPM2023.

Considering the wake-up cost in EAS would be a great addition to have :)

>
> >
> > Of course userspace has control by selecting the right uclamp_max value. They
> > can increase it to allow spilling into the next pd, or keep it low to steer
> > them more strongly towards a specific pd.
>
> This would be interesting to see in practice. I think we need such
> an experiment for such changes.

I'm not sure what you mean by 'such changes'. I hope you don't mean these
patches, as they are not the key point. They fix an obvious bug where the task
placement hint doesn't work at all. They don't modify any behavior that should
not already have been there, nor do they introduce any new limitation. I have to
say I am disappointed that these patches aren't considered an important fix for
an obvious breakage.


Thanks

--
Qais Yousef