Re: [RFC PATCH 0/7] Introduce thermal pressure

From: Vincent Guittot
Date: Wed Oct 10 2018 - 09:39:14 EST


On Wed, 10 Oct 2018 at 15:35, Juri Lelli <juri.lelli@xxxxxxxxx> wrote:
>
> On 10/10/18 15:08, Vincent Guittot wrote:
> > On Wed, 10 Oct 2018 at 14:50, Juri Lelli <juri.lelli@xxxxxxxxx> wrote:
> > >
> > > On 10/10/18 14:34, Vincent Guittot wrote:
> > > > Hi Juri,
> > > >
> > > > On Wed, 10 Oct 2018 at 14:23, Juri Lelli <juri.lelli@xxxxxxxxx> wrote:
> > > > >
> > > > > On 10/10/18 14:04, Vincent Guittot wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > > > > The problem was the same with RT: the cfs utilization was lower than
> > > > > > > reality because RT steals some cycles from CFS.
> > > > > > So schedutil was selecting a lower frequency when cfs was running
> > > > > > whereas the CPU was fully used.
> > > > > > > The same can happen with thermal:
> > > > > > > - cap the max freq because of thermal
> > > > > > > - the utilization will decrease
> > > > > > > - remove the cap
> > > > > > > - the utilization is still low and you will select a low OPP because you
> > > > > > >   don't take into account the cycles stolen by thermal, like with RT
> > > > >
> > > > > What if we scale frequency component considering the capped temporary
> > > > > max?
> > > >
> > > > Do you mean using a kind of scale_thermal_capacity in accumulate_sum
> > > > when computing utilization ?
> > >
> > > Yeah, something like that I guess. So that we account for the temporary
> > > "fake" 1024.
> >
> > But the utilization will not be invariant anymore across the system
>
> Mmm, I guess I might be wrong, but I was thinking we should be able to
> deal with this similarly to what we do with cpus with different max
> capacities. So, another factor? Because then, how do we handle other
> ways in which max freq can be restricted (e.g. from userspace as Javi
> was also mentioning)?

IMHO, userspace capping is a different story: it is not expected to
happen often, but once set it should stay in place for a while. In this
case, a solution is probably to rebuild the sched_domain and update
all the cpu_capacity structs and fields.
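
As a rough illustration of the thermal sequence quoted above (this is not
code from the patch set), the sketch below works through the arithmetic for
a CPU that is 100% busy while capped at half of its max frequency. The
helper names freq_scale(), steady_util() and next_freq() are hypothetical,
the frequencies are made-up numbers, and the 1.25 margin is assumed to
mirror the "freq + (freq >> 2)" headroom that schedutil applies.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Hypothetical helper: frequency scale factor (0..1024) for a CPU
 * running at cur_khz out of a possible max_khz.
 */
unsigned long freq_scale(unsigned long cur_khz, unsigned long max_khz)
{
	assert(max_khz > 0 && cur_khz <= max_khz);
	return cur_khz * SCHED_CAPACITY_SCALE / max_khz;
}

/*
 * Hypothetical helper: steady-state frequency-invariant utilization of
 * a CPU that is busy busy_pct percent of the time at the given scale.
 * Running time is scaled by the current frequency, so a fully busy but
 * capped CPU converges below 1024.
 */
unsigned long steady_util(unsigned long busy_pct, unsigned long scale)
{
	assert(busy_pct <= 100);
	return busy_pct * scale / 100;
}

/*
 * Assumed schedutil-like mapping: request 1.25 * max_khz * util / max_cap.
 * The result is not clamped to max_khz here.
 */
unsigned long next_freq(unsigned long max_khz, unsigned long util,
			unsigned long max_cap)
{
	assert(max_cap > 0);
	return (max_khz + (max_khz >> 2)) * util / max_cap;
}
```

With max_khz = 2000000 and a thermal cap at 1000000, freq_scale() gives 512,
so a fully busy CPU settles at a utilization of 512. Right after the cap is
removed, next_freq(2000000, 512, 1024) asks for 1250000 kHz even though the
CPU never had any idle time, which is the low OPP selection described above.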