Re: [RFC PATCH 0/7] Introduce thermal pressure
From: Quentin Perret
Date: Wed Oct 10 2018 - 09:48:05 EST

On Wednesday 10 Oct 2018 at 15:27:57 (+0200), Vincent Guittot wrote:
> On Wed, 10 Oct 2018 at 15:05, Quentin Perret <quentin.perret@xxxxxxx> wrote:
> >
> > On Wednesday 10 Oct 2018 at 14:04:40 (+0200), Vincent Guittot wrote:
> > > This patchset doesn't touch cpu_capacity_orig and doesn't need to as
> > > it assumes that the max capacity is unchanged but some capacity is
> > > momentarily stolen by thermal.
> > > If you want to reflect immediately all thermal capping change, you
> > > have to update this field and all related fields and struct around
> >
> > I don't follow you here. I never said I wanted to change
> > cpu_capacity_orig. I don't think we should do that actually. Changing
> > capacity_of (which is updated during LB IIRC) is just fine. The question
> > is about what you want to do there: reflect an averaged value or the
> > instantaneous one.
>
> Sorry, I thought you were speaking about updating this cpu_capacity_orig.

N/p, communication via email can easily become confusing :-)

> With the instantaneous max value in capacity_of(), we are back to
> the problem raised by Thara that the value will most probably not
> reflect the current capping value when it is used in LB, because LB
> period can be quite long on a busy CPU (default max value is 32*sd_weight
> ms)

But averaging the capping value over time doesn't make LB happen more
often ... That will help you account for capping that happened in the
past, but it's not obvious this is actually a good thing. Probably not
all the time anyway.

Say a CPU was capped at 50% of its capacity, then the cap is removed.
At that point it'll take 100ms+ for the thermal signal to decay and let
the scheduler know about the newly available capacity. That can probably
be a performance hit in some use cases ... And the other way around, it
can also take forever for the scheduler to notice that a CPU has a
reduced capacity before reacting to it.
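
To put rough numbers on that, here is a minimal sketch. It assumes the
averaged thermal signal decays geometrically with a 32 ms half-life, the
way PELT signals do. The half-life, the 5% threshold and the helper name
are assumptions made for illustration, not something taken from the
patchset.

```python
# Assumption: the averaged "stolen capacity" signal decays geometrically
# with a 32 ms half-life (PELT-like). The real patchset may differ.
HALF_LIFE_MS = 32
DECAY = 0.5 ** (1.0 / HALF_LIFE_MS)  # per-millisecond decay factor


def ms_until_below(start, threshold):
    """Milliseconds for a signal at `start` to decay below `threshold`
    once the cap is removed, i.e. once its input drops to zero."""
    signal, elapsed = start, 0
    while signal >= threshold:
        signal *= DECAY
        elapsed += 1
    return elapsed


# CPU capped at 50%: the averaged stolen capacity has converged to 512
# out of 1024. How long until less than ~5% (51) is still reported?
print(ms_until_below(512, 51))
```

Under these assumptions the signal needs on the order of 100 ms to drop
below 5% of the capacity scale after the cap is lifted, and the ramp-up
when a cap is applied is just as slow.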

If you want to filter out very short transient capping events to avoid
over-reacting in the scheduler (is this actually happening?), then
maybe the average should be done on the cooling device side or something
like that?
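
As a rough illustration of what I mean, here is a sketch of a short
sliding-window mean applied where the cap is set. CapFilter, the 16 ms
window and the one-sample-per-millisecond rate are all hypothetical;
nothing like this exists in the thermal framework as far as this thread
is concerned.

```python
from collections import deque


class CapFilter:
    """Hypothetical cooling-device-side filter: reports the mean cap
    over the last `window` samples (one sample per millisecond)."""

    def __init__(self, window, initial=1024):
        # Start with a full window of uncapped samples (1024 = full scale).
        self.samples = deque([initial] * window, maxlen=window)

    def update(self, cap):
        # Appending drops the oldest sample automatically (maxlen).
        self.samples.append(cap)
        return sum(self.samples) // len(self.samples)


f = CapFilter(window=16)
# A 2 ms dip to half capacity barely moves the reported value...
spike = [f.update(c) for c in (512, 512)]
# ...while a sustained cap is fully reflected after `window` ms.
sustained = [f.update(512) for _ in range(16)][-1]
print(spike, sustained)
```

Unlike a geometric average, a bounded window has no long tail: the
reported value matches the real cap again once the window has passed.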

Thanks,
Quentin