Re: [PATCH RFC] sched/fair: Penalty the cfs task which executes mwait/hlt

From: Wanpeng Li
Date: Tue Jan 14 2020 - 05:53:34 EST


On Mon, 13 Jan 2020 at 20:36, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jan 13, 2020 at 12:52:20PM +0100, Paolo Bonzini wrote:
> > On 13/01/20 11:43, Peter Zijlstra wrote:
> > > So the very first thing we need to get sorted is that MPERF/TSC ratio
> > > thing. TurboStat does it, but has 'funny' hacks in it, like:
> > >
> > > b2b34dfe4d9a ("tools/power turbostat: KNL workaround for %Busy and Avg_MHz")
> > >
> > > and I imagine that there's going to be more exceptions there. You're
> > > basically going to have to get both Intel and AMD to commit to this.
> > >
> > > IFF we can get consensus on MPERF/TSC, then yes, that is a reasonable
> > > way to detect a VCPU being idle I suppose. I've added a bunch of people
> > > who seem to know about this.
> > >
> > > Anyone, what will it take to get MPERF/TSC 'working' ?
> >
> > Do we really need MPERF/TSC for this use case, or can we just track
> > APERF as well and do MPERF/APERF to compute the "non-idle" time?
>
> So MPERF runs at fixed frequency (when !IDLE and typically the same
> frequency as TSC), APERF runs at variable frequency (when !IDLE)
> depending on DVFS state.
>
> So APERF/MPERF gives the effective frequency of the core, but since both
> stop during IDLE, it will not be a good indication of IDLE.
>
> Otoh, TSC doesn't stop in idle (.oO this depends on
> X86_FEATURE_CONSTANT_TSC) and therefore the MPERF/TSC ratio gives how
> much !idle time there was between readings.
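
The two ratios described above can be sketched from a pair of counter snapshots. This is a minimal illustration, not kernel code. It assumes that MPERF ticks at the TSC rate while the CPU is not idle, and that the TSC keeps running in idle (the X86_FEATURE_CONSTANT_TSC caveat above). The names `CounterSample`, `busy_fraction` and `effective_mhz` are hypothetical, and reading the real MSRs (IA32_APERF, IA32_MPERF, the TSC) is left out.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # APERF, MPERF and the TSC are 64-bit counters


@dataclass
class CounterSample:
    """Hypothetical container for one snapshot of APERF, MPERF and the TSC."""
    aperf: int
    mperf: int
    tsc: int


def delta(new: int, old: int) -> int:
    # Modular subtraction tolerates a 64-bit counter wrap between readings.
    return (new - old) & MASK64


def busy_fraction(prev: CounterSample, cur: CounterSample) -> float:
    """MPERF/TSC: share of elapsed time the CPU was not idle.

    Assumes MPERF runs at the TSC frequency while !IDLE and stops in idle,
    while the TSC keeps counting in idle.
    """
    d_tsc = delta(cur.tsc, prev.tsc)
    if d_tsc == 0:
        raise ValueError("no TSC progress between samples")
    # Clamp in case the counters were not read at exactly the same instant.
    return min(delta(cur.mperf, prev.mperf) / d_tsc, 1.0)


def effective_mhz(prev: CounterSample, cur: CounterSample, base_mhz: float) -> float:
    """APERF/MPERF scaled by the MPERF (base) frequency.

    Gives the average frequency while !IDLE; because both counters stop in
    idle, it says nothing about how much idle time there was.
    """
    d_mperf = delta(cur.mperf, prev.mperf)
    if d_mperf == 0:
        return 0.0  # both counters stopped: the CPU was idle the whole time
    return base_mhz * delta(cur.aperf, prev.aperf) / d_mperf
```

For example, with deltas of APERF = 1,200,000, MPERF = 1,000,000 and TSC = 4,000,000 on a 2000 MHz base, `busy_fraction` is 0.25 and `effective_mhz` is 2400.0. A vCPU thread that was "running" on the host but shows a low MPERF/TSC ratio spent most of that time in mwait/hlt. How such a signal would feed into a CFS penalty is not shown here.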

Do you have a better solution for penalizing a vCPU process that
executes mwait/hlt inside the guest? :)

Wanpeng