Re: [PATCH RFC] sched/fair: Penalty the cfs task which executes mwait/hlt

From: Peter Zijlstra
Date: Mon Jan 13 2020 - 07:36:32 EST


On Mon, Jan 13, 2020 at 12:52:20PM +0100, Paolo Bonzini wrote:
> On 13/01/20 11:43, Peter Zijlstra wrote:
> > So the very first thing we need to get sorted is that MPERF/TSC ratio
> > thing. TurboStat does it, but has 'funny' hacks on like:
> >
> > b2b34dfe4d9a ("tools/power turbostat: KNL workaround for %Busy and Avg_MHz")
> >
> > and I imagine that there's going to be more exceptions there. You're
> > basically going to have to get both Intel and AMD to commit to this.
> >
> > IFF we can get consensus on MPERF/TSC, then yes, that is a reasonable
> > way to detect a VCPU being idle I suppose. I've added a bunch of people
> > who seem to know about this.
> >
> > Anyone, what will it take to get MPERF/TSC 'working' ?
>
> Do we really need MPERF/TSC for this use case, or can we just track
> APERF as well and do MPERF/APERF to compute the "non-idle" time?

So MPERF runs at fixed frequency (when !IDLE and typically the same
frequency as TSC), APERF runs at variable frequency (when !IDLE)
depending on DVFS state.

So APERF/MPERF gives the effective frequency of the core, but since both
stop during IDLE, it will not be a good indication of IDLE.

Otoh, TSC doesn't stop in idle (.oO this depends on
X86_FEATURE_CONSTANT_TSC) and therefore the MPERF/TSC ratio gives how
much !idle time there was between readings.
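
As a rough sketch of the two ratios above (illustration only, not kernel
code): the struct and helper names below are made up, the counter values
are assumed to have been sampled elsewhere, and the busy computation
assumes MPERF ticks at the same rate as TSC, which is typical but not
guaranteed.

```c
#include <stdint.h>

/* Hypothetical snapshot of the three counters, taken on one CPU. */
struct perf_sample {
	uint64_t tsc;	/* keeps counting in idle (needs constant TSC) */
	uint64_t mperf;	/* fixed rate, stops in idle */
	uint64_t aperf;	/* rate follows DVFS state, stops in idle */
};

/*
 * MPERF/TSC: share of the interval spent !idle, in permille.
 * Assumes MPERF and TSC tick at the same rate; clamps to 1000 in case
 * they do not. The multiply can overflow for very long intervals.
 */
static unsigned int busy_permille(const struct perf_sample *prev,
				  const struct perf_sample *cur)
{
	uint64_t dtsc = cur->tsc - prev->tsc;
	uint64_t dmperf = cur->mperf - prev->mperf;
	uint64_t ratio;

	if (!dtsc)
		return 0;

	ratio = dmperf * 1000 / dtsc;
	return ratio > 1000 ? 1000 : (unsigned int)ratio;
}

/*
 * APERF/MPERF: effective frequency while !idle, scaled from a
 * caller-supplied base frequency in kHz. Says nothing about idle time,
 * since both counters stop together. Returns 0 if the CPU never ran.
 */
static uint64_t effective_khz(const struct perf_sample *prev,
			      const struct perf_sample *cur,
			      uint64_t base_khz)
{
	uint64_t dmperf = cur->mperf - prev->mperf;
	uint64_t daperf = cur->aperf - prev->aperf;

	if (!dmperf)
		return 0;

	return base_khz * daperf / dmperf;
}
```

With made-up deltas of TSC = 3000000, MPERF = 750000 and APERF = 1125000,
the first helper reports the CPU as 25% busy, while the second reports
1.5 times the base frequency and is the same whether the CPU was idle
for 75% of the interval or not at all.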