Re: [PATCH 4/4] sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS

From: Srinivas Pandruvada
Date: Wed Jan 31 2018 - 12:44:25 EST


On Wed, 2018-01-31 at 11:17 +0100, Peter Zijlstra wrote:
> On Wed, Jan 31, 2018 at 10:22:49AM +0100, Rafael J. Wysocki wrote:
> > On Tuesday, January 30, 2018 2:15:31 PM CET Peter Zijlstra wrote:
> > > IA32_HWP_REQUEST has "Minimum_Performance", "Maximum_Performance" and
> > > "Desired_Performance" fields which can be used to give explicit
> > > frequency hints. And we really _should_ be doing that.
> > >
> > > Because, esp. in this scenario; a task migrating; the hardware really
> > > can't do anything sensible, whereas the OS _knows_.
> >
> > But IA32_HWP_REQUEST is not a cheap MSR to write to.
>
> That just means we might need to throttle writing to it, like it already
> does for the regular pstate (PERF_CTRL) msr in any case (also, is that a
> cheap msr?)
Much more throttling is required than for PERF_CTL. Writing
MSR_HWP_REQUEST is much slower than writing PERF_CTL (as much as 10:1).
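
To illustrate the kind of throttling being discussed, here is a minimal
userspace sketch, not the intel_pstate code. The struct hwp_throttle, the
hwp_request_update() helper, the fake_msr field, and the interval value are
all hypothetical. Only the min/max/desired field positions follow the
IA32_HWP_REQUEST layout. The idea is to skip redundant writes and space out
the remaining ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Low fields of IA32_HWP_REQUEST: min [7:0], max [15:8], desired [23:16]. */
#define HWP_MIN_PERF(x)     ((uint64_t)((x) & 0xff))
#define HWP_MAX_PERF(x)     ((uint64_t)((x) & 0xff) << 8)
#define HWP_DESIRED_PERF(x) ((uint64_t)((x) & 0xff) << 16)

/* Hypothetical per-CPU throttle state; not a kernel structure. */
struct hwp_throttle {
	uint64_t last_write_ns;   /* time of the last write that went through */
	uint64_t min_interval_ns; /* assumed minimum spacing between writes */
	uint64_t cached;          /* last value written */
	uint64_t fake_msr;        /* stand-in for the real MSR */
	unsigned int writes;      /* number of simulated MSR writes */
};

/* Returns true if the value was written, false if the write was skipped. */
static bool hwp_request_update(struct hwp_throttle *t, uint64_t now_ns,
			       uint64_t val)
{
	assert(t != NULL);

	/* Unchanged value: nothing to do, avoid the expensive write. */
	if (val == t->cached)
		return false;

	/* Too soon after the previous write: drop this request. */
	if (t->writes && now_ns - t->last_write_ns < t->min_interval_ns)
		return false;

	t->fake_msr = val;        /* a real driver would write the MSR here */
	t->cached = val;
	t->last_write_ns = now_ns;
	t->writes++;
	return true;
}
```

A dropped request is simply lost in this sketch; a real implementation would
have to decide whether to defer it instead.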

>
> Not touching it at all seems silly.
>
> But now that you made me look, intel_pstate_hwp_set() is horrible crap.
> You should _never_ do things like:
>
> rdmsr_on_cpu()
> /* frob value */
> wrmsr_on_cpu()
>
> That's insane.

Since the cpufreq callback is not guaranteed to be called on the target
CPU, we have to use rd/wrmsr_on_cpu().
But we can use smp_call_function_single() instead and optimize this.
This function is called only during init, when user space changes the
frequency limits, and from thermal, so it runs very rarely.
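
As a rough illustration of the difference, here is a userspace simulation,
not kernel code. The fake_rdmsr_on_cpu(), fake_wrmsr_on_cpu() and
fake_call_on_cpu() helpers are hypothetical stand-ins for rdmsr_on_cpu(),
wrmsr_on_cpu() and smp_call_function_single(), and each one just counts one
cross-CPU call. Doing the read-modify-write inside a single function that
runs on the target CPU halves the number of cross calls:

```c
#include <assert.h>
#include <stdint.h>

#define NR_FAKE_CPUS 4

static uint64_t fake_hwp_req[NR_FAKE_CPUS]; /* simulated per-CPU MSR */
static unsigned int cross_calls;            /* simulated cross-CPU calls */

static uint64_t fake_rdmsr_on_cpu(unsigned int cpu)
{
	assert(cpu < NR_FAKE_CPUS);
	cross_calls++;
	return fake_hwp_req[cpu];
}

static void fake_wrmsr_on_cpu(unsigned int cpu, uint64_t val)
{
	assert(cpu < NR_FAKE_CPUS);
	cross_calls++;
	fake_hwp_req[cpu] = val;
}

static void fake_call_on_cpu(unsigned int cpu,
			     void (*fn)(unsigned int, void *), void *info)
{
	assert(cpu < NR_FAKE_CPUS);
	cross_calls++;
	fn(cpu, info);
}

struct hwp_limits {
	uint8_t min;
	uint8_t max;
};

/* Replace the min [7:0] and max [15:8] fields, keep everything else. */
static uint64_t apply_limits(uint64_t val, const struct hwp_limits *lim)
{
	val &= ~(uint64_t)0xffff;
	val |= (uint64_t)lim->min | ((uint64_t)lim->max << 8);
	return val;
}

/* Pattern criticized above: two separate cross-CPU calls. */
static void set_limits_two_calls(unsigned int cpu, const struct hwp_limits *lim)
{
	uint64_t val = fake_rdmsr_on_cpu(cpu);

	fake_wrmsr_on_cpu(cpu, apply_limits(val, lim));
}

/* Runs "on" the target CPU: plain local read and write. */
static void hwp_set_local(unsigned int cpu, void *info)
{
	const struct hwp_limits *lim = info;

	fake_hwp_req[cpu] = apply_limits(fake_hwp_req[cpu], lim);
}

/* Alternative: one cross-CPU call that does the whole update. */
static void set_limits_one_call(unsigned int cpu, const struct hwp_limits *lim)
{
	fake_call_on_cpu(cpu, hwp_set_local, (void *)lim);
}
```

Both variants leave the same value in the simulated register; only the
number of cross-CPU calls differs.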

Thanks,
Srinivas