Re: [PATCH v2 1/2] x86,sched: Add support for frequency invariance

From: Giovanni Gherdovich
Date: Tue Oct 08 2019 - 03:43:10 EST


On Thu, 2019-10-03 at 19:53 +0200, Rafael J. Wysocki wrote:
> On Thursday, October 3, 2019 2:15:37 PM CEST Peter Zijlstra wrote:
> > On Thu, Oct 03, 2019 at 12:27:52PM +0200, Rafael J. Wysocki wrote:
> > > On Wednesday, October 2, 2019 2:29:25 PM CEST Giovanni Gherdovich wrote:
> > > > +static bool turbo_disabled(void)
> > > > +{
> > > > + u64 misc_en;
> > > > + int err;
> > > > +
> > > > + err = rdmsrl_safe(MSR_IA32_MISC_ENABLE, &misc_en);
> > > > + if (err)
> > > > + return false;
> > > > +
> > > > + return (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
> > > > +}
> > >
> > > This setting may be updated by the platform firmware (BIOS) in some cases
> > > (see kernel.org BZ 200759, for example), so in general checking it once
> > > at the init time is not enough.
> >
> > Is there anything sane we can do if the BIOS frobs stuff like that under
> > our feet? Other than yell bloody murder, that is?
>
> Sane? No, I don't think so.
>
> Now, in principle *something* could be done to fix things up in the _PPC
> notify handler, but I guess we would just end up disabling the scale
> invariance code altogether in those cases.

I'm looking at how to react to turbo being disabled at run time, assuming a
_PPC notification is triggered in that case.

I don't think the correct action would be to disable scale invariance: if the
turbo range is not available, then max frequency is max_P, and scale
invariance can go on using that. The case max_freq=max_P is represented by
arch_max_freq=1024 in this patch (because arch_max_freq=max_freq*1024/max_P).
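
As a rough illustration of that arithmetic (a user-space sketch, not the code
from the patch; the helper name and the turbo_off parameter are made up for
the example), the ratio collapses to 1024 whenever turbo is unavailable:

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_SHIFT 10
#define SCALE       (UINT64_C(1) << SCALE_SHIFT)   /* 1024 */

/*
 * Hypothetical helper: returns max_freq * 1024 / max_P, the formula quoted
 * above. If turbo is off, the highest reachable frequency is max_P itself,
 * so the ratio is exactly 1024.
 */
static uint64_t model_arch_max_freq(uint64_t max_freq, uint64_t max_p,
                                    int turbo_off)
{
	assert(max_p != 0);	/* the ratio is undefined without a base frequency */

	if (turbo_off)
		return SCALE;

	return (max_freq << SCALE_SHIFT) / max_p;
}
```

For example, with max_P at 3000 MHz and a top turbo frequency of 3600 MHz the
ratio is 3600*1024/3000, which truncates to 1228; with turbo disabled it is
1024.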

Since the variable arch_max_freq is global to all CPUs, the fact that the _PPC
notification is sent to just one CPU is not a concern: the CPU receiving the
notification will set arch_max_freq=1024 (Srinivas was worried about this in
another message).
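
A minimal user-space model of that idea, assuming a single shared value that
one CPU writes and every CPU reads (all names here are hypothetical, and C11
atomics stand in for whatever the kernel code would actually use):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SCALE UINT64_C(1024)

/* One value shared by all CPUs, mirroring a global arch_max_freq. */
static _Atomic uint64_t model_max_freq = MODEL_SCALE;

/*
 * Hypothetical handler, run on whichever CPU receives the notification.
 * Because the value is global, a single store is visible to every reader.
 */
static void model_limits_changed(bool turbo_disabled, uint64_t turbo_ratio)
{
	uint64_t val = turbo_disabled ? MODEL_SCALE : turbo_ratio;

	atomic_store_explicit(&model_max_freq, val, memory_order_relaxed);
}

/* Reader side: what any CPU would see when scaling its utilization. */
static uint64_t model_read_max_freq(void)
{
	return atomic_load_explicit(&model_max_freq, memory_order_relaxed);
}
```

The sketch only shows why a per-CPU notification is enough to update a global
ratio; it says nothing about where the turbo ratio comes from or how the real
notification path is wired up.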

This looks like a job for the ->update_limits callback you added to "struct
cpufreq_driver" in response to the mentioned kernel.org BZ 200759.
I see that only intel_pstate implements it; it's not clear to me yet whether
I'll have to give an ->update_limits to acpi_cpufreq as well to handle this
case.


Giovanni