Re: [PATCH V2] cpufreq: Call transition notifier only once for each policy

From: Rafael J. Wysocki
Date: Mon Mar 18 2019 - 07:20:15 EST


On Mon, Mar 18, 2019 at 12:09 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Mon, Mar 18, 2019 at 11:54 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Mon, Mar 18, 2019 at 08:05:14AM +0530, Viresh Kumar wrote:
> > > On 15-03-19, 13:29, Peter Zijlstra wrote:
> > > > On Fri, Mar 15, 2019 at 02:43:07PM +0530, Viresh Kumar wrote:
> > > > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > > > > index 3fae23834069..cff8779fc0d2 100644
> > > > > --- a/arch/x86/kernel/tsc.c
> > > > > +++ b/arch/x86/kernel/tsc.c
> > > > > @@ -956,28 +956,38 @@ static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
> > > > > >  				void *data)
> > > > > >  {
> > > > > >  	struct cpufreq_freqs *freq = data;
> > > > > > -	unsigned long *lpj;
> > > > > > -
> > > > > > -	lpj = &boot_cpu_data.loops_per_jiffy;
> > > > > > -#ifdef CONFIG_SMP
> > > > > > -	if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> > > > > > -		lpj = &cpu_data(freq->cpu).loops_per_jiffy;
> > > > > > -#endif
> > > > > > +	struct cpumask *cpus = freq->policy->cpus;
> > > > > > +	bool boot_cpu = !IS_ENABLED(CONFIG_SMP) || freq->flags & CPUFREQ_CONST_LOOPS;
> > > > > > +	unsigned long lpj;
> > > > > > +	int cpu;
> > > > > >
> > > > > >  	if (!ref_freq) {
> > > > > >  		ref_freq = freq->old;
> > > > > > -		loops_per_jiffy_ref = *lpj;
> > > > > >  		tsc_khz_ref = tsc_khz;
> > > > > > +
> > > > > > +		if (boot_cpu)
> > > > > > +			loops_per_jiffy_ref = boot_cpu_data.loops_per_jiffy;
> > > > > > +		else
> > > > > > +			loops_per_jiffy_ref = cpu_data(cpumask_first(cpus)).loops_per_jiffy;
> > > > > >  	}
> > > > > > +
> > > > > >  	if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) ||
> > > > > >  	    (val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) {
> > > > > > -		*lpj = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
> > > > > > -
> > > > > > +		lpj = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
> > > > > >  		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
> > > > > > +
> > > > > >  		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> > > > > >  			mark_tsc_unstable("cpufreq changes");
> > > > > >
> > > > > > -		set_cyc2ns_scale(tsc_khz, freq->cpu, rdtsc());
> > > > > > +		if (boot_cpu) {
> > > > > > +			boot_cpu_data.loops_per_jiffy = lpj;
> > > > > > +		} else {
> > > > > > +			for_each_cpu(cpu, cpus)
> > > > > > +				cpu_data(cpu).loops_per_jiffy = lpj;
> > > > > > +		}
> > > > > > +
> > > > > > +		for_each_cpu(cpu, cpus)
> > > > > > +			set_cyc2ns_scale(tsc_khz, cpu, rdtsc());
> > > >
> > > > > This code doesn't make sense; the rdtsc() _must_ be called on the CPU in
> > > > question.
> > >
> > > > You mean rdtsc() must be called locally on that CPU? The cpufreq core never guaranteed
> > > that and it was left for the notifier to do. This patch doesn't change the
> > > behavior at all, just that it moves the for-loop to the notifier instead of the
> > > cpufreq core.
> >
> > Yuck..
> >
> > Rafael; how does this work in practise? Earlier you said that on x86 the
> > policies typically have a single cpu in them anyway.
>
> Yes.
>
> > Is the freq change also notified from _that_ cpu?
>
> May not be, depending on what CPU runs the work item/thread changing
> the freq. It generally is not guaranteed to always be the same as the
> target CPU.

Actually, scratch that.

On x86, with one CPU per cpufreq policy, that will always be the target CPU.

> > I don't think I have old enough hardware around anymore to test any of
> > this. This was truly ancient p6 era stuff IIRC.
> >
> > Because in that case, I'm all for not doing the changes to this notifier
> > Viresh is proposing but simply adding something like:
> >
> >
> > > 	WARN_ON_ONCE(cpumask_weight(cpuc) != 1);
> > > 	WARN_ON_ONCE(cpumask_first(cpuc) != smp_processor_id());
> >
> > And leave it at that.
>
> That may not work I'm afraid.

So something like that could work.
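
For illustration only, here is a userspace sketch of the condition those two
WARN_ON_ONCE() checks would assert: the policy covers exactly one CPU, and
that CPU is the one running the notifier, so a local rdtsc() reads the right
counter. Everything in it is an assumption made for the example and not
kernel code: mock_cpumask_t is a plain 64-bit bitmask, the mock_* helpers only
imitate cpumask_weight() and cpumask_first(), and this_cpu is passed in by the
caller where the kernel would use smp_processor_id(). mock_scale() mirrors the
old * mult / div arithmetic that cpufreq_scale() applies to loops_per_jiffy
and tsc_khz in the quoted patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means CPU N is in the policy. */
typedef uint64_t mock_cpumask_t;

/* Count the CPUs in the mask (userspace analogue of cpumask_weight()). */
static int mock_cpumask_weight(mock_cpumask_t mask)
{
	int n = 0;

	/* Clear the lowest set bit on each pass until none remain. */
	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Return the lowest-numbered CPU in the mask, or -1 if the mask is empty. */
static int mock_cpumask_first(mock_cpumask_t mask)
{
	int cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * True only when the policy has a single CPU and that CPU is the caller's.
 * This is the condition under which reading the TSC locally is valid for
 * the CPU whose frequency is changing.
 */
static bool policy_is_local_single_cpu(mock_cpumask_t policy_cpus, int this_cpu)
{
	return mock_cpumask_weight(policy_cpus) == 1 &&
	       mock_cpumask_first(policy_cpus) == this_cpu;
}

/* Scale a reference value by mult/div using a 64-bit intermediate product. */
static unsigned long mock_scale(unsigned long old, unsigned int div,
				unsigned int mult)
{
	return (unsigned long)(((uint64_t)old * mult) / div);
}
```

With a policy mask containing only CPU 3, the check passes when called with
this_cpu == 3 and fails for any other CPU, which is the case the second
WARN_ON_ONCE() is meant to catch. A mask with two CPUs fails regardless of
the caller, which is the case the first one catches.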