Re: [PATCH 1/2] Modify cpupower to schedule itself on cores it is reading MSRs from

From: Thomas Renninger
Date: Thu Oct 10 2019 - 07:22:26 EST

On Monday, October 7, 2019 11:11:30 PM CEST Natarajan, Janakarajan wrote:
> On 10/5/2019 7:40 AM, Thomas Renninger wrote:
> >>
> >> APERF/MPERF from CPL > 0) and avoid using the msr module (patch 2).
> >
> > And this one only exists on latest AMD cpus, right?
> Yes. The RDPRU instruction exists only on AMD cpus.
> >
> >> However, for systems that provide an instruction to get register values
> >> from userspace, would a command-line parameter be acceptable?
> >
> > Parameter sounds like a good idea. In fact, there already is such a
> > parameter.
> >
> > cpupower monitor --help
> >
> > -c
> >     Schedule the process on every core before starting and ending
> >     measuring. This could be needed for the Idle_Stats monitor when no
> >     other MSR based monitor (has to be run on the core that is measured)
> >     is run in parallel. This is to wake up the processors from deeper
> >     sleep states and let the kernel reaccount its cpuidle (C-state)
> >     information before reading the cpuidle timings from sysfs.
> >
> > Best is you exchange the order of your patches. The 2nd looks rather
> > straight forward and you can add my reviewed-by.
> The RDPRU instruction reads the APERF/MPERF of the cpu on which it is
> running. If we do not schedule it on each cpu specifically, it will read
> the APERF/MPERF of whichever cpu it happens to run on, which is not the
> correct behavior.

Got it. I also had not fully read the -c description, but I remember now:
for C-state accounting you want each CPU woken up at measure start and end
so that the measurement is accurate.
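
To illustrate what waking up each CPU means in practice, here is a minimal
sketch. It is not the actual cpupower code: touch_all_cpus() is a
hypothetical name, and the sketch assumes Linux with glibc. It pins the
calling thread to every configured CPU in turn and then restores the
original affinity mask.

```c
#define _GNU_SOURCE	/* for cpu_set_t, CPU_* macros and sched_setaffinity() */
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper: briefly run on every configured CPU so that each
 * one has to leave its idle state. Returns the number of CPUs we could be
 * scheduled on, or -1 if the current affinity mask could not be read.
 */
static int touch_all_cpus(void)
{
	cpu_set_t saved, one;
	long ncpus = sysconf(_SC_NPROCESSORS_CONF);
	int touched = 0;

	if (ncpus < 1 || sched_getaffinity(0, sizeof(saved), &saved) != 0)
		return -1;

	for (long cpu = 0; cpu < ncpus && cpu < CPU_SETSIZE; cpu++) {
		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		/* Fails for offline CPUs or CPUs outside our cpuset: skip. */
		if (sched_setaffinity(0, sizeof(one), &one) == 0)
			touched++;
	}

	/* Put the original affinity mask back. */
	sched_setaffinity(0, sizeof(saved), &saved);
	return touched;
}
```

A monitor would call something like this once right before the measurement
starts and once right after it ends.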

It's a pity that the monitors do the per_cpu calls themselves.
So a general idle-monitor param is not possible, or can only be done by,
for example, adding a flag to the cpuidle_monitor struct:

struct cpuidle_monitor {
	...
	unsigned int needs_root:1;
	unsigned int per_cpu_schedule:1;
};

Not sure whether a:

	struct {
		unsigned int needs_root:1;
		unsigned int per_cpu_schedule:1;
	} flags;

should/must be wrapped around them in a separate cleanup patch (with the
needs_root users adjusted).

You (and other monitors for which this might make sense) can then implement
the per_cpu_schedule flag. In the AMD case you will want to (in fact, you
have to) set it directly.
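
Roughly what I have in mind, as a sketch only. struct monitor_sketch,
bind_to_cpu(), read_current_cpu() and read_all_cpus() are all made-up names
for illustration and not the existing cpupower API. read_current_cpu()
stands in for an RDPRU-style read: it ignores the requested CPU and reports
the CPU it happens to run on, so the result is only right when the caller
has bound the thread first.

```c
#define _GNU_SOURCE	/* for CPU_* macros, sched_setaffinity(), sched_getcpu() */
#include <assert.h>
#include <sched.h>

/* Hypothetical, trimmed-down descriptor; the real struct has more members. */
struct monitor_sketch {
	const char *name;
	struct {
		unsigned int needs_root:1;
		unsigned int per_cpu_schedule:1;
	} flags;
	/* Reads one value for @cpu; returns 0 on success. */
	int (*read_cpu)(unsigned int cpu, long long *val);
};

/* Pin the calling thread to @cpu; returns 0 on success, -1 on error. */
static int bind_to_cpu(unsigned int cpu)
{
	cpu_set_t set;

	if (cpu >= CPU_SETSIZE)
		return -1;
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/* Stand-in for an RDPRU-style read: reports the CPU we are running on. */
static int read_current_cpu(unsigned int cpu, long long *val)
{
	int cur = sched_getcpu();

	(void)cpu;	/* deliberately unused, like RDPRU */
	if (cur < 0)
		return -1;
	*val = cur;
	return 0;
}

/*
 * Read CPUs 0..ncpus-1. Binds to each CPU first if the monitor asks for
 * it. Failed entries are set to -1. Returns the number of good reads.
 */
static unsigned int read_all_cpus(const struct monitor_sketch *mon,
				  unsigned int ncpus, long long *vals)
{
	cpu_set_t saved;
	unsigned int cpu, ok = 0;
	int have_saved;

	assert(mon && mon->read_cpu && vals);
	have_saved = sched_getaffinity(0, sizeof(saved), &saved) == 0;

	for (cpu = 0; cpu < ncpus; cpu++) {
		vals[cpu] = -1;
		if (mon->flags.per_cpu_schedule && bind_to_cpu(cpu) != 0)
			continue;	/* offline or not allowed: skip */
		if (mon->read_cpu(cpu, &vals[cpu]) == 0)
			ok++;
		else
			vals[cpu] = -1;
	}

	/* Restore the original affinity mask if we changed it. */
	if (mon->flags.per_cpu_schedule && have_saved)
		sched_setaffinity(0, sizeof(saved), &saved);
	return ok;
}
```

With per_cpu_schedule set, every successful entry vals[i] equals i. With
the flag cleared, all entries just reflect wherever the scheduler placed
the process, which is the wrong behavior described above for RDPRU.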

On top of that, a -b/-u (--bind-measure-to-cpu, --unbind-measure-to-cpu)
parameter could be added at some point if it matters, and monitors that
support the flag could then bind or not.
This could possibly make the -c param obsolete, or at least the idle state
counter monitor could do the wakeup itself. But don't worry about this for now.

What do you think?

And you should be able to re-use the bind_cpu function used in the -c case?