Re: [ANNOUNCE] (Resend) Tools to analyse PM and scheduling behaviour

From: Sundar
Date: Sat Aug 30 2014 - 02:24:22 EST

Hi Amit,

On Tue, Aug 26, 2014 at 11:02 AM, Amit Kucheria
<amit.kucheria@xxxxxxxxxx> wrote:

> Consider the following examples:
> *On a given platform*, we see the same benchmark scores with and
> without patchset ABC, but including patchset ABC leads to better "power
> behaviour" i.e. requests of deeper idle states and/or lower frequencies.
> Consider another example where the benchmark score dramatically improves
> with patchset XYZ while the idle and frequency requests are marginally
> worse (shallower idle, reduced residency or increased frequency requests).
> In both cases, it is left to platforms to do real measurements to confirm that
> this is indeed the case. The latter example might not even be possible
> on some platforms, given some platform constraints e.g. the platform
> thermal envelope.
> Idlestat is not a replacement for real measurements. It is a tool to
> allow maintainers (scheduler, PM) to judge if any further investigation
> is needed and request such numbers from people running the code on
> various architectures before merging the patches.

As I mentioned, it is quite possible for a workload to preserve the CPU
C/P-states while hurting some other system metric, such as memory/SoC
bandwidth or cache behaviour, because the scheduler is doing more
aggressive task placement. I agree that no tool (allowing for
errors/approximations) can replace a physical measurement; my only
question is whether C/P-state correlation is the right primary metric for
scheduler behaviour (as opposed to PM behaviour).

> First, idlestat is designed to be architecture-independent. It only
> depends on what the kernel knows.
> Second, it is created with benchmarking in mind - non-interactive and
> minimal overhead.
> Third, it was designed for maintainers to be able to quickly tell if a
> patchset changes OS behaviour dramatically and request deeper
> analysis on various architectures.
> Fourth, it has the prediction logic which calculates the intersection of
> C-state requests by several cpus in a cluster to determine the cluster
> state.
> On top of this, we have two WIP additions:
> - an experimental "energy model" patch for idlestat that lets a SoC
> vendor provide the cost of various states as input and idlestat will
> output the "energy cost" of a workload.
> - a 'diff mode' to show the diff between two traces

I see this as not much different from powertop; would it not be easier to
add the prediction logic there and investigate energy-model integration?
I don't mind a different tool doing almost the same things, but is there
really a need for one?
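For what it's worth, the intersection-based cluster-state prediction quoted above could be sketched roughly as follows. This is my own illustration, not idlestat's actual implementation: the interval representation, the per-state power numbers, and the function names are all assumptions. The idea is that a cluster can only enter a given C-state while every CPU in it is simultaneously requesting that state (or deeper), so the cluster residency is the intersection of the per-CPU request intervals, which an energy model can then weight by a vendor-supplied cost.

```python
# Hypothetical sketch of cluster C-state prediction by interval
# intersection. Intervals are (start, end) tuples in milliseconds,
# sorted and non-overlapping per CPU.

def intersect(a, b):
    """Intersect two sorted, non-overlapping interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def cluster_residency(per_cpu_requests):
    """Time windows during which *all* CPUs requested the state."""
    acc = per_cpu_requests[0]
    for reqs in per_cpu_requests[1:]:
        acc = intersect(acc, reqs)
    return acc

def energy_cost(intervals, state_power_mw):
    """Weight residency by a (made-up) vendor power number: mW * ms -> mJ."""
    return sum(hi - lo for lo, hi in intervals) * state_power_mw / 1000.0

cpu0 = [(0, 10), (20, 40)]
cpu1 = [(5, 30)]
shared = cluster_residency([cpu0, cpu1])  # [(5, 10), (20, 30)]
```

The same intersection output is what a diff mode would compare between two traces.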

> Correct. At the moment, idlestat can only provide an indication if
> something might be wrong.

And that is where I see immense value in idlestat sticking to scheduler
details, beyond the traditional C/P-state statistics.

> These would show up as regressions in benchmark results. Fengguang's
> excellent benchmark report[1] already captures such "changes". Does it
> make sense to recapture that in a tool?
> [1]

I have yet to digest that report, so apologies :)

> We're open to tracking more metrics if it is felt they are useful.
> One of the tenets of energy-aware scheduling is "improving energy
> efficiency with little or no performance regression". idlestat tells us
> about possible regressions on the energy front and benchmarks should
> tell us if we are regressing on performance. Hence the focus on
> C/P-states for now.

I would like to know your views on adding further scheduler metrics to
the tool, such as task thrashing, irregular placements, and increased
load balancing, so that we can zero in on the scheduler for efficiency
losses. There may be more critical metrics that I am missing...
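As an example of the kind of metric I mean, per-task migration counts could be derived from sched:sched_migrate_task trace events that idlestat already has access to via ftrace. The event representation below is simplified and hypothetical (real trace lines carry more fields), just to show the shape of such a metric:

```python
# Hypothetical sketch: count cross-CPU migrations per task from
# pre-parsed sched_migrate_task events. A spike in this metric under
# a patchset would flag aggressive task placement even when C/P-state
# statistics look unchanged.
from collections import Counter

def count_migrations(events):
    """events: iterable of dicts with 'comm', 'orig_cpu', 'dest_cpu' keys."""
    moves = Counter()
    for ev in events:
        if ev['orig_cpu'] != ev['dest_cpu']:
            moves[ev['comm']] += 1
    return moves
```

Similar counters over sched_switch or load-balance events would cover thrashing and balancing frequency.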

The views expressed in this email are personal and do not necessarily
reflect my employer's.