Re: [ANNOUNCE] (Resend) Tools to analyse PM and scheduling behaviour

From: Sundar
Date: Fri Aug 22 2014 - 22:15:08 EST


Hi Amit,

On Tue, Aug 19, 2014 at 11:11 AM, Amit Kucheria
<amit.kucheria@xxxxxxxxxx> wrote:
>
> We're soliciting early feedback from the community on the direction of idlestat

Nice :)

> Idlestat Details
> ----------------
> Idlestat uses FTRACE to capture traces related to C-state and P-state
> transitions of the CPU and wakeups (IRQ, IPI) on the system and then
> post-processes the data to print statistics. It is designed to be used
> non-interactively. Idlestat can deduce the idle time for a cluster as an
> intersection between the idle times of all the cpus belonging to the same
> cluster. This data is useful to analyse and optimise scheduling behaviour.
> The tool will also list how many times the menu governor mis-predicts
> target residency in a C-state.
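
The cluster-level deduction is easy to picture as interval
intersection. A minimal sketch of the idea (my own illustration, not
idlestat's actual code), assuming the per-cpu idle intervals have
already been extracted from the power:cpu_idle ftrace events:

def intersect_two(a, b):
    """Intersect two sorted lists of (start, end) idle intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def cluster_idle(per_cpu_intervals):
    """A cluster is idle only while *all* of its cpus are idle."""
    from functools import reduce
    return reduce(intersect_two, per_cpu_intervals)

# Example: two cpus in one cluster.
cpu0 = [(0.0, 1.0), (2.0, 4.0)]
cpu1 = [(0.5, 3.0)]
print(cluster_idle([cpu0, cpu1]))  # [(0.5, 1.0), (2.0, 3.0)]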

We discussed this in the energy aware scheduling workshop this week @
the Kernel Summit. A few notes:

1. We need to really understand how this tool's output correlates with
the actual hardware states. It is quite possible for the software to
"think" it is in a low-power state while the actual hardware is not.
What is the coverage for these kinds of cases here? (See the cpuidle
sketch below point 2 for where the software-side numbers come from.)

2. I understand that C/P states are a direct metric of how well the
workload behaved with respect to power, but I am not sure they are a
direct measure of how the scheduler performed. The C/P states could be
maintained while giving away performance, or while burning power in
additional components on the SoC and platform such as DDR I/Os, fabric
states, etc.
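
As an aside on point 1: the software-side residency that any such
comparison starts from is the cpuidle accounting in sysfs. A minimal
sketch, assuming the standard cpuidle sysfs layout; note these counters
only record what the kernel *requested*, so a real hardware check needs
platform counters (e.g. the residency MSRs that turbostat reads on
x86):

import glob, os, time

def read_cpuidle(cpu=0):
    """Return {state_name: cumulative_residency_us} from cpuidle sysfs."""
    out = {}
    base = "/sys/devices/system/cpu/cpu%d/cpuidle" % cpu
    for state in sorted(glob.glob(os.path.join(base, "state*"))):
        with open(os.path.join(state, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(state, "time")) as f:
            out[name] = int(f.read())
    return out

before = read_cpuidle(0)
time.sleep(5)            # placeholder: run the workload of interest here
after = read_cpuidle(0)
for name, usecs in sorted(after.items()):
    print("%-10s +%d us" % (name, usecs - before[name]))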

Quick Summary of what I discussed with Daniel @ the workshop about idlestat:

1. There are usually platform-specific tools to get residencies for
P/C states; PowerTop and Turbostat are two that first come to mind.
Apart from the prediction logic, is there anything specific in which
idlestat differs from these two?

2. To me, when debugging performance or power, C/P states indicate
that something is wrong, but they still don't tell me "what" is wrong
"if" the issue is somewhere in the kernel as opposed to more easily
fixable application code (traceable at the hardware/software level for
best optimisation). How do I conclude that my scheduler is the culprit,
beyond the points where it decided to select the right idle states
based on predicted sleep times? In my opinion, that boils down to
whether the scheduler was invoking load balancing too often, moving my
threads across cores too much, thrashing data across caches and cores
too much, etc. (these signals can be sampled directly; see the sketch
below).
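
As a rough sketch of sampling those scheduler-level signals directly
(shelling out to perf with its standard event names; the workload
command is a placeholder):

import subprocess

workload = ["sleep", "5"]   # placeholder for the workload under test
subprocess.call(["perf", "stat",
                 "-e", "context-switches,cpu-migrations,cache-misses",
                 "--"] + workload)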

I think a tool for scheduler metrics must be based on more internal
details like the above, finally culminating in C/P states, as opposed
to C/P states being the metric to be relied upon.

Let me know your thoughts.

Cheers!

-- these are my personal thoughts and do not represent my employer's views