Re: [PATCH V2] tracing, perf : add cpu hotplug trace events

From: Frederic Weisbecker
Date: Fri Jan 21 2011 - 21:42:43 EST


On Fri, Jan 21, 2011 at 06:41:58PM +0100, Vincent Guittot wrote:
> On 21 January 2011 17:44, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> > On Fri, Jan 21, 2011 at 09:43:18AM +0100, Vincent Guittot wrote:
> >> On 20 January 2011 17:11, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> >> > On Thu, Jan 20, 2011 at 09:25:54AM +0100, Vincent Guittot wrote:
> >> >> Please find below a new proposal for adding trace events for cpu hotplug.
> >> >> The goal is to measure the latency of each part (kernel, architecture)
> >> >> and also to trace the cpu hotplug activity with other power events. I
> >> >> have tested these trace events on an ARM platform.
> >> >>
> >> >> Changes since previous version:
> >> >> -Use cpu_hotplug for trace name
> >> >> -Define traces for kernel core and arch parts only
> >> >> -Use DECLARE_EVENT_CLASS and DEFINE_EVENT
> >> >> -Use proper indentation
> >> >>
> >> >> Subject: [PATCH] cpu hotplug tracepoint
> >> >>
> >> >> this patch adds new events for cpu hotplug tracing
> >> >>  * plug/unplug sequence
> >> >>  * core and architecture latency measurements
> >> >>
> >> >> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> >> >> ---
> >> >>  include/trace/events/cpu_hotplug.h |  117 ++++++++++++++++++++++++++++++++++++
> >> >
> >> > Note we can't apply new tracepoints if they are not inserted in the code.
> >>
> >> I agree; I just want some initial feedback on the tracepoint interface
> >> before providing a patch which inserts the tracepoints in the code.
> >>
> >> >
> >> >> +DEFINE_EVENT(cpu_hotplug, cpu_hotplug_arch_wait_die_start,
> >> >> +
> >> >> +     TP_PROTO(unsigned int cpuid),
> >> >> +
> >> >> +     TP_ARGS(cpuid)
> >> >> +);
> >> >> +
> >> >> +DEFINE_EVENT(cpu_hotplug, cpu_hotplug_arch_wait_die_end,
> >> >> +
> >> >> +     TP_PROTO(unsigned int cpuid),
> >> >> +
> >> >> +     TP_ARGS(cpuid)
> >> >> +);
> >> >
> >> > What is wait die, compared to die for example?
> >> >
> >>
> >> The arch_wait_die is used to trace the process which waits for the cpu
> >> to die (__cpu_die) and the arch_die is used to trace when the cpu dies
> >> (cpu_die)
> >
> > I still can't find the difference.
> >
> > Having:
> >
> > trace_cpu_hotplug_arch_die_start(cpu)
> > __cpu_die();
> > trace_cpu_hotplug_arch_die_end(cpu)
> >
> > Is not enough to get both the information that a cpu dies
> > and the time it took to do so?
> >
>
> it's quite interesting to trace the cpu_die function because the cpu
> really dies in this one.

Note that in the success case, the die and wait_die durations are nearly
the same. The difference comes down to some completion wait/polling and
noise, mostly. Most of the time it is probably unnoticeable and irrelevant.

Plus, if you opt for this scheme, you need to put your die hook into
every architecture, whereas otherwise a simple trace_cpu_die_start() /
trace_cpu_die_stop() pair around the __cpu_die() call in the generic code
is enough.
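
To illustrate the generic-code option, here is a rough userspace sketch,
not kernel code. Every name in it (the mock_* functions, the fake clock,
the 5000 ns delay) is a made-up stand-in chosen for illustration; the real
hooks would be the tracepoints placed around __cpu_die() in the generic
hotplug path.

```c
#include <assert.h>
#include <stdio.h>

/* Fake monotonic clock in nanoseconds, so the sketch is deterministic. */
static unsigned long long fake_clock_ns;

static unsigned long long die_start_ns;
static unsigned long long die_latency_ns;
static int die_in_progress;

/* Hypothetical stand-in for the "start" tracepoint. */
static void mock_trace_die_start(unsigned int cpu)
{
	die_in_progress = 1;
	die_start_ns = fake_clock_ns;
	printf("die_start: cpuid=%u\n", cpu);
}

/* Hypothetical stand-in for the "end" tracepoint. */
static void mock_trace_die_end(unsigned int cpu)
{
	/* An end event only makes sense after a matching start event. */
	assert(die_in_progress);
	die_in_progress = 0;
	die_latency_ns = fake_clock_ns - die_start_ns;
	printf("die_end: cpuid=%u latency=%llu ns\n", cpu, die_latency_ns);
}

/* Stand-in for the arch __cpu_die(): pretend the wait takes 5000 ns. */
static void mock_arch_cpu_die(unsigned int cpu)
{
	(void)cpu;
	fake_clock_ns += 5000;
}

/*
 * Generic-code side: a single start/end pair around the arch call.
 * No hook is needed inside any architecture's own die path.
 */
static unsigned long long mock_generic_cpu_down(unsigned int cpu)
{
	mock_trace_die_start(cpu);
	mock_arch_cpu_die(cpu);
	mock_trace_die_end(cpu);
	return die_latency_ns;
}
```

Calling mock_generic_cpu_down(1) records both events for cpu 1 and returns
the time spent in the mocked arch call, which is the latency the pair is
meant to expose.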

> The __cpu_die function can't return if the
> cpu fails to die in the very last step and then wake up. But this
> could be detected with some cpu_die traces.
>
>
> for a normal use case we have something like :
> cpu 0 enters __cpu_die
> cpu 1 enters cpu_die
> cpu1 acks that it is going to die
> cpu0 returns from __cpu_die
>
> if the cpu 1 fails to die at the very last step, we could have:
> cpu 0 enters __cpu_die
> cpu 1 enters cpu_idle --> cpu_die
> cpu1 leaves cpu_die because of some issues and comes back into cpu_idle.
> cpu0 returns from __cpu_die after a timeout or an error ack

If it fails at the hardware level, you'll certainly notice it in your
power profiling, because a CPU is not supposed to take seconds to
die. With a visual tool such as pytimechart in particular, it will
be obvious.

As for the details, those are something to look up in the syslogs, and
that's it.

I don't think it's a good idea to handle such a buggy and unexpected case at
the tracepoint level. You don't want to profile bugs, you want to debug them.
So it doesn't belong in this space IMHO.

> Then, cpu_die traces can be used with power traces for profiling the
> cpu power state. Maybe the power.h trace file is a better place for
> the cpu_die traces?

Hmm, these should probably stay inside the cpu hotplug tracepoint family;
this is where people will look for them in the first place.