Re: [PATCH tip 1/3] trace: better manage the context info forevents

From: Arnaldo Carvalho de Melo
Date: Tue Feb 03 2009 - 07:51:23 EST


Em Tue, Feb 03, 2009 at 03:07:47AM +0100, Frederic Weisbecker escreveu:
> On Mon, Feb 02, 2009 at 08:29:21PM -0200, Arnaldo Carvalho de Melo wrote:
> > From: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> >
> > Impact: make trace_event more convenient for tracers
> >
> > All tracers (for the moment) that use the struct trace_event want to
> > have the context info printed before their own output: the pid/cmdline,
> > cpu, and timestamp.
> >
> > But some other tracers that want to implement their trace_event
> > callbacks will not necessary need these information or they may want to
> > format them as they want.
> >
> > This patch adds a new default-enabled trace option:
> > TRACE_ITER_CONTEXT_INFO When disabled through:
> >
> > echo nocontext-info > /debugfs/tracing/trace_options
> >
> > The pid, cpu and timestamps headers will not be printed.
> >
> > IE with the sched_switch tracer with context-info (default):
> >
> > bash-2935 [001] 100.356561: 2935:120:S ==> [001] 0:140:R <idle>
> > <idle>-0 [000] 100.412804: 0:140:R + [000] 11:115:S events/0
> > <idle>-0 [000] 100.412816: 0:140:R ==> [000] 11:115:R events/0
> > events/0-11 [000] 100.412829: 11:115:S ==> [000] 0:140:R <idle>
> >
> > Without context-info:
> >
> > 2935:120:S ==> [001] 0:140:R <idle>
> > 0:140:R + [000] 11:115:S events/0
> > 0:140:R ==> [000] 11:115:R events/0
> > 11:115:S ==> [000] 0:140:R <idle>
> >
> > A tracer can disable it at runtime by clearing the bit
> > TRACE_ITER_CONTEXT_INFO in trace_flags.
> >
> > The print routines were renamed to trace_print_context and
> > trace_print_lat_context, so that they can be used by tracers if they
> > want to use them for one of the trace_event callbacks.
>

> Actually, I wonder if this is not breaking the sense of the
> TRACE_ITER_CONTEXT_INFO flag.

Possibly :)

> In the first patch I made about it, I thought this flag was to decide
> whether we want to print the context information in the standard way.

Perhaps we should have a different flag then, one for the tracers to
inform the tracing core what facilities it wants.

> Then, Steven suggested to actually provide callbacks for the tracers
> which want to override the standard context information printer.

> The flag then got more logical: TRACE_ITER_CONTEXT_INFO was not only
> about deciding whether we want or not the standard context info, it
> tells if we want in a global way the context info to be printed,
> whatever how the tracer decides to print it. (This is the theory, but
> since we couldn't override the bin/raw/hex cases, the practical case
> didn't follow this idea).

> If someone doesn't want to see these informations in the blk tracer,
> this flag will not help him. Worst, it will double print the context
> info if the user enables the flag.

Yes, this was for _blktrace_ itself to tell that it doesn't want the
context info to be printed in all its callbacks, while allowing it to
use it in the callbacks that makes sense to it, namely ->trace().

> Now that I see the practical case, I'm not sure the design of my patch
> was valuable. A tracer has to play with the flag if it wants to
> override the context info in the bin/raw/hex cases. And I don't think
> this is a good way to proceed.

> In my opinion, the ITER_CONTEXT_INFO flag should mostly be set by the
> user.

Well, after trying to use it I start to think that it has not much value
for the user.

> And only one callback could be added to trace_event: context_info()
> Then, the tracer will manage itself the raw/hex/bin/normal cases
> inside this callback.

So this callback, in a sense, would be like having a flag, where instead
of testing:

if (trace_flags & TRACE_ITER_CONTEXT_INFO)
trace_print_context(iter);

we will do:

if (event->context_info)
event->context_info()

OK, we can assign the default when we register if the user doesn't
initializes .context_info in the trace_event init site.

Then I think exporting the context info flag to users becomes
overengineering, i.e. nobody asked for that or found it really useful.

> We can provide the default callbacks available for the tracers which
> want it and even one function which proceed all of them, depending on
> the flags. ftrace/preempt/sched.... tracers can register this
> function for their context_info callback and other tracers too if they
> want.

> Or they can override it, and even pick the default callbacks for
> dedicated flags when they want.

More or less like its in the 3rd patch (for blktrace) I sent in this
series, i.e. the "default callback" is a trace lib function that can be
called from tracers.

> Then, when the user wants the context info to be printed or not, he
> just have to set/clear the context-info flag manually.

Here is the point, after this discussion I think we should leave
exporting the context-info flag for later, when we see real benefit to
adding it.

> A tracer can even decide to set/clear it by default, but for its real
> sense: print or not these context info.

But this is what the thir patch does, no?

What I tried to achieve with this series of patches is to reduce
complexity, avoiding adding new function pointers, as using the
functions you refactored into default functions and that I renamed to
make them library functions, be called directly instead of thru a
pointer in the trace_event that is set to a default if the user doesn't
provides a custom function _or_ includes it in the trace_event
initialization.

The standard tracers that just want the default were left untouched.

> What do you think?
>
>
> > Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> > Cc: Jens Axboe <jens.axboe@xxxxxxxxxx>
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxx>
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> > ---
> > kernel/trace/trace.c | 149 ++++++++++---------------------------------
> > kernel/trace/trace.h | 7 ++-
> > kernel/trace/trace_output.c | 107 +++++++++++++++++++++++++++++++
> > kernel/trace/trace_output.h | 3 +
> > 4 files changed, 151 insertions(+), 115 deletions(-)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 2f8ac1f..5ec49c3 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -227,7 +227,7 @@ static DECLARE_WAIT_QUEUE_HEAD(trace_wait);
> >
> > /* trace_flags holds trace_options default values */
> > unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
> > - TRACE_ITER_ANNOTATE;
> > + TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO;
> >
> > /**
> > * trace_wake_up - wake up tasks waiting for trace input
> > @@ -285,6 +285,7 @@ static const char *trace_options[] = {
> > "userstacktrace",
> > "sym-userobj",
> > "printk-msg-only",
> > + "context-info",
> > NULL
> > };
> >
> > @@ -1171,8 +1172,8 @@ __find_next_entry(struct trace_iterator *iter, int *ent_cpu, u64 *ent_ts)
> > }
> >
> > /* Find the next real entry, without updating the iterator itself */
> > -static struct trace_entry *
> > -find_next_entry(struct trace_iterator *iter, int *ent_cpu, u64 *ent_ts)
> > +struct trace_entry *trace_find_next_entry(struct trace_iterator *iter,
> > + int *ent_cpu, u64 *ent_ts)
> > {
> > return __find_next_entry(iter, ent_cpu, ent_ts);
> > }
> > @@ -1351,57 +1352,6 @@ print_trace_header(struct seq_file *m, struct trace_iterator *iter)
> > seq_puts(m, "\n");
> > }
> >
> > -static void
> > -lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
> > -{
> > - int hardirq, softirq;
> > - char *comm;
> > -
> > - comm = trace_find_cmdline(entry->pid);
> > -
> > - trace_seq_printf(s, "%8.8s-%-5d ", comm, entry->pid);
> > - trace_seq_printf(s, "%3d", cpu);
> > - trace_seq_printf(s, "%c%c",
> > - (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
> > - (entry->flags & TRACE_FLAG_IRQS_NOSUPPORT) ? 'X' : '.',
> > - ((entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'N' : '.'));
> > -
> > - hardirq = entry->flags & TRACE_FLAG_HARDIRQ;
> > - softirq = entry->flags & TRACE_FLAG_SOFTIRQ;
> > - if (hardirq && softirq) {
> > - trace_seq_putc(s, 'H');
> > - } else {
> > - if (hardirq) {
> > - trace_seq_putc(s, 'h');
> > - } else {
> > - if (softirq)
> > - trace_seq_putc(s, 's');
> > - else
> > - trace_seq_putc(s, '.');
> > - }
> > - }
> > -
> > - if (entry->preempt_count)
> > - trace_seq_printf(s, "%x", entry->preempt_count);
> > - else
> > - trace_seq_puts(s, ".");
> > -}
> > -
> > -unsigned long preempt_mark_thresh = 100;
> > -
> > -static void
> > -lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
> > - unsigned long rel_usecs)
> > -{
> > - trace_seq_printf(s, " %4lldus", abs_usecs);
> > - if (rel_usecs > preempt_mark_thresh)
> > - trace_seq_puts(s, "!: ");
> > - else if (rel_usecs > 1)
> > - trace_seq_puts(s, "+: ");
> > - else
> > - trace_seq_puts(s, " : ");
> > -}
> > -
> > static void test_cpu_buff_start(struct trace_iterator *iter)
> > {
> > struct trace_seq *s = &iter->seq;
> > @@ -1419,46 +1369,24 @@ static void test_cpu_buff_start(struct trace_iterator *iter)
> > trace_seq_printf(s, "##### CPU %u buffer started ####\n", iter->cpu);
> > }
> >
> > -static enum print_line_t
> > -print_lat_fmt(struct trace_iterator *iter, unsigned int trace_idx, int cpu)
> > +static enum print_line_t print_lat_fmt(struct trace_iterator *iter)
> > {
> > struct trace_seq *s = &iter->seq;
> > unsigned long sym_flags = (trace_flags & TRACE_ITER_SYM_MASK);
> > - struct trace_entry *next_entry;
> > struct trace_event *event;
> > - unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
> > struct trace_entry *entry = iter->ent;
> > - unsigned long abs_usecs;
> > - unsigned long rel_usecs;
> > - u64 next_ts;
> > - char *comm;
> > int ret;
> >
> > test_cpu_buff_start(iter);
> >
> > - next_entry = find_next_entry(iter, NULL, &next_ts);
> > - if (!next_entry)
> > - next_ts = iter->ts;
> > - rel_usecs = ns2usecs(next_ts - iter->ts);
> > - abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
> > -
> > - if (verbose) {
> > - comm = trace_find_cmdline(entry->pid);
> > - trace_seq_printf(s, "%16s %5d %3d %d %08x %08x [%08lx]"
> > - " %ld.%03ldms (+%ld.%03ldms): ",
> > - comm,
> > - entry->pid, cpu, entry->flags,
> > - entry->preempt_count, trace_idx,
> > - ns2usecs(iter->ts),
> > - abs_usecs/1000,
> > - abs_usecs % 1000, rel_usecs/1000,
> > - rel_usecs % 1000);
> > - } else {
> > - lat_print_generic(s, entry, cpu);
> > - lat_print_timestamp(s, abs_usecs, rel_usecs);
> > + event = ftrace_find_event(entry->type);
> > +
> > + if (trace_flags & TRACE_ITER_CONTEXT_INFO) {
> > + ret = trace_print_lat_context(iter);
> > + if (ret)
> > + return ret;
> > }
> >
> > - event = ftrace_find_event(entry->type);
> > if (event && event->latency_trace) {
> > ret = event->latency_trace(s, entry, sym_flags);
> > if (ret)
> > @@ -1476,33 +1404,20 @@ static enum print_line_t print_trace_fmt(struct trace_iterator *iter)
> > unsigned long sym_flags = (trace_flags & TRACE_ITER_SYM_MASK);
> > struct trace_entry *entry;
> > struct trace_event *event;
> > - unsigned long usec_rem;
> > - unsigned long long t;
> > - unsigned long secs;
> > - char *comm;
> > int ret;
> >
> > entry = iter->ent;
> >
> > test_cpu_buff_start(iter);
> >
> > - comm = trace_find_cmdline(iter->ent->pid);
> > -
> > - t = ns2usecs(iter->ts);
> > - usec_rem = do_div(t, 1000000ULL);
> > - secs = (unsigned long)t;
> > + event = ftrace_find_event(entry->type);
> >
> > - ret = trace_seq_printf(s, "%16s-%-5d ", comm, entry->pid);
> > - if (!ret)
> > - return TRACE_TYPE_PARTIAL_LINE;
> > - ret = trace_seq_printf(s, "[%03d] ", iter->cpu);
> > - if (!ret)
> > - return TRACE_TYPE_PARTIAL_LINE;
> > - ret = trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
> > - if (!ret)
> > - return TRACE_TYPE_PARTIAL_LINE;
> > + if (trace_flags & TRACE_ITER_CONTEXT_INFO) {
> > + ret = trace_print_context(iter);
> > + if (ret)
> > + return ret;
> > + }
> >
> > - event = ftrace_find_event(entry->type);
> > if (event && event->trace) {
> > ret = event->trace(s, entry, sym_flags);
> > if (ret)
> > @@ -1525,10 +1440,12 @@ static enum print_line_t print_raw_fmt(struct trace_iterator *iter)
> >
> > entry = iter->ent;
> >
> > - ret = trace_seq_printf(s, "%d %d %llu ",
> > - entry->pid, iter->cpu, iter->ts);
> > - if (!ret)
> > - return TRACE_TYPE_PARTIAL_LINE;
> > + if (trace_flags & TRACE_ITER_CONTEXT_INFO) {
> > + ret = trace_seq_printf(s, "%d %d %llu ",
> > + entry->pid, iter->cpu, iter->ts);
> > + if (!ret)
> > + return TRACE_TYPE_PARTIAL_LINE;
> > + }
> >
> > event = ftrace_find_event(entry->type);
> > if (event && event->raw) {
> > @@ -1553,9 +1470,11 @@ static enum print_line_t print_hex_fmt(struct trace_iterator *iter)
> >
> > entry = iter->ent;
> >
> > - SEQ_PUT_HEX_FIELD_RET(s, entry->pid);
> > - SEQ_PUT_HEX_FIELD_RET(s, iter->cpu);
> > - SEQ_PUT_HEX_FIELD_RET(s, iter->ts);
> > + if (trace_flags & TRACE_ITER_CONTEXT_INFO) {
> > + SEQ_PUT_HEX_FIELD_RET(s, entry->pid);
> > + SEQ_PUT_HEX_FIELD_RET(s, iter->cpu);
> > + SEQ_PUT_HEX_FIELD_RET(s, iter->ts);
> > + }
> >
> > event = ftrace_find_event(entry->type);
> > if (event && event->hex)
> > @@ -1575,7 +1494,7 @@ static enum print_line_t print_printk_msg_only(struct trace_iterator *iter)
> >
> > trace_assign_type(field, entry);
> >
> > - ret = trace_seq_printf(s, field->buf);
> > + ret = trace_seq_printf(s, "%s", field->buf);
> > if (!ret)
> > return TRACE_TYPE_PARTIAL_LINE;
> >
> > @@ -1590,9 +1509,11 @@ static enum print_line_t print_bin_fmt(struct trace_iterator *iter)
> >
> > entry = iter->ent;
> >
> > - SEQ_PUT_FIELD_RET(s, entry->pid);
> > - SEQ_PUT_FIELD_RET(s, entry->cpu);
> > - SEQ_PUT_FIELD_RET(s, iter->ts);
> > + if (trace_flags & TRACE_ITER_CONTEXT_INFO) {
> > + SEQ_PUT_FIELD_RET(s, entry->pid);
> > + SEQ_PUT_FIELD_RET(s, entry->cpu);
> > + SEQ_PUT_FIELD_RET(s, iter->ts);
> > + }
> >
> > event = ftrace_find_event(entry->type);
> > if (event && event->binary)
> > @@ -1643,7 +1564,7 @@ static enum print_line_t print_trace_line(struct trace_iterator *iter)
> > return print_raw_fmt(iter);
> >
> > if (iter->iter_flags & TRACE_FILE_LAT_FMT)
> > - return print_lat_fmt(iter, iter->idx, iter->cpu);
> > + return print_lat_fmt(iter);
> >
> > return print_trace_fmt(iter);
> > }
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index e603a29..f0c7a0f 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -405,6 +405,10 @@ void init_tracer_sysprof_debugfs(struct dentry *d_tracer);
> >
> > struct trace_entry *tracing_get_trace_entry(struct trace_array *tr,
> > struct trace_array_cpu *data);
> > +
> > +struct trace_entry *trace_find_next_entry(struct trace_iterator *iter,
> > + int *ent_cpu, u64 *ent_ts);
> > +
> > void tracing_generic_entry_update(struct trace_entry *entry,
> > unsigned long flags,
> > int pc);
> > @@ -591,7 +595,8 @@ enum trace_iterator_flags {
> > TRACE_ITER_ANNOTATE = 0x2000,
> > TRACE_ITER_USERSTACKTRACE = 0x4000,
> > TRACE_ITER_SYM_USEROBJ = 0x8000,
> > - TRACE_ITER_PRINTK_MSGONLY = 0x10000
> > + TRACE_ITER_PRINTK_MSGONLY = 0x10000,
> > + TRACE_ITER_CONTEXT_INFO = 0x20000 /* Print pid/cpu/time */
> > };
> >
> > /*
> > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > index 1a4e144..a5752d4 100644
> > --- a/kernel/trace/trace_output.c
> > +++ b/kernel/trace/trace_output.c
> > @@ -286,6 +286,113 @@ seq_print_ip_sym(struct trace_seq *s, unsigned long ip, unsigned long sym_flags)
> > return ret;
> > }
> >
> > +static void
> > +lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
> > +{
> > + int hardirq, softirq;
> > + char *comm;
> > +
> > + comm = trace_find_cmdline(entry->pid);
> > +
> > + trace_seq_printf(s, "%8.8s-%-5d ", comm, entry->pid);
> > + trace_seq_printf(s, "%3d", cpu);
> > + trace_seq_printf(s, "%c%c",
> > + (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
> > + (entry->flags & TRACE_FLAG_IRQS_NOSUPPORT) ? 'X' : '.',
> > + ((entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'N' : '.'));
> > +
> > + hardirq = entry->flags & TRACE_FLAG_HARDIRQ;
> > + softirq = entry->flags & TRACE_FLAG_SOFTIRQ;
> > + if (hardirq && softirq) {
> > + trace_seq_putc(s, 'H');
> > + } else {
> > + if (hardirq) {
> > + trace_seq_putc(s, 'h');
> > + } else {
> > + if (softirq)
> > + trace_seq_putc(s, 's');
> > + else
> > + trace_seq_putc(s, '.');
> > + }
> > + }
> > +
> > + if (entry->preempt_count)
> > + trace_seq_printf(s, "%x", entry->preempt_count);
> > + else
> > + trace_seq_puts(s, ".");
> > +}
> > +
> > +static unsigned long preempt_mark_thresh = 100;
> > +
> > +static void
> > +lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
> > + unsigned long rel_usecs)
> > +{
> > + trace_seq_printf(s, " %4lldus", abs_usecs);
> > + if (rel_usecs > preempt_mark_thresh)
> > + trace_seq_puts(s, "!: ");
> > + else if (rel_usecs > 1)
> > + trace_seq_puts(s, "+: ");
> > + else
> > + trace_seq_puts(s, " : ");
> > +}
> > +
> > +int trace_print_context(struct trace_iterator *iter)
> > +{
> > + struct trace_seq *s = &iter->seq;
> > + struct trace_entry *entry = iter->ent;
> > + char *comm = trace_find_cmdline(entry->pid);
> > + unsigned long long t = ns2usecs(iter->ts);
> > + unsigned long usec_rem = do_div(t, USEC_PER_SEC);
> > + unsigned long secs = (unsigned long)t;
> > +
> > + if (!trace_seq_printf(s, "%16s-%-5d ", comm, entry->pid))
> > + goto partial;
> > + if (!trace_seq_printf(s, "[%03d] ", entry->cpu))
> > + goto partial;
> > + if (!trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem))
> > + goto partial;
> > +
> > + return 0;
> > +
> > +partial:
> > + return TRACE_TYPE_PARTIAL_LINE;
> > +}
> > +
> > +int trace_print_lat_context(struct trace_iterator *iter)
> > +{
> > + u64 next_ts;
> > + struct trace_seq *s = &iter->seq;
> > + struct trace_entry *entry = iter->ent,
> > + *next_entry = trace_find_next_entry(iter, NULL,
> > + &next_ts);
> > + unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
> > + unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
> > + unsigned long rel_usecs;
> > +
> > + if (!next_entry)
> > + next_ts = iter->ts;
> > + rel_usecs = ns2usecs(next_ts - iter->ts);
> > +
> > + if (verbose) {
> > + char *comm = trace_find_cmdline(entry->pid);
> > + trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08lx]"
> > + " %ld.%03ldms (+%ld.%03ldms): ",
> > + comm,
> > + entry->pid, entry->cpu, entry->flags,
> > + entry->preempt_count, iter->idx,
> > + ns2usecs(iter->ts),
> > + abs_usecs/1000,
> > + abs_usecs % 1000, rel_usecs/1000,
> > + rel_usecs % 1000);
> > + } else {
> > + lat_print_generic(s, entry, entry->cpu);
> > + lat_print_timestamp(s, abs_usecs, rel_usecs);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static const char state_to_char[] = TASK_STATE_TO_CHAR_STR;
> >
> > static int task_state_char(unsigned long state)
> > diff --git a/kernel/trace/trace_output.h b/kernel/trace/trace_output.h
> > index 1cbab5e..ec2ed90 100644
> > --- a/kernel/trace/trace_output.h
> > +++ b/kernel/trace/trace_output.h
> > @@ -33,6 +33,9 @@ int seq_print_userip_objs(const struct userstack_entry *entry,
> > int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
> > unsigned long ip, unsigned long sym_flags);
> >
> > +int trace_print_context(struct trace_iterator *iter);
> > +int trace_print_lat_context(struct trace_iterator *iter);
> > +
> > struct trace_event *ftrace_find_event(int type);
> > int register_ftrace_event(struct trace_event *event);
> > int unregister_ftrace_event(struct trace_event *event);
> > --
> > 1.6.0.6
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/