Re: [PATCH 2/6] tracing: Add trace event error log

From: Masami Hiramatsu
Date: Sat Apr 14 2018 - 02:31:17 EST


On Thu, 12 Apr 2018 18:52:13 -0500
Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx> wrote:

> Hi Steve,
>
> On Thu, 2018-04-12 at 18:20 -0400, Steven Rostedt wrote:
> > On Thu, 12 Apr 2018 10:13:17 -0500
> > Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx> wrote:
> >
> > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > > index 6fb46a0..f2dc7e6 100644
> > > --- a/kernel/trace/trace.h
> > > +++ b/kernel/trace/trace.h
> > > @@ -1765,6 +1765,9 @@ extern ssize_t trace_parse_run_command(struct file *file,
> > > const char __user *buffer, size_t count, loff_t *ppos,
> > > int (*createfn)(int, char**));
> > >
> > > +extern void event_log_err(const char *loc, const char *cmd, const char *fmt,
> > > + ...);
> > > +
> > > /*
> > > * Normal trace_printk() and friends allocates special buffers
> > > * to do the manipulation, as well as saves the print formats
> > > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> > > index 05c7172..fd02e22 100644
> > > --- a/kernel/trace/trace_events.c
> > > +++ b/kernel/trace/trace_events.c
> > > @@ -1668,6 +1668,164 @@ static void ignore_task_cpu(void *data)
> > > return ret;
> > > }
> > >
> > > +#define EVENT_LOG_ERRS_MAX (PAGE_SIZE / sizeof(struct event_log_err))
> >
> > > +#define EVENT_ERR_LOG_MASK (EVENT_LOG_ERRS_MAX - 1)
> >
> > BTW, the above only works if EVENT_LOG_ERRS_MAX is a power of two,
> > which it's not guaranteed to be.
> >
>
> My assumption was that we'd only ever need a page or two for the
> error_log and so would always would be a power of two, since the size of
> the struct event_log_err is 512.
>
> Anyway, I should probably have put comments about all this in the code,
> and I will, but the way it works kind of assumes a very small number of
> errors - it's replacing a simple 'last error' facility for the hist
> triggers and making it a common facility for other things that have
> similar needs like Masami's kprobe_events errors. For those purposes, I
> assumed it would suffice to simply be able to show that last 8 or some
> similar small number of errors and constantly recycle the slots.

Correct. I don't think the error log entry size over 16, it is too much
errors occur. User must check it before that. Or, we should push it
printk buffer.

>
> Basically it just splits the page into 16 strings, 2 per error, one for
> the actual error text, the other for the command the user entered. The
> struct event_log_err just overlays a struct on top of 2 strings just to
> make it easier to manage.
>
> Anyway, because it is such a small number, and we start with a zeroed
> page, whenever we print the error log, we print all 16 strings even if
> we only have one error (2 strings). The rest are NULL and print
> nothing. We start with the tail, which could also be thought of as the
> 'oldest' or the 'first' error in the buffer and just cycle through them
> all. Hope that clears up some of the other questions you had about how
> a non-full log gets printed, etc...
>
> > > +
> > > +struct event_log_err {
> > > + char err[MAX_FILTER_STR_VAL];
> > > + char cmd[MAX_FILTER_STR_VAL];
> > > +};
> >
> > I like the event_log_err idea, but the above can be shrunk to:
> >
> > struct err_info {
> > u8 type; /* I can only imagine 254 types */
> > u8 pos; /* MAX_FILTER_STR_VAR = 256 */
> > };
> >
> > struct event_log_err {
> > struct err_info info;
> > char cmd[MAX_FILTER_STR_VAL];
> > };
> >
> > There's no reason to put in a bunch of text that's going to be static
> > anyway. Have a lookup table like we do for filters.
> >
> > + log_err("Variable name not unique, need to use fully qualified name (%s) for variable: ", fqvar(system, event_name, var_name, true));
> >
>
> Hmm, most of the log_errs use printf strings that get expanded, so need
> a destination buffer, the event_log_err->err string, but I think I see
> what you're getting at - that we can get rid of the format strings
> altogether and make them static strings if we use the method of simply
> printing the static string and putting a caret where the error is as
> below.
>
> >
> > Instead of making the fqvar, find the location of the variable, and add:
> >
> > blah blah $var blah blah
> > ^
> > Variable name not unique, need to use fully qualified name for variable
> >
>
> OK, if we can do this for every error type, then we can use the lookup
> table and the caret position instead of format strings. Which means
> getting rid of the simple ring of strings, but whatever..

>From the viewpoint of kprobe events, I'm OK for either way.
I'll rewrite parser to get parsing position correctly.

Thanks!


--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>