Re: [PATCH -mm v6] tracepoint: Add signal coredump tracepoint

From: Roland McGrath
Date: Fri Apr 23 2010 - 20:56:16 EST


> > The "retval" encoding seems a bit arcane. I wonder if it might be better
> > to just have separate tracepoints for successful and failed dump attempts.
> > Note that whether or not the dump succeeded is already available in
> > (task->signal->group_exit_code & 0x80) as seen at exit or death tracing
> > events.
>
> OK, please read the previous discussion and tell me what you think about...

I don't have a strong opinion about this particular aspect.
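
For reference, the 0x80 bit in the quoted text is, as far as I know, the
same core-dump flag a parent later sees in the wait status (glibc exposes
it as WCOREDUMP()). A minimal userspace sketch of checking it, assuming
Linux and a status obtained from waitpid(); the helper name
status_dumped_core() is made up for illustration and is not part of the
patch:

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: returns true if a wait(2)-style status says the
 * task was killed by a signal and the core-dump flag (0x80) is set.
 * The 0x80 mask is the same bit tested in group_exit_code above.
 */
static bool status_dumped_core(int status)
{
	return WIFSIGNALED(status) && (status & 0x80) != 0;
}
```

A parent would call this on the status filled in by waitpid(); it tells
you only that a dump succeeded, not why a failed one failed, which is the
gap the tracepoint is meant to fill.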

> > The purposes you mention seem to be served well enough by this tracepoint.
> > But I recall having the impression that one of the original motivating
> > interests for tracing core-dump details was to understand when a giant core
> > dump was responsible for huge amounts of i/o and/or memory thrashing.
> > (Once you notice that happening, you might adjust coredump_filter settings
> > to reduce the problem.) Your new tracepoint doesn't help directly with
> > tracking those sorts of issues, because it only happens after all the work
> > is done. If you are monitoring trace_signal_deliver, then you can filter
> > those for SIG_DFL cases of sig_kernel_coredump() signals and recognize that
> > as the beginning of the coredump. Still, it might be preferable to have
> > explicit start-core-dump and end-core-dump tracepoints.
>
> No, that's not our interest. We are just interested in recording
> coredump parameters when the coredump is done. After dumping core
> (or failing to dump), we'd like to check what went wrong (wrong filter
> setting?), or whether the core was dumped correctly (and removed afterwards).
>
> > Furthermore, I can see potential use for tracepoints before and after
> > coredump_wait(), which synchronizes other threads before actually starting
> > to calculate and write the dump. The window after coredump_wait() and
> > before the post-dump tracepoint is where the actual work of writing the
> > core file takes place, in case you want to monitor i/o load between those
> > marks or suchlike.
>
> Hmm, currently we don't consider that.

I know that's not your interest now. But I do recall someone somewhere at
some point asking about tracepoints in the context of observing unexpected
i/o and paging loads and tracking down that they were due to core dumping.
It's also certainly the case that there have been people spending time
diagnosing long delays due to coredump_wait() logic and/or actual deadlocks
due to bugs in the exit path interactions with coredump_wait(). Having
separate before-wait, after-wait, and after-dump tracepoints could well
make that sort of diagnosis trivial in the future, especially when doing
"flight recorder" style analysis.
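
To make that concrete, here is a rough sketch of what a post-processing
tool could derive from three such marks. Everything here is hypothetical:
the struct, the field names, and the convention that a timestamp of 0
means "event not recorded" are assumptions for illustration, not anything
in the current patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-dump record built from three tracepoint timestamps. */
struct coredump_marks {
	uint64_t before_wait_ns;	/* just before coredump_wait() */
	uint64_t after_wait_ns;		/* coredump_wait() returned; 0 if never seen */
	uint64_t after_dump_ns;		/* dump finished or failed; 0 if never seen */
};

/* Time spent synchronizing the other threads in coredump_wait(). */
static uint64_t coredump_wait_ns(const struct coredump_marks *m)
{
	return m->after_wait_ns - m->before_wait_ns;
}

/* Time spent calculating and writing the dump itself. */
static uint64_t coredump_write_ns(const struct coredump_marks *m)
{
	return m->after_dump_ns - m->after_wait_ns;
}

/* Entered the wait but never left it: a candidate for a stall or deadlock. */
static bool coredump_stuck_in_wait(const struct coredump_marks *m)
{
	return m->before_wait_ns != 0 && m->after_wait_ns == 0;
}
```

With only a single after-dump event, none of these three questions can be
answered from the trace alone.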


Thanks,
Roland