Re: [PATCH] coredump debugging: add a tracepoint to report the coredumping
From: Mathieu Desnoyers
Date: Mon Feb 19 2024 - 13:10:06 EST
On 2024-02-19 12:28, Steven Rostedt wrote:
On Mon, 19 Feb 2024 18:00:38 +0100
Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
void __noreturn do_exit(long code)
{
	struct task_struct *tsk = current;
	int group_dead;
[...]
	acct_collect(code, group_dead);
	if (group_dead)
		tty_audit_exit();
	audit_free(tsk);
	tsk->exit_code = code;
	taskstats_exit(tsk, group_dead);
	exit_mm();
	if (group_dead)
		acct_process();
	trace_sched_process_exit(tsk);
There's a lot that happens before we trigger the above event.
and a lot after.
True. There really isn't a meaningful location here, is there?
I actually use this tracepoint in my pid tracing.
The set_ftrace_pid and set_event_pid from /sys/kernel/tracing will add and
remove PIDs if the options function-fork or event-fork are set respectively.
I hook to the sched_process_fork tracepoint to add new PIDs if the parent
pid is already in one of the files, and remove a PID via the
sched_process_exit function.
No? Those hook on sched_process_free, which is the actual point where the
task is freed (AFAIR after it has been a zombie and then been waited for by
another task).
kernel/trace/trace_events.c:
void trace_event_follow_fork(struct trace_array *tr, bool enable)
{
	if (enable) {
		register_trace_prio_sched_process_fork(event_filter_pid_sched_process_fork,
						       tr, INT_MIN);
		register_trace_prio_sched_process_free(event_filter_pid_sched_process_exit,
						       tr, INT_MAX);
	} else {
		unregister_trace_sched_process_fork(event_filter_pid_sched_process_fork,
						    tr);
		unregister_trace_sched_process_free(event_filter_pid_sched_process_exit,
						    tr);
	}
}
kernel/trace/ftrace.c:
void ftrace_pid_follow_fork(struct trace_array *tr, bool enable)
{
	if (enable) {
		register_trace_sched_process_fork(ftrace_pid_follow_sched_process_fork,
						  tr);
		register_trace_sched_process_free(ftrace_pid_follow_sched_process_exit,
						  tr);
	} else {
		unregister_trace_sched_process_fork(ftrace_pid_follow_sched_process_fork,
						    tr);
		unregister_trace_sched_process_free(ftrace_pid_follow_sched_process_exit,
						    tr);
	}
}
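To make the follow-fork bookkeeping concrete, here is a toy userspace model of
what the two probes registered above do to the PID filter. This is only a
sketch: toy_pid_filter, toy_on_fork, toy_on_free and TOY_PID_MAX are made-up
names for illustration, not kernel APIs, and the real code uses the kernel's
own pid list rather than a flat array.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PID_MAX 1024	/* hypothetical bound, only for this sketch */

/* Toy stand-in for the per-trace_array PID filter. */
struct toy_pid_filter {
	bool traced[TOY_PID_MAX];
};

bool toy_pid_traced(const struct toy_pid_filter *f, int pid)
{
	assert(f);
	return pid >= 0 && pid < TOY_PID_MAX && f->traced[pid];
}

void toy_pid_add(struct toy_pid_filter *f, int pid)
{
	assert(f);
	if (pid >= 0 && pid < TOY_PID_MAX)
		f->traced[pid] = true;
}

/* Models the sched_process_fork probe: the child is added only if the
 * parent is already in the filter. */
void toy_on_fork(struct toy_pid_filter *f, int parent, int child)
{
	if (toy_pid_traced(f, parent))
		toy_pid_add(f, child);
}

/* Models the sched_process_free probe: the PID is dropped once the task
 * is really gone, not when it merely starts exiting. */
void toy_on_free(struct toy_pid_filter *f, int pid)
{
	assert(f);
	if (pid >= 0 && pid < TOY_PID_MAX)
		f->traced[pid] = false;
}
```

With PID 10 in the filter, toy_on_fork(&f, 10, 11) adds 11, while
toy_on_fork(&f, 20, 21) leaves 21 out because 20 was never traced. A later
toy_on_free(&f, 11) removes 11 and leaves 10 in place.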
AFAIU, "sched_process_exit" is issued close to the point where the task exits
(it should not go back to userspace after that). "sched_process_free" is done
when the task is really being removed.
Between "sched_process_exit" and "sched_process_free", the task can still be
observed by a trace analysis looking at sched and signal events: it's a zombie at
that stage.
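The zombie window can also be seen from userspace, without any tracing. The
sketch below assumes Linux with /proc mounted; proc_state and observe_zombie
are hypothetical helper names. It does not observe the tracepoints themselves,
it only shows that an exited child stays visible as a zombie until it is
reaped, which is roughly the interval between the two events (the free side
may additionally be deferred after the reap).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the state letter from /proc/<pid>/stat,
 * or 0 if the entry cannot be read (for instance, the PID is gone). */
char proc_state(pid_t pid)
{
	char path[64], buf[512];
	char *p;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return 0;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return 0;
	}
	fclose(f);

	/* comm is parenthesized and may contain spaces, so the state
	 * field is located after the last ')'. */
	p = strrchr(buf, ')');
	if (!p || p[1] != ' ' || !p[2])
		return 0;
	return p[2];
}

/* Hypothetical helper: fork a child that exits at once, then sample its
 * state before and after reaping it. Returns 0 on success, -1 on error. */
int observe_zombie(char *before, char *after)
{
	siginfo_t info;
	pid_t pid;

	assert(before && after);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);

	/* WNOWAIT: block until the child has exited, but do not reap it. */
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
		return -1;
	*before = proc_state(pid);

	/* Reap the child; its /proc entry disappears afterwards. */
	if (waitpid(pid, NULL, 0) < 0)
		return -1;
	*after = proc_state(pid);
	return 0;
}
```

On such a system, observe_zombie(&before, &after) should report 'Z' in before
and 0 in after, since the /proc entry is gone once the parent has waited for
the child.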
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com