Re: [PATCH] coredump debugging: add a tracepoint to report the coredumping

From: Mathieu Desnoyers
Date: Mon Feb 19 2024 - 13:10:06 EST


On 2024-02-19 12:28, Steven Rostedt wrote:
On Mon, 19 Feb 2024 18:00:38 +0100
Oleg Nesterov <oleg@xxxxxxxxxx> wrote:

void __noreturn do_exit(long code)
{
	struct task_struct *tsk = current;
	int group_dead;

	[...]
	acct_collect(code, group_dead);
	if (group_dead)
		tty_audit_exit();
	audit_free(tsk);

	tsk->exit_code = code;
	taskstats_exit(tsk, group_dead);

	exit_mm();

	if (group_dead)
		acct_process();
	trace_sched_process_exit(tsk);

There's a lot that happens before we trigger the above event.

and a lot after.

True. There really isn't a meaningful location here, is there?

I actually use this tracepoint in my pid tracing.

The set_ftrace_pid and set_event_pid from /sys/kernel/tracing will add and
remove PIDs if the options function-fork or event-fork are set respectively.

I hook to the sched_process_fork tracepoint to add new PIDs if the parent
pid is already in one of the files, and remove a PID via the
sched_process_exit function.

No? Those hook on sched_process_free, which is the actual point where the
task is freed (AFAIR, after it has been a zombie and has then been waited for
by another task).

kernel/trace/trace_events.c:

void trace_event_follow_fork(struct trace_array *tr, bool enable)
{
	if (enable) {
		register_trace_prio_sched_process_fork(event_filter_pid_sched_process_fork,
						       tr, INT_MIN);
		register_trace_prio_sched_process_free(event_filter_pid_sched_process_exit,
						       tr, INT_MAX);
	} else {
		unregister_trace_sched_process_fork(event_filter_pid_sched_process_fork,
						    tr);
		unregister_trace_sched_process_free(event_filter_pid_sched_process_exit,
						    tr);
	}
}

kernel/trace/ftrace.c:

void ftrace_pid_follow_fork(struct trace_array *tr, bool enable)
{
	if (enable) {
		register_trace_sched_process_fork(ftrace_pid_follow_sched_process_fork,
						  tr);
		register_trace_sched_process_free(ftrace_pid_follow_sched_process_exit,
						  tr);
	} else {
		unregister_trace_sched_process_fork(ftrace_pid_follow_sched_process_fork,
						    tr);
		unregister_trace_sched_process_free(ftrace_pid_follow_sched_process_exit,
						    tr);
	}
}

AFAIU, "sched_process_exit" is issued close to the point where the task exits
(it should not go back to userspace after that). "sched_process_free" is done
when the task is really being removed.

Between "sched_process_exit" and "sched_process_free", the task can still be
observed by a trace analysis looking at sched and signal events: it's a zombie at
that stage.
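
For anyone who wants to see that window from userspace, here is a rough
sketch (not kernel code, and not taken from the patch under discussion). It
forks a child that exits immediately, samples the state letter in
/proc/<pid>/stat until the child shows up as 'Z', then reaps it and checks
that the /proc entry is gone. The helper names read_task_state() and
observe_exit_then_reap() are made up for this illustration. It only
approximates the two tracepoints: the zombie state becomes visible shortly
after sched_process_exit fires, and I would expect sched_process_free to fire
somewhat after waitpid() returns rather than exactly at that point.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the one-letter state from /proc/<pid>/stat,
 * or '\0' if the entry cannot be opened or parsed.
 */
static char read_task_state(pid_t pid)
{
	char path[64];
	char buf[512];
	char *p;
	FILE *f;
	size_t n;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return '\0';
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	/* comm is wrapped in parentheses and may contain spaces or ')',
	 * so the state field is located after the last ')'. */
	p = strrchr(buf, ')');
	if (!p || p[1] != ' ' || p[2] == '\0')
		return '\0';
	return p[2];
}

/*
 * Hypothetical helper: fork a child that exits at once, record the state
 * seen before reaping, then record whether /proc still shows the PID
 * after waitpid(). Returns 0 on success, -1 on fork/wait failure.
 */
static int observe_exit_then_reap(char *before_wait, int *visible_after_wait)
{
	struct timespec delay = { 0, 1000000 };	/* 1 ms between polls */
	pid_t pid;
	int i;

	assert(before_wait && visible_after_wait);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);

	/* Poll (up to about 1 s) until the exited child is seen as a zombie. */
	*before_wait = '\0';
	for (i = 0; i < 1000; i++) {
		*before_wait = read_task_state(pid);
		if (*before_wait == 'Z')
			break;
		nanosleep(&delay, NULL);
	}

	/* Reaping the child lets the kernel release the task. */
	if (waitpid(pid, NULL, 0) != pid)
		return -1;

	*visible_after_wait = (read_task_state(pid) != '\0');
	return 0;
}
```

Calling observe_exit_then_reap(&state, &visible) from a small main() on a
Linux system with /proc mounted should leave state set to 'Z' and visible set
to 0, which is the same ordering a trace analysis relies on: the PID must stay
tracked until the free event, not dropped at the exit event.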

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com