Re: [PATCH] exit: add trace_task_exit() tracepoint before current->mm is reset
From: Steven Rostedt
Date: Tue Apr 01 2025 - 17:32:01 EST
On Tue, 1 Apr 2025 11:40:21 -0700
Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
Hi Andrii,
> It is useful to be able to access current->mm to, say, record a bunch of
> VMA information right before the task exits (e.g., for stack
> symbolization reasons when dealing with short-lived processes that exit
> in the middle of profiling session). We currently do have
> trace_sched_process_exit() in the exit path, but it is called a bit too
> late, after exit_mm() resets current->mm to NULL, which makes it
> unsuitable for inspecting and recording task's mm_struct-related data
> when tracing process lifetimes.
My fear of adding another task exit trace event is that it will get a
bit confusing as that we now have trace_sched_process_exit() and also
trace_task_exit() with slightly different semantics.
How about adding a trace_exit_mm()? Add that to the exit_mm() code?
static void exit_mm(void)
{
struct mm_struct *mm = current->mm;
exit_mm_release(current, mm);
trace_exit_mm(mm);
??
>
> There is a particularly suitable place, though, right after
> taskstats_exit() is called, but before we do exit_mm(). taskstats
> performs a similar kind of accounting that some applications do with
> BPF, and so co-locating them seems like a good fit.
>
> Moving trace_sched_process_exit() a bit earlier would solve this problem
> as well, and I'm open to that. But this might potentially change its
> semantics a little, and so instead of risking that, I went for adding
> a new trace_task_exit() tracepoint instead. Tracepoints have zero
> overhead at runtime, unless actively traced, so this seems acceptable.
>
> Also, existing trace_sched_process_exit() tracepoint is notoriously
> missing `group_dead` flag that is certainly useful in practice and some
> of our production applications have to work around this. So plumb
> `group_dead` through while at it, to have a richer and more complete
> tracepoint.
There should be no problem adding group_dead to the
trace_sched_process_exit() trace event. Adding fields should never cause
any user API breakage.
-- Steve