Re: [PATCH v2] tracing: Add sched_prepare_exec tracepoint

From: Masami Hiramatsu (Google)
Date: Thu Apr 11 2024 - 13:25:47 EST


On Thu, 11 Apr 2024 12:20:57 +0200
Marco Elver <elver@xxxxxxxxxx> wrote:

> Add "sched_prepare_exec" tracepoint, which is run right after the point
> of no return but before the current task assumes its new exec identity.
>
> Unlike the tracepoint "sched_process_exec", the "sched_prepare_exec"
> tracepoint runs before flushing the old exec, i.e. while the task still
> has the original state (such as original MM), but when the new exec
> either succeeds or crashes (but never returns to the original exec).
>
> Being able to trace this event can be helpful in a number of use cases:
>
> * allowing tracing eBPF programs access to the original MM on exec,
> before current->mm is replaced;
> * counting exec in the original task (via perf event);
> * profiling flush time ("sched_prepare_exec" to "sched_process_exec").
>
> Example of tracing output:
>
> $ cat /sys/kernel/debug/tracing/trace_pipe
> <...>-379 [003] ..... 179.626921: sched_prepare_exec: interp=/usr/bin/sshd filename=/usr/bin/sshd pid=379 comm=sshd
> <...>-381 [002] ..... 180.048580: sched_prepare_exec: interp=/bin/bash filename=/bin/bash pid=381 comm=sshd
> <...>-385 [001] ..... 180.068277: sched_prepare_exec: interp=/usr/bin/tty filename=/usr/bin/tty pid=385 comm=bash
> <...>-389 [006] ..... 192.020147: sched_prepare_exec: interp=/usr/bin/dmesg filename=/usr/bin/dmesg pid=389 comm=bash
>
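For anyone consuming this output from user space, here is a minimal
sketch of pulling the fields out of one trace_pipe line. It assumes the
exact TP_printk format from this patch and that none of the fields
contain whitespace (a real comm or path can, so treat it as
illustrative only). The struct and function names are hypothetical and
not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space record for one sched_prepare_exec line. */
struct prepare_exec_event {
	double ts;		/* timestamp in seconds, as printed by ftrace */
	char interp[256];
	char filename[256];
	int pid;
	char comm[16];		/* TASK_COMM_LEN is 16, including the NUL */
};

/*
 * Parse a line such as:
 *   <...>-379 [003] ..... 179.626921: sched_prepare_exec: interp=... filename=... pid=379 comm=sshd
 * Returns 1 on success, 0 if the line is not a sched_prepare_exec event
 * or does not match the expected layout.
 */
int parse_prepare_exec(const char *line, struct prepare_exec_event *ev)
{
	static const char tag[] = "sched_prepare_exec: ";
	const char *p = strstr(line, tag);
	const char *q;

	if (!p)
		return 0;

	/* Step back to the start of the "<timestamp>:" token before the event name. */
	q = p;
	while (q > line && q[-1] == ' ')
		q--;
	while (q > line && q[-1] != ' ')
		q--;
	if (sscanf(q, "%lf:", &ev->ts) != 1)
		return 0;

	/* %s stops at whitespace, hence the no-whitespace assumption above. */
	if (sscanf(p + strlen(tag),
		   "interp=%255s filename=%255s pid=%d comm=%15s",
		   ev->interp, ev->filename, &ev->pid, ev->comm) != 4)
		return 0;

	return 1;
}
```

Feeding it lines read with fgets() from trace_pipe would give the
timestamp and pid needed to pair each event with a later
"sched_process_exec" for the same task.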
> Signed-off-by: Marco Elver <elver@xxxxxxxxxx>

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>

Thanks,

> ---
> v2:
> * Add more documentation.
> * Also show bprm->interp in trace.
> * Rename to sched_prepare_exec.
> ---
> fs/exec.c | 8 ++++++++
> include/trace/events/sched.h | 35 +++++++++++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 38bf71cbdf5e..57fee729dd92 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1268,6 +1268,14 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	if (retval)
>  		return retval;
>
> +	/*
> +	 * This tracepoint marks the point before flushing the old exec where
> +	 * the current task is still unchanged, but errors are fatal (point of
> +	 * no return). The later "sched_process_exec" tracepoint is called after
> +	 * the current task has successfully switched to the new exec.
> +	 */
> +	trace_sched_prepare_exec(current, bprm);
> +
>  	/*
>  	 * Ensure all future errors are fatal.
>  	 */
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index dbb01b4b7451..226f47c6939c 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -420,6 +420,41 @@ TRACE_EVENT(sched_process_exec,
> __entry->pid, __entry->old_pid)
> );
>
> +/**
> + * sched_prepare_exec - called before setting up new exec
> + * @task: pointer to the current task
> + * @bprm: pointer to linux_binprm used for new exec
> + *
> + * Called before flushing the old exec, where @task is still unchanged, but at
> + * the point of no return during switching to the new exec. At the point it is
> + * called the exec will either succeed, or on failure terminate the task. Also
> + * see the "sched_process_exec" tracepoint, which is called right after @task
> + * has successfully switched to the new exec.
> + */
> +TRACE_EVENT(sched_prepare_exec,
> +
> +	TP_PROTO(struct task_struct *task, struct linux_binprm *bprm),
> +
> +	TP_ARGS(task, bprm),
> +
> +	TP_STRUCT__entry(
> +		__string( interp, bprm->interp )
> +		__string( filename, bprm->filename )
> +		__field( pid_t, pid )
> +		__string( comm, task->comm )
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(interp, bprm->interp);
> +		__assign_str(filename, bprm->filename);
> +		__entry->pid = task->pid;
> +		__assign_str(comm, task->comm);
> +	),
> +
> +	TP_printk("interp=%s filename=%s pid=%d comm=%s",
> +		  __get_str(interp), __get_str(filename),
> +		  __entry->pid, __get_str(comm))
> +);
>
> #ifdef CONFIG_SCHEDSTATS
> #define DEFINE_EVENT_SCHEDSTAT DEFINE_EVENT
> --
> 2.44.0.478.gd926399ef9-goog
>


--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>