Re: [RFC PATCH v3 3/3] tracing: Fix syscall tracepoint use-after-free
From: Andrii Nakryiko
Date: Sun Oct 27 2024 - 21:23:17 EST
On Sat, Oct 26, 2024 at 8:48 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> The grace period used internally within tracepoint.c:release_probes()
> relies on call_rcu() to batch waiting for quiescence of old probe
> arrays, rather than on tracepoint_synchronize_unregister(), which
> blocks while waiting for quiescence.
>
> With the introduction of faultable syscall tracepoints, this causes
> use-after-free issues reproduced with syzkaller.
>
> Fix this by waiting for the appropriate grace period, via either
> call_rcu() or call_rcu_tasks_trace(), before invoking the
> rcu_free_old_probes callback. The flavor is chosen with the
> tracepoint_is_syscall() API.
>
> A similar issue exists in bpf's use of call_rcu(). Fixing it is left
> to a separate change.
>
> Reported-by: syzbot+b390c8062d8387b6272a@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: a363d27cdbc2 ("tracing: Allow system call tracepoints to handle page faults")
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> Cc: Michael Jeanson <mjeanson@xxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: Yonghong Song <yhs@xxxxxx>
> Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
> Cc: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
> Cc: bpf@xxxxxxxxxxxxxxx
> Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> Cc: Jordan Rife <jrife@xxxxxxxxxx>
> ---
> Changes since v0:
> - Introduce tracepoint_call_rcu(),
> - Fix bpf_link_free() use of call_rcu as well.
>
> Changes since v1:
> - Use tracepoint_call_rcu() for bpf_prog_put as well.
>
> Changes since v2:
> - Do not cover bpf changes in the same commit; let bpf developers
>   implement them.
> ---
> kernel/tracepoint.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 5658dc92f5b5..47569fb06596 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -106,13 +106,16 @@ static void rcu_free_old_probes(struct rcu_head *head)
> kfree(container_of(head, struct tp_probes, rcu));
> }
>
> -static inline void release_probes(struct tracepoint_func *old)
> +static inline void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
> {
> if (old) {
> struct tp_probes *tp_probes = container_of(old,
> struct tp_probes, probes[0]);
>
> - call_rcu(&tp_probes->rcu, rcu_free_old_probes);
> + if (tracepoint_is_syscall(tp))
> + call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);
Should this be a call_rcu_tasks_trace() -> call_rcu() chain instead of
just call_rcu_tasks_trace()? While call_rcu_tasks_trace() currently
implies an RCU GP (as evidenced by rcu_trace_implies_rcu_gp() being
hardcoded to return true right now), this might not always be the case
in the future, so it's best to have a guarantee that, sleepable or not,
we'll always have an RCU GP, and for sleepable tracepoints *also* an
RCU Tasks Trace GP.
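
Something along these lines is what I have in mind (untested sketch on
top of this patch; the rcu_tasks_trace_free_old_probes() helper name is
just for illustration):

static void rcu_tasks_trace_free_old_probes(struct rcu_head *head)
{
	/* Tasks Trace GP has elapsed; now also wait for a regular RCU GP. */
	call_rcu(head, rcu_free_old_probes);
}

static inline void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
{
	if (old) {
		struct tp_probes *tp_probes = container_of(old,
			struct tp_probes, probes[0]);

		if (tracepoint_is_syscall(tp))
			/* chain: Tasks Trace GP first, then regular RCU GP */
			call_rcu_tasks_trace(&tp_probes->rcu,
					     rcu_tasks_trace_free_old_probes);
		else
			call_rcu(&tp_probes->rcu, rcu_free_old_probes);
	}
}

Reusing the same rcu_head for the chained call_rcu() is fine, since by
the time the Tasks Trace callback runs, that grace period is done with
the rcu_head.
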
> + else
> + call_rcu(&tp_probes->rcu, rcu_free_old_probes);
> }
> }
>
> @@ -334,7 +337,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
> break;
> }
>
> - release_probes(old);
> + release_probes(tp, old);
> return 0;
> }
>
> @@ -405,7 +408,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> WARN_ON_ONCE(1);
> break;
> }
> - release_probes(old);
> + release_probes(tp, old);
> return 0;
> }
>
> --
> 2.39.5
>