Re: [PATCH v1 2/8] tracing/ftrace: guard syscall probe with preempt_notrace

From: Mathieu Desnoyers
Date: Fri Oct 04 2024 - 10:21:32 EST


On 2024-10-04 15:26, Steven Rostedt wrote:
On Thu, 3 Oct 2024 21:33:16 -0400
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:

On 2024-10-04 03:04, Steven Rostedt wrote:
On Thu, 3 Oct 2024 20:26:29 -0400
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:

static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
{
struct trace_array *tr = data;
struct trace_event_file *trace_file;
struct syscall_trace_enter *entry;
struct syscall_metadata *sys_data;
struct trace_event_buffer fbuffer;
unsigned long args[6];
int syscall_nr;
int size;

syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;

/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */
trace_file = rcu_dereference_sched(tr->enter_syscall_files[syscall_nr]);

^^^^ this function explicitly states that preempt needs to be disabled by
tracepoints.

Ah, I should have known it was the syscall portion. I don't care for this
hidden dependency. I rather add a preempt disable here and not expect it to
be disabled when called.

Which is exactly what this patch is doing.

I was thinking of putting the protection in the function and not the macro.

I'm confused by your comment. The protection is added to the function here:

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 67ac5366f724..ab4db8c23f36 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -299,6 +299,12 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
int syscall_nr;
int size;
+ /*
+ * Syscall probe called with preemption enabled, but the ring
+ * buffer and per-cpu data require preemption to be disabled.
+ */
+ guard(preempt_notrace)();
+
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
@@ -338,6 +344,12 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
struct trace_event_buffer fbuffer;
int syscall_nr;
+ /*
+ * Syscall probe called with preemption enabled, but the ring
+ * buffer and per-cpu data require preemption to be disabled.
+ */
+ guard(preempt_notrace)();
+
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;

(I'll answer to the rest of your message in a separate email)

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com