Re: [PATCH] tracing: Have format file honor EVENT_FILE_FL_FREED

From: Mathias Krause
Date: Fri Jul 26 2024 - 06:16:35 EST


On 26.07.24 02:15, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@xxxxxxxxxxx>
>
> When eventfs was introduced, special care had to be taken to coordinate the
> freeing of the file meta data with the files that are exposed to user
> space. The file meta data would have a ref count that is set when the file
> is created and would be decremented and freed after the last user that
> opened the file closed it. When the file meta data was to be freed, it
> would set a flag (EVENT_FILE_FL_FREED) to denote that the file is freed,
> and any new references made (like new opens or reads) would fail as it is
> marked freed. This allowed other meta data to be freed after this flag was
> set (under the event_mutex).
>
> All the files that were dynamically created in the events directory had a
> pointer to the file meta data and would call event_release() when the last
> reference to the user space file was closed. This would be the time that it
> is safe to free the file meta data.
>
> A short cut was made for the "format" file. Its i_private would point to
> the "call" entry directly and not point to the file's meta data. This is
> because all format files are the same for the same "call", so it was
> thought there was no reason to differentiate them. The other files
> maintain state (like the "enable", "trigger", etc). But this meant if the
> file were to disappear, the "format" file would be unaware of it.
>
> This fixes two bugs in the same code. One is a race that could be triggered
> via the user_events test (that would create dynamic events and free them),
> and running a loop that would read the user_events format files:
>
> In one console run:
>
> # cd tools/testing/selftests/user_events
> # while true; do ./ftrace_test; done
>
> And in another console run:
>
> # cd /sys/kernel/tracing/
> # while true; do cat events/user_events/__test_event/format; done 2>/dev/null
>
> With KASAN memory checking, it would trigger a use-after-free bug. This was

The UAF bug is there even without KASAN. It's just that KASAN makes it
much easier to detect and catch early.

> because the format file was not checking the file's meta data flag
> "EVENT_FILE_FL_FREED", so it would access the event that the file meta data
> pointed to after it was freed.
>
> The second bug is that the dynamic "format" file also registered a callback
> to decrement the meta data, but the "data" pointer passed to the callback
> was the event itself, not the meta data to free. This would either cause a
> memory leak (the meta data never was freed) or a crash as it could have
> incorrectly freed the event itself.
>
> Link: https://lore.kernel.org/all/20240719204701.1605950-1-minipli@xxxxxxxxxxxxxx/
>
> Cc: stable@xxxxxxxxxxxxxxx
> Reported-by: Mathias Krause <minipli@xxxxxxxxxxxxxx>
> Fixes: b63db58e2fa5d ("eventfs/tracing: Add callback for release of an eventfs_inode")

That Fixes tag looks odd, as that commit didn't introduce the bug. It's a
late change in v6.9, but my bisect run showed the bug triggering as early
as v6.6 (commit 27152bceea1d ("eventfs: Move tracing/events to eventfs")).

git blame points to 5790b1fb3d67 ("eventfs: Remove eventfs_file and just
use eventfs_inode"), which is still too recent, as it landed in v6.7.

IMHO, this needs at least the following additional fixes tags to ensure
all stable kernels get covered:

Fixes: 5790b1fb3d67 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Fixes: 27152bceea1d ("eventfs: Move tracing/events to eventfs")

That holds even if 27152bceea1d is not the real cause but only the commit
that made the bug reachable. Looking at the history, though, it seems this
was always wrong?

> Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
> ---
> kernel/trace/trace_events.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 6ef29eba90ce..852643d957de 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -1540,7 +1540,8 @@ enum {
>
> static void *f_next(struct seq_file *m, void *v, loff_t *pos)
> {
> - struct trace_event_call *call = event_file_data(m->private);
> + struct trace_event_file *file = event_file_data(m->private);
> + struct trace_event_call *call = file->event_call;
> struct list_head *common_head = &ftrace_common_fields;
> struct list_head *head = trace_get_fields(call);
> struct list_head *node = v;
> @@ -1572,7 +1573,8 @@ static void *f_next(struct seq_file *m, void *v, loff_t *pos)
>
> static int f_show(struct seq_file *m, void *v)
> {
> - struct trace_event_call *call = event_file_data(m->private);
> + struct trace_event_file *file = event_file_data(m->private);
> + struct trace_event_call *call = file->event_call;
> struct ftrace_event_field *field;
> const char *array_descriptor;
>
> @@ -1627,12 +1629,14 @@ static int f_show(struct seq_file *m, void *v)
>
> static void *f_start(struct seq_file *m, loff_t *pos)
> {
> + struct trace_event_file *file;
> void *p = (void *)FORMAT_HEADER;
> loff_t l = 0;
>
> /* ->stop() is called even if ->start() fails */
> mutex_lock(&event_mutex);
> - if (!event_file_data(m->private))
> + file = event_file_data(m->private);
> + if (!file || (file->flags & EVENT_FILE_FL_FREED))
> return ERR_PTR(-ENODEV);
>
> while (l < *pos && p)
> @@ -2485,7 +2489,6 @@ static int event_callback(const char *name, umode_t *mode, void **data,
> if (strcmp(name, "format") == 0) {
> *mode = TRACE_MODE_READ;
> *fops = &ftrace_event_format_fops;
> - *data = call;
> return 1;
> }
>
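
For anyone following along, the pattern the patch relies on can be
sketched in userspace roughly as below. This is only a model, not the
kernel code: all model_* names and FILE_FL_FREED are made up for
illustration, a pthread mutex stands in for event_mutex, and setting
the call pointer to NULL merely simulates the event going away. It
shows the reader checking the "freed" flag under the lock before it
touches the event, while the meta data itself stays alive until the
last reference is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define FILE_FL_FREED 0x1u	/* hypothetical stand-in for EVENT_FILE_FL_FREED */

struct model_call { const char *name; };	/* stands in for the event */

struct model_file {				/* stands in for the file meta data */
	unsigned int flags;
	int ref;
	struct model_call *call;
};

/* Stand-in for event_mutex. */
static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Drop one reference and free on the last one. Caller holds model_mutex. */
static void model_put(struct model_file *file)
{
	if (--file->ref == 0)
		free(file);
}

/* Removal path: mark the meta data freed, then drop the creation reference. */
void model_remove(struct model_file *file)
{
	pthread_mutex_lock(&model_mutex);
	file->flags |= FILE_FL_FREED;
	file->call = NULL;	/* model only: the event may be freed from here on */
	model_put(file);
	pthread_mutex_unlock(&model_mutex);
}

/* Last close of the user space file: drop the open reference. */
void model_release(struct model_file *file)
{
	pthread_mutex_lock(&model_mutex);
	model_put(file);
	pthread_mutex_unlock(&model_mutex);
}

/* Reader: check the flag under the lock before dereferencing the event. */
int model_read(struct model_file *file, const char **name)
{
	int ret = 0;

	pthread_mutex_lock(&model_mutex);
	if (!file || (file->flags & FILE_FL_FREED))
		ret = -ENODEV;
	else
		*name = file->call->name;
	pthread_mutex_unlock(&model_mutex);
	return ret;
}

/* Walk through create, read, remove, read again, last close. */
int model_demo(void)
{
	struct model_call call = { .name = "__test_event" };
	struct model_file *file = calloc(1, sizeof(*file));
	const char *name = NULL;

	if (!file)
		return -ENOMEM;
	file->ref = 2;		/* one creation reference, one open file */
	file->call = &call;

	if (model_read(file, &name) != 0 || name != call.name)
		return 1;
	model_remove(file);	/* flagged, ref 2 -> 1, meta data still valid */
	if (model_read(file, &name) != -ENODEV)
		return 2;
	model_release(file);	/* ref 1 -> 0, meta data freed */
	return 0;
}
```

In this model, a read that races with removal gets -ENODEV instead of
following a dangling pointer, which is the behaviour the f_start()
hunk above is after.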

Tested-by: Mathias Krause <minipli@xxxxxxxxxxxxxx>

Thanks,
Mathias