Re: tracing: user events UAF crash report

From: Ajay Kaher
Date: Mon Jul 22 2024 - 07:14:30 EST


On Sat, Jul 20, 2024 at 2:17 AM Mathias Krause <minipli@xxxxxxxxxxxxxx> wrote:
>
> Hi Steven, Ajay,
>
> [ @Cc list: I found out issues with tracefs have been reported /
> attempted to get fixed in the past, so you may be interested. ]
>
> I noticed, the user events ftrace selftest is crashing every now and
> then in our automated tests. Digging into, I found that the following
> is triggering the issue very reliable:
>
> - in one shell, as root:
> # while true; do ./kselftest/user_events/ftrace_test; done
>
> - in a second shell, again as root:
> # cd /sys/kernel/tracing
> # while true; do cat events/user_events/__test_event/format; done 2>/dev/null
>

Tried to reproduced on 6.10.0-rc7-100.ph5+, only getting repeated output as:

# while true; do cat events/user_events/__test_event/format; done 2>/dev/null
print fmt: ""
name: __test_event
ID: 2390
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

# while true; do ./kselftest/user_events/ftrace_test; done
TAP version 13
1..6
# Starting 6 tests from 1 test cases.
# RUN user.register_events ...
# OK user.register_events
ok 1 user.register_events
# RUN user.write_events ...
# OK user.write_events
ok 2 user.write_events
# RUN user.write_empty_events ...
# OK user.write_empty_events
ok 3 user.write_empty_events
# RUN user.write_fault ...
# OK user.write_fault
ok 4 user.write_fault
# RUN user.write_validator ...
# OK user.write_validator
ok 5 user.write_validator
# RUN user.print_fmt ...
# OK user.print_fmt
ok 6 user.print_fmt
# PASSED: 6 / 6 tests passed.
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0

Am I missing any step?

> Ignoring that the selftest fails for half of its tests -- which is a
> regression and therefore yet another bug, I guess -- this triggers an
> access fault (GFP/PF/NULL deref) after a few iterations, usually within
> a minute, mostly mere seconds. With KASAN enabled it generates a splat
> almost instantly, like following:
>
> [ 23.790955] ==================================================================
> [ 23.791692] BUG: KASAN: slab-use-after-free in f_show+0x43b/0x470
> [ 23.792244] Read of size 8 at addr ffff888007076878 by task cat/755
> [ 23.792753]
> [ 23.792947] CPU: 1 PID: 755 Comm: cat Tainted: G D N 6.10.0-vanilla-dirty #30
> [ 23.793563] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [ 23.794280] Call Trace:
> [ 23.794470] <TASK>
> [ 23.794642] dump_stack_lvl+0x66/0xa0
> [ 23.794909] print_report+0xd0/0x630
> [ 23.795180] ? f_show+0x43b/0x470
> [ 23.795415] ? __virt_addr_valid+0x208/0x3f0
> [ 23.795700] ? f_show+0x43b/0x470
> [ 23.795935] kasan_report+0xd8/0x110
> [ 23.796183] ? f_show+0x43b/0x470
> [ 23.796418] f_show+0x43b/0x470
> [ 23.796645] seq_read_iter+0x418/0x11e0
> [ 23.796907] ? __page_table_check_ptes_set+0x166/0x1a0
> [ 23.797280] seq_read+0x261/0x350
> [ 23.797537] ? lock_release+0x453/0x600
> [ 23.797819] ? __pfx_seq_read+0x10/0x10
> [ 23.798103] ? set_ptes.isra.0+0x11b/0x150
> [ 23.798402] vfs_read+0x171/0x9e0
> [ 23.798657] ? lock_acquire+0x408/0x4b0
> [ 23.798970] ? __pfx_vfs_read+0x10/0x10
> [ 23.799267] ? lock_release+0x453/0x600
> [ 23.799549] ? do_user_addr_fault+0x45b/0x8b0
> [ 23.799862] ? __pfx_lock_release+0x10/0x10
> [ 23.800162] ? __pfx___up_read+0x10/0x10
> [ 23.800448] ksys_read+0xdd/0x1a0
> [ 23.800704] ? __pfx_ksys_read+0x10/0x10
> [ 23.801007] ? do_user_addr_fault+0x465/0x8b0
> [ 23.801341] do_syscall_64+0x66/0x130
> [ 23.801615] entry_SYSCALL_64_after_hwframe+0x71/0x79
> [ 23.801959] RIP: 0033:0x7f1fdb9df40e
> [ 23.802230] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 2e 08 0b 00 e8 69 01 02 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
> [ 23.803352] RSP: 002b:00007fff86d62658 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 23.803848] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f1fdb9df40e
> [ 23.804302] RDX: 0000000000020000 RSI: 00007f1fdb8d3000 RDI: 0000000000000003
> [ 23.804753] RBP: 00007f1fdb8d3000 R08: 00007f1fdb8d2010 R09: 0000000000000000
> [ 23.805239] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
> [ 23.805694] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
> [ 23.806153] </TASK>
> [ 23.806352]
> [ 23.806518] Allocated by task 753:
> [ 23.806781] kasan_save_stack+0x20/0x40
> [ 23.807094] kasan_save_track+0x14/0x30
> [ 23.807377] __kasan_kmalloc+0x8f/0xa0
> [ 23.807656] user_event_parse_cmd+0x590/0x25e0
> [ 23.807969] user_events_ioctl+0xa52/0x17f0
> [ 23.808267] __x64_sys_ioctl+0x133/0x190
> [ 23.808555] do_syscall_64+0x66/0x130
> [ 23.808830] entry_SYSCALL_64_after_hwframe+0x71/0x79
> [ 23.809208]
> [ 23.809370] Freed by task 58:
> [ 23.809609] kasan_save_stack+0x20/0x40
> [ 23.809890] kasan_save_track+0x14/0x30
> [ 23.810173] kasan_save_free_info+0x3b/0x60
> [ 23.810473] poison_slab_object+0x10a/0x170
> [ 23.810771] __kasan_slab_free+0x14/0x30
> [ 23.811090] kfree+0xe0/0x2f0
> [ 23.811338] destroy_user_event+0x305/0x450
> [ 23.811639] delayed_destroy_user_event+0x5c/0xe0
> [ 23.811965] process_one_work+0x81c/0x1970
> [ 23.812259] worker_thread+0x608/0x1160
> [ 23.812542] kthread+0x2be/0x3b0
> [ 23.812794] ret_from_fork+0x2c/0x70
> [ 23.813098] ret_from_fork_asm+0x1a/0x30
> [ 23.813387]
> [ 23.813551] Last potentially related work creation:
> [ 23.813885] kasan_save_stack+0x20/0x40
> [ 23.814168] __kasan_record_aux_stack+0x8e/0xa0
> [ 23.814489] insert_work+0x20/0x1b0
> [ 23.814756] __queue_work+0x67a/0xc60
> [ 23.815079] queue_work_on+0x63/0x90
> [ 23.815350] user_event_put+0x1f9/0x390
> [ 23.815631] user_events_ioctl+0x11ed/0x17f0
> [ 23.815935] __x64_sys_ioctl+0x133/0x190
> [ 23.816223] do_syscall_64+0x66/0x130
> [ 23.816498] entry_SYSCALL_64_after_hwframe+0x71/0x79
> [ 23.816841]
> [ 23.817019] The buggy address belongs to the object at ffff888007076800
> [ 23.817019] which belongs to the cache kmalloc-cg-512 of size 512
> [ 23.817799] The buggy address is located 120 bytes inside of
> [ 23.817799] freed 512-byte region [ffff888007076800, ffff888007076a00)
> [ 23.818530]
> [ 23.818694] The buggy address belongs to the physical page:
> [ 23.819092] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7074
> [ 23.819602] head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> [ 23.820323] memcg:ffff88800277fd01
> [ 23.820595] anon flags: 0x40(head|zone=0)
> [ 23.820885] page_type: 0xffffefff(slab)
> [ 23.821203] raw: 0000000000000040 ffff88800014cdc0 0000000000000000 0000000000000001
> [ 23.821706] raw: 0000000000000000 0000000080100010 00000001ffffefff ffff88800277fd01
> [ 23.822206] head: 0000000000000040 ffff88800014cdc0 0000000000000000 0000000000000001
> [ 23.822711] head: 0000000000000000 0000000080100010 00000001ffffefff ffff88800277fd01
> [ 23.823261] head: 0000000000000002 ffffea00001c1d01 ffffffffffffffff 0000000000000000
> [ 23.823764] head: ffff888000000004 0000000000000000 00000000ffffffff 0000000000000000
> [ 23.824271] page dumped because: kasan: bad access detected
> [ 23.824638]
> [ 23.824799] Memory state around the buggy address:
> [ 23.825162] ffff888007076700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [ 23.825641] ffff888007076780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [ 23.826116] >ffff888007076800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 23.826588] ^
> [ 23.827070] ffff888007076880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 23.827545] ffff888007076900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 23.828018] ==================================================================
>
> I did a bisect run and ended up on commit 27152bceea1d ("eventfs: Move
> tracing/events to eventfs"). This is, of cause, only the enabling
> commit. The bug is likely in one of the preceeding commits introducing
> eventfs infrastructure.
>
> The git bisect log (I jumped around a little in the beginning, but yeah,
> v6.10 is still broken):
>
> git bisect start
> # status: waiting for both good and bad commits
> # bad: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
> git bisect bad a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
> # status: waiting for good commit(s), bad commit known
> # bad: [0c3836482481200ead7b416ca80c68a29cfdaabd] Linux 6.10
> git bisect bad 0c3836482481200ead7b416ca80c68a29cfdaabd
> # status: waiting for good commit(s), bad commit known
> # bad: [ffc253263a1375a65fa6c9f62a893e9767fbebfa] Linux 6.6
> git bisect bad ffc253263a1375a65fa6c9f62a893e9767fbebfa
> # status: waiting for good commit(s), bad commit known
> # good: [6995e2de6891c724bfeb2db33d7b87775f913ad1] Linux 6.4
> git bisect good 6995e2de6891c724bfeb2db33d7b87775f913ad1
> # good: [692f5510159c79bfa312a4e27a15e266232bfb4c] Merge tag 'asoc-v6.6' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
> git bisect good 692f5510159c79bfa312a4e27a15e266232bfb4c
> # good: [0e72db77672ff4758a31fb5259c754a7bb229751] Merge tag 'soc-dt-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> git bisect good 0e72db77672ff4758a31fb5259c754a7bb229751
> # bad: [5eea5820c7340d39e56e169e1b87199391105f6b] Merge tag 'mm-stable-2023-09-04-14-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad 5eea5820c7340d39e56e169e1b87199391105f6b
> # good: [28a4f91f5f251689c69155bc6a0b1afc9916c874] Merge tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> git bisect good 28a4f91f5f251689c69155bc6a0b1afc9916c874
> # bad: [f7e97ce26972ae7be8bbbae8d819ff311d4c5900] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
> git bisect bad f7e97ce26972ae7be8bbbae8d819ff311d4c5900
> # good: [9a5d660fdb25d20748d7f9e9559c86073c3bb368] media: ivsc: Add ACPI dependency
> git bisect good 9a5d660fdb25d20748d7f9e9559c86073c3bb368
> # good: [704e2c6107f1a5353a1038bac137dda0df2a6dd0] Merge tag 'icc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next
> git bisect good 704e2c6107f1a5353a1038bac137dda0df2a6dd0
> # good: [bd30fe6a7d9b72e73c5ac9109cbc3066dde08034] Merge tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
> git bisect good bd30fe6a7d9b72e73c5ac9109cbc3066dde08034
> # good: [d952f54d01ec2ea5ee9d5e21f2ea3a5807b4bcbc] RDMA/hns: Remove unused declaration hns_roce_modify_srq()
> git bisect good d952f54d01ec2ea5ee9d5e21f2ea3a5807b4bcbc
> # bad: [34232fcfe9a383bea802af682baae5c99f22376c] Merge tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
> git bisect bad 34232fcfe9a383bea802af682baae5c99f22376c
> # bad: [7c1130ea5cae215588e6d51242b877fd405e6c32] test: ftrace: Fix kprobe test for eventfs
> git bisect bad 7c1130ea5cae215588e6d51242b877fd405e6c32
> # good: [5bdcd5f5331a276a3ddf1fba8605986d0d15298a] eventfs: Implement removal of meta data from eventfs
> git bisect good 5bdcd5f5331a276a3ddf1fba8605986d0d15298a
> # bad: [27152bceea1df27ffebb12ac9cd9adbf2c4c3f35] eventfs: Move tracing/events to eventfs
> git bisect bad 27152bceea1df27ffebb12ac9cd9adbf2c4c3f35
> # first bad commit: [27152bceea1df27ffebb12ac9cd9adbf2c4c3f35] eventfs: Move tracing/events to eventfs
>
> I stared quite hard at the inode/dentry/file handling and, imho, it's
> quite broken as it leaks pointers to refcounted objects which may go
> away anytime, e.g. &user->call via trace_add_event_call() in
> user_event_set_call_visible(). That'll lead to a user reachable file
> hierarchy below events/user_events/$eventname/ that isn't bound to the
> creator's mm.
>
> Assume one process opens events/user_events/$eventname/format, then the
> walked fields -- the whole object, actually -- may go away anytime, once
> the creator either removes the user probe again or closes its fd to
> /sys/kernel/tracing/user_events_data, leading to user_events_release()
> doing the final user_event_put() leading to object destruction. The
> concurrent reader of events/user_events/$eventname/format will now
> operate on a free'd object. Boom!
>
> As mentioned, I stared quite hard at the code and tried to fix it via
> numerous ways but the core issue is, imho, the lack of refcounting the
> &user->call object, or even better, 'user', for the files created below
> events/user_events/$eventname/. However, I found no easy way to do that.
>
> My hopes are, you may be able to, now that there's an easy reproducer.
>
> Beside the obvious bug, I noticed the following (not fixing the issue,
> tho):
>
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 5d88c184f0fc..687ad0a26458 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -112,7 +112,7 @@ static void release_ei(struct kref *ref)
> entry->release(entry->name, ei->data);
> }
>
> - call_rcu(&ei->rcu, free_ei_rcu);
> + call_srcu(&eventfs_srcu, &ei->rcu, free_ei_rcu);
> }
>
> static inline void put_ei(struct eventfs_inode *ei)
> @@ -735,7 +735,9 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
>
> /* Was the parent freed? */
> if (list_empty(&ei->list)) {
> + mutex_lock(&eventfs_mutex);
> cleanup_ei(ei);
> + mutex_unlock(&eventfs_mutex);
> ei = NULL;
> }
> return ei;
>
> The s/call_rcu/call_srcu/ one is because not only do the comments talk
> about SRCU protection, the code does so as well, e.g. eventfs_iterate().
>
> Similar argumentation for taking the mutex around cleanup_ei(). It sets
> the 'is_freed' member which others expect only to be set -- or at least
> to have a stable value -- when the mutex is hold. As 'ei' was exposed to
> its parent before, it may have escaped to other users, e.g. via
> eventfs_iterate().
>
> I may have missed something, painting these changes completely void,
> therefore spared me the time to do proper patches. However, below one is
> legit ("git am --scissors ..." yada yada):
>
> ---8<---
> Subject: [PATCH] eventfs: Don't return NULL in eventfs_create_dir()
>
> Commit 77a06c33a22d ("eventfs: Test for ei->is_freed when accessing
> ei->dentry") added another check, testing if the parent was freed after
> we released the mutex. If so, the function returns NULL. However, all
> callers expect it to either return a valid pointer or an error pointer,
> at least since commit 5264a2f4bb3b ("tracing: Fix a NULL vs IS_ERR() bug
> in event_subsystem_dir()"). Returning NULL will therefore fail the error
> condition check in the caller.
>
> Fix this by substituting the NULL return value with a fitting error
> pointer.
>
> Fixes: 77a06c33a22d ("eventfs: Test for ei->is_freed when accessing ei->dentry")
> Cc: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
> Signed-off-by: Mathias Krause <minipli@xxxxxxxxxxxxxx>
> ---
> fs/tracefs/event_inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 5d88c184f0fc..a9c28a1d5dc8 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -736,7 +736,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
> /* Was the parent freed? */
> if (list_empty(&ei->list)) {
> cleanup_ei(ei);
> - ei = NULL;
> + ei = ERR_PTR(-EBUSY);
> }
> return ei;
> }

Must have. LGTM. Thanks.
Reviewed-by: Ajay Kaher <ajay.kaher@xxxxxxxxxxxx>

- Ajay