Re: [RFC PATCH v4 4/4] tracing: Add might_fault() check in __DO_TRACE() for syscall
From: Jordan Rife
Date: Wed Oct 30 2024 - 13:35:32 EST
> > [ 687.334265][T16276] allocated by task 16281 on cpu 1 at 683.953385s (3.380878s ago):
> > [ 687.335615][T16276] tracepoint_add_func+0x28a/0xd90
> > [ 687.336424][T16276] tracepoint_probe_register_prio_may_exist+0xa2/0xf0
> > [ 687.337416][T16276] bpf_probe_register+0x186/0x200
> > [ 687.338174][T16276] bpf_raw_tp_link_attach+0x21f/0x540
> > [ 687.339233][T16276] __sys_bpf+0x393/0x4fa0
> > [ 687.340042][T16276] __x64_sys_bpf+0x78/0xc0
> > [ 687.340801][T16276] do_syscall_64+0xcb/0x250
> > [ 687.341623][T16276] entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> I think the stack trace points out that the patch [1] isn't really fixing it.
> UAF is on access to bpf_link in __traceiter_sys_enter
The stack trace points to the memory in question being allocated by
tracepoint_add_func(), where the allocation and the assignment to
__tracepoint_sys_enter->funcs happen. Mathieu's patch addresses
use-after-free on this structure by using call_rcu_tasks_trace()
inside release_probes().
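Roughly, that change looks something like the sketch below (not the
exact patch; the signature and the tracepoint_is_faultable() helper
follow my reading of this series, so treat it as illustrative):

static void rcu_free_old_probes(struct rcu_head *head)
{
	kfree(container_of(head, struct tp_probes, rcu));
}

static void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
{
	struct tp_probes *tp_probes;

	if (!old)
		return;

	tp_probes = container_of(old, struct tp_probes, probes[0]);
	/*
	 * Probes on faultable (syscall) tracepoints run under
	 * rcu_read_lock_trace(), so wait for a Tasks Trace RCU grace
	 * period before freeing the old funcs array; everything else
	 * keeps using a regular RCU grace period.
	 */
	if (tracepoint_is_faultable(tp))
		call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);
	else
		call_rcu(&tp_probes->rcu, rcu_free_old_probes);
}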
In contrast, here is what the "allocated by" trace looks like for UAF
on access to bpf_link (copied from the original KASAN crash report
[4]):
> Allocated by task 5681:
> kasan_save_stack mm/kasan/common.c:47 [inline]
> kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
> poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
> __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
> kasan_kmalloc include/linux/kasan.h:260 [inline]
> __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4304
> kmalloc_noprof include/linux/slab.h:901 [inline]
> kzalloc_noprof include/linux/slab.h:1037 [inline]
> bpf_raw_tp_link_attach+0x2a0/0x6e0 kernel/bpf/syscall.c:3829
> bpf_raw_tracepoint_open+0x177/0x1f0 kernel/bpf/syscall.c:3876
> __sys_bpf+0x3c0/0x810 kernel/bpf/syscall.c:5691
> __do_sys_bpf kernel/bpf/syscall.c:5756 [inline]
> __se_sys_bpf kernel/bpf/syscall.c:5754 [inline]
> __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5754
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
This clearly points to where memory for a bpf_link is allocated.
> link = kzalloc(sizeof(*link), GFP_USER);
> if (!link) {
> err = -ENOMEM;
> goto out_put_btp;
> }
To add some context: if I apply Mathieu's patch alone, I get no
meaningful test signal when running the reproducer, because a UAF
crash almost always occurs first on an access to bpf_link or bpf_prog,
showing a trace like the second one above. My intent in applying
patch [1] is to mask out these BPF-side sources of UAF crashes so the
testing can focus on what this series addresses. This series should
eventually be tested end-to-end with Andrii's fix for the BPF side
that he mentioned last week, but that fix would rely on this patch
series, tracepoint_is_faultable() in particular, so it's a bit of a
chicken-and-egg problem in terms of testing. In the meantime, [1]
provides a band-aid that allows some degree of test coverage on this
patch.
> while your patch [1] and all attempts to "fix" were delaying bpf_prog.
> The issue is not reproducing anymore due to luck.
[1] chains call_rcu_tasks_trace and call_rcu to free both bpf_prog
and bpf_link after unregistering the tracepoint. That combined grace
period should be sufficient to prevent UAF on these structures from
the syscall TP handlers, which are protected by rcu_read_lock_trace().
I've run the reproducer many times; without /some/ fix on the BPF
side it crashes reliably within seconds here. Using
call_rcu_tasks_trace, or chaining it with call_rcu, eliminates the
UAF on the BPF structures, which removes a couple of variables from
local testing.
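To make the chaining concrete, the band-aid in [1] amounts to the
usual chained grace period pattern, roughly like this (illustrative
struct and function names only, not the actual diff):

/* Illustrative only: a generic object with an rcu_head, not the real
 * bpf_link/bpf_prog layout touched by [1]. */
struct obj {
	struct rcu_head rcu;
	/* ... */
};

static void obj_free_rcu(struct rcu_head *rcu)
{
	/* Both grace periods have elapsed; safe to free. */
	kfree(container_of(rcu, struct obj, rcu));
}

static void obj_free_rcu_tasks_trace(struct rcu_head *rcu)
{
	/*
	 * Tasks Trace RCU readers (the rcu_read_lock_trace() sections
	 * in the syscall tracepoint handlers) are done; also wait for
	 * a normal RCU grace period before actually freeing.
	 */
	call_rcu(rcu, obj_free_rcu);
}

static void obj_release(struct obj *obj)
{
	/* Instead of kfree()ing right after the tracepoint is
	 * unregistered, chain the two grace periods. */
	call_rcu_tasks_trace(&obj->rcu, obj_free_rcu_tasks_trace);
}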
If you are not convinced, I'm happy to run through other test
scenarios or run the reproducer for much longer.
-Jordan