Re: [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks
From: Andrii Nakryiko
Date: Fri May 15 2026 - 22:15:26 EST
On Fri, May 15, 2026 at 4:44 PM Ihor Solodrai <ihor.solodrai@xxxxxxxxx> wrote:
>
> On 5/15/26 3:40 PM, Andrii Nakryiko wrote:
> > On Thu, May 14, 2026 at 5:53 PM Ihor Solodrai <ihor.solodrai@xxxxxxxxx> wrote:
> >>
> >> Sleepable build ID parsing can block in __kernel_read() [1], so the
> >> stackmap sleepable path must not call it while holding mmap_lock or a
> >> per-VMA read lock.
> >>
> >> The issue and the fix are conceptually similar to a recent procfs
> >> patch [2].
> >>
> >> Resolve each covered VMA with a stable read-side reference, preferring
> >> lock_vma_under_rcu() and falling back to mmap_read_trylock() only long
> >> enough to acquire the VMA read lock. Take a reference to the backing
> >> file, drop the VMA lock, and then parse the build ID through
> >> (sleepable) build_id_parse_file().
> >>
> >> We have to use mmap_read_trylock() (and give up on failure) in this
> >> context because taking mmap_read_lock() is generally unsafe on code
> >> paths reachable from BPF programs [3], and may lead to deadlocks.
> >>
> >> [1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@xxxxxxxxx/
> >> [2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@xxxxxxxxxx/
> >> [3] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@xxxxxxxxx/
> >>
> >> Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers")
> >> Suggested-by: Puranjay Mohan <puranjay@xxxxxxxxxx>
> >> Signed-off-by: Ihor Solodrai <ihor.solodrai@xxxxxxxxx>
> >> ---
> >> kernel/bpf/stackmap.c | 109 ++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 109 insertions(+)
> >>
> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> >> index 4ef0fd06cea5..08f7659505d1 100644
> >> --- a/kernel/bpf/stackmap.c
> >> +++ b/kernel/bpf/stackmap.c
> >> @@ -9,6 +9,7 @@
> >> #include <linux/perf_event.h>
> >> #include <linux/btf_ids.h>
> >> #include <linux/buildid.h>
> >> +#include <linux/mmap_lock.h>
> >> #include "percpu_freelist.h"
> >> #include "mmap_unlock_work.h"
> >>
> >> @@ -158,6 +159,109 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
> >> memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
> >> }
> >>
> >> +struct stack_map_vma_lock {
> >> + bool vma_locked;
> >> + struct vm_area_struct *vma;
> >> + struct mm_struct *mm;
> >> +};
> >> +
> >
> > tbh, it feels like this should be provided as some sort of primitive
> > by vma/mm API given how common it becomes when one tries to work with
> > VMAs efficiently (in terms of avoiding unnecessary mmap lock)... but
> > that would be a question to Suren maybe
>
> Shakeel raised the same point in v4 [1]. I don't know of any potential
PROCMAP_QUERY and BPF's VMA iterator?
> users of this pattern, and refactoring this into a generic mechanism
> now (with only stackmap using it) seems premature. Although I agree
> with the premise. Let's see if Suren has an opinion.
>
we shouldn't block on this generalization, just something to consider
for the nearest future
> [1] https://lore.kernel.org/bpf/agYzuDIJszA_7rp3@xxxxxxxxx/
>
> >
[...]
> > why is this only sleepable case? the only difference between sleepable
> > and non-sleepable is the use of build_id_parse[_file] vs
> > build_id_parse_nofault to fetch build ID, no? Other than that the
> > algorithm of locking VMAs is the same, no?
>
> There seems to be a difference in lock lifetimes, because
> non-sleepable path may run with IRQs disabled. Right now it does:
>
> bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work);
> ...
> if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) {
> // fallback to IPs...
> }
> for (i = 0; i < trace_nr; i++) {
> // fetch build_id ...
> }
> bpf_mmap_unlock_mm(work, current->mm);
>
> While on sleepable path we always release the lock before parsing with
> the new stack_map_unlock_vma(), because build_id_parse_file() can
> block.
>
> Not sure how this could be reconciled in a common lock/unlock pattern,
> if that's what you're suggesting.
>
> For me it's easier to reason about when sleepable / non-sleepable are
> on distinct codepaths, although we may be able to factor out some
> common helpers.
The problem is that the rest of the logic is not exactly short or
trivial, so it's a bit of a liability to have multiple copies of it
across sleepable/non-sleepable implementations.
But never mind, we don't have to do it right now.
>
>
> >
[...]