Re: [PATCH v5 6/7] btf: Add lazy sorting validation for binary search
From: Alexei Starovoitov
Date: Fri Nov 07 2025 - 13:54:23 EST
On Fri, Nov 7, 2025 at 10:19 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Fri, 2025-11-07 at 15:08 +0800, Donglin Peng wrote:
> > On Thu, Nov 6, 2025 at 9:47 PM <bot+bpf-ci@xxxxxxxxxx> wrote:
> > >
> > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > index 66cb739a0..33c327d3c 100644
> > > > --- a/kernel/bpf/btf.c
> > > > +++ b/kernel/bpf/btf.c
> > > > @@ -552,6 +552,70 @@ u32 btf_nr_types(const struct btf *btf)
> > > > return total;
> > > > }
> > > >
> > > > +/* Verifies that BTF types are sorted in ascending order
> > > > according to their
> > > > + * names, with named types appearing before anonymous types. If
> > > > the ordering
> > > > + * is correct, counts the number of named types and updates the
> > > > BTF object's
> > > > + * nr_sorted_types field.
> > > > + *
> > > > + * Return: true if types are properly sorted, false otherwise
> > > > + */
> > > > +static bool btf_check_sorted(struct btf *btf)
> > > > +{
> > > > + const struct btf_type *t;
> > > > + int i, n, k = 0, nr_sorted_types;
> > > > +
> > > > + if (likely(btf->nr_sorted_types != BTF_NEED_SORT_CHECK))
> > > > + goto out;
> > > > + btf->nr_sorted_types = 0;
> > > ^
> > >
> > > Can multiple threads race when writing to btf->nr_sorted_types
> > > here?
> > > Looking at btf_find_by_name_kind()->btf_check_sorted(), I see that
> > > btf_find_by_name_kind() receives a const pointer but casts away the
> > > const
> > > to call btf_check_sorted(). The function bpf_find_btf_id() calls
> > > btf_find_by_name_kind() without holding any locks (line 737), and
> > > later
> > > explicitly unlocks before calling it again (lines 756-757).
> > >
> > > This means multiple threads can concurrently enter
> > > btf_check_sorted() and
> > > write to btf->nr_sorted_types. While the validation logic is
> > > idempotent
> > > and all threads would compute the same value, the concurrent writes
> > > to the
> > > same memory location without synchronization could trigger KCSAN
> > > warnings.
> > >
> > > Should this use atomic operations, or should the validation be
> > > performed
> > > under a lock during BTF initialization before the BTF becomes
> > > visible to
> > > multiple threads?
> >
> > Hi, is it necessary to address this issue?
> > For example, by using atomic_try_cmpxchg or WRITE/READ_ONCE? Using
> > atomic_try_cmpxchg can prevent race conditions on writes but requires
> > an atomic
> > variable, while WRITE_ONCE/READ_ONCE can avoid KCSAN warnings. Since
> > the race condition is unlikely to cause critical issues, I suggest
> > using
> > WRITE_ONCE/READ_ONCE.
>
> Probably use WRITE_ONCE/READ_ONCE?
>
> > > > +
> > > > + if (btf->nr_types < 2)
> > > > + goto out;
> > > > +
> > > > + nr_sorted_types = 0;
> > > > + n = btf_nr_types(btf) - 1;
> > > > + for (i = btf_start_id(btf); i < n; i++) {
> > > > + k = i + 1;
> > > > + if (btf_compare_type_names(&i, &k, btf) > 0)
> > > > + goto out;
> > > > +
> > > > + t = btf_type_by_id(btf, i);
> > > > + if (t->name_off)
> > > > + nr_sorted_types++;
> > > > + }
> > > > +
> > > > + t = btf_type_by_id(btf, k);
> > > > + if (t->name_off)
> > > > + nr_sorted_types++;
> > > > + if (nr_sorted_types)
> > > > + btf->nr_sorted_types = nr_sorted_types;
> > > ^
> > >
> > > The same race condition applies to this write of nr_sorted_types as
> > > well.
> > >
> > > > +
> > > > +out:
> > > > + return btf->nr_sorted_types > 0;
> > > > +}
> > >
> > > [ ... ]
> > >
> > > > @@ -610,7 +674,7 @@ s32 btf_find_by_name_kind(const struct btf
> > > > *btf, const char *name, u8 kind)
> > > > goto out;
> > > > }
> > > >
> > > > - if (btf->nr_sorted_types != BTF_NEED_SORT_CHECK) {
> > > > + if (btf_check_sorted((struct btf *)btf)) {
> > > ^
> > >
> > > The const cast here enables the concurrent writes discussed above.
> > > Is
> > > there a reason to mark the btf parameter as const if we're
> > > modifying it?
> >
> > Hi team, is casting away const an acceptable approach for our
> > codebase?
>
> Casting away const is undefined behaviour, e.g. see paragraph 6.7.3.6
> N1570 ISO/IEC 9899:201x Programming languages — C.
>
> Both of the problems above can be avoided if kernel will do sorted
> check non-lazily. But Andrii and Alexei seem to like that property.
Ihor is going to move BTF manipulations into resolve_btfid.
Sorting of BTF should be in resolve_btfid as well.
This way the build process will guarantee that BTF is sorted
to the kernel liking. So the kernel doesn't even need to check
that BTF is sorted.