Re: [RFC PATCH v7 5/7] libbpf: Implement BTF type sorting validation for binary search optimization

From: Eduard Zingerman

Date: Fri Nov 21 2025 - 14:07:19 EST


On Thu, 2025-11-20 at 15:25 +0800, Donglin Peng wrote:
> On Thu, Nov 20, 2025 at 3:50 AM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Tue, Nov 18, 2025 at 7:21 PM Donglin Peng <dolinux.peng@xxxxxxxxx> wrote:
> > >
> > > From: Donglin Peng <pengdonglin@xxxxxxxxxx>
> > >
> > > This patch adds validation to verify BTF type name sorting, enabling
> > > binary search optimization for lookups.
> > >
> > > Cc: Eduard Zingerman <eddyz87@xxxxxxxxx>
> > > Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > Cc: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
> > > Cc: Alan Maguire <alan.maguire@xxxxxxxxxx>
> > > Cc: Song Liu <song@xxxxxxxxxx>
> > > Cc: Xiaoqin Zhang <zhangxiaoqin@xxxxxxxxxx>
> > > Signed-off-by: Donglin Peng <pengdonglin@xxxxxxxxxx>
> > > ---
> > >  tools/lib/bpf/btf.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 59 insertions(+)
> > >
> > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > index 1d19d95da1d0..d872abff42e1 100644
> > > --- a/tools/lib/bpf/btf.c
> > > +++ b/tools/lib/bpf/btf.c
> > > @@ -903,6 +903,64 @@ int btf__resolve_type(const struct btf *btf, __u32 type_id)
> > >         return type_id;
> > >  }
> > >
> > > +/* Anonymous types (with empty names) are considered greater than named types
> > > + * and are sorted after them. Two anonymous types are considered equal. Named
> > > + * types are compared lexicographically.
> > > + */
> > > +static int btf_compare_type_names(const void *a, const void *b, void *priv)
> > > +{
> > > +       struct btf *btf = (struct btf *)priv;
> > > +       struct btf_type *ta = btf_type_by_id(btf, *(__u32 *)a);
> > > +       struct btf_type *tb = btf_type_by_id(btf, *(__u32 *)b);
> > > +       const char *na, *nb;
> > > +       bool anon_a, anon_b;
> > > +
> > > +       na = btf__str_by_offset(btf, ta->name_off);
> > > +       nb = btf__str_by_offset(btf, tb->name_off);
> > > +       anon_a = str_is_empty(na);
> > > +       anon_b = str_is_empty(nb);
> > > +
> > > +       if (anon_a && !anon_b)
> > > +               return 1;
> > > +       if (!anon_a && anon_b)
> > > +               return -1;
> > > +       if (anon_a && anon_b)
> > > +               return 0;
> >
> > any reason to hard-code that anonymous types should come *after* named
> > ones? That requires custom comparison logic here and resolve_btfids,
> > instead of just relying on btf__str_by_offset() returning valid empty
> > string for name_off == 0 and then sorting anon types before named
> > ones, following normal lexicographical sorting rules?
>
> Thanks. I found that some kernel functions like btf_find_next_decl_tag,
> bpf_core_add_cands, find_bpffs_btf_enums, and find_btf_percpu_datasec
> still use linear search.

- btf_find_next_decl_tag() - this function is called from:
- btf_find_decl_tag_value(), here whole scan over all BTF types is
guaranteed to happen (because btf_find_next_decl_tag() is called
twice);
- btf_prepare_func_args(), here again whole scan is guaranteed to
happen, because of the while loop starting from id == 0.
- bpf_core_add_cands() this function is called from
bpf_core_find_cands(), where it does a linear scan over all types in
kernel BTF and then a linear scan over all types in module BTFs.
(Because of how targ_start_id parameter is passed).
- find_bpffs_btf_enums() - this function does a linear scan over all
types in module BTFs.
- find_btf_percpu_datasec() - this function looks for a DATASEC with
name ".data..percpu" and returns as soon as the match is found.

Of the 4 functions above only find_btf_percpu_datasec() will return
early if BTF type with specified name is found. And it can be
converted to use btf_find_by_name_kind().

So, it appears that there should not be any performance penalty
(compared to current state of affairs) if anonymous types are put in
front. Wdyt?

> Putting named types first would also help here, as
> it allows anonymous types to be skipped naturally during the search.
> Some of them could be refactored to use btf_find_by_name_kind, but some
> would not be appropriate, such as btf_find_next_decl_tag,
> bpf_core_add_cands,find_btf_percpu_datasec.

Did you observe any performance issue if anonymous types are put in
the front?

> Additionally, in the linear search branch, I saw there is a NULL check for
> the name returned by btf__name_by_offset. This suggests that checking
> name_off == 0 alone may not be sufficient to identify an anonymous type,
> which is why I used str_is_empty for a more robust check.
>
> >
> > [...]