Re: [RFC PATCH v3] bpf: Using binary search to improve the performance of btf_find_by_name_kind

From: Eduard Zingerman
Date: Tue Jun 11 2024 - 06:13:36 EST


On Sat, 2024-06-08 at 07:08 -0700, Donglin Peng wrote:

[...]

> Changes in RFC v3:
> - Sort the btf types during the build process in order to reduce memory usage
> and decrease boot time.
>
> RFC v2:
> - https://lore.kernel.org/all/20230909091646.420163-1-pengdonglin@xxxxxxxxxxxxxx
> ---
> include/linux/btf.h |   1 +
> kernel/bpf/btf.c    | 160 +++++++++++++++++++++++++++++++++---

I think the kernel part is in good shape,
please split it out as a separate commit.

> tools/lib/bpf/btf.c | 195 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 345 insertions(+), 11 deletions(-)

[...]

> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 2d0840ef599a..93c1ab677bfa 100644

I'm not sure that libbpf is the best place for this functionality,
as there might be a need for different kinds of orderings
(e.g. see the recent bpftool commit that makes vmlinux.h output stable:
94133cf24bb3 "bpftool: Introduce btf c dump sorting").

I'm curious what Andrii, Alan and Arnaldo think about libbpf vs pahole
for this feature.

Also, I'm getting a selftests build failure with this patch-set
(and I suspect that a bunch of dedup test cases would need an update as well):

$ pwd
/home/eddy/work/bpf-next/tools/testing/selftests/bpf
$ make -j14 test_progs
...

GEN-SKEL [test_progs] access_map_in_map.skel.h
Binary files /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.bpf.linked2.o and /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.bpf.linked3.o differ
make: *** [Makefile:658: /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.skel.h] Error 1
make: *** Waiting for unfinished jobs....

If this change remains in libbpf, I think it would be great to update
btf_find_by_name_kind() to work the same way as the kernel one.
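Roughly what I have in mind, as a completely untested sketch: binary
search over the sorted named types, then a linear scan over same-named
neighbors to match the kind. Both btf_find_by_name_bsearch() and the
btf->nr_named_types counter (number of sorted named types at the front
of the ID space) are hypothetical names, nothing like that exists today:

static __s32 btf_find_by_name_bsearch(const struct btf *btf,
                                      const char *name, __u32 kind)
{
        /* hypothetical: count of named types, sorted by name and
         * placed before all anonymous types after this patch
         */
        __s32 lo = 1, hi = btf->nr_named_types, mid, start = -ENOENT;
        const struct btf_type *t;
        int cmp;

        while (lo <= hi) {
                mid = lo + (hi - lo) / 2;
                t = btf__type_by_id(btf, mid);
                cmp = strcmp(btf__str_by_offset(btf, t->name_off), name);
                if (cmp < 0) {
                        lo = mid + 1;
                } else if (cmp > 0) {
                        hi = mid - 1;
                } else {
                        start = mid;
                        hi = mid - 1;   /* keep looking for leftmost match */
                }
        }
        if (start < 0)
                return start;
        /* same-named types are adjacent, pick the one with matching kind */
        for (mid = start; mid <= btf->nr_named_types; mid++) {
                t = btf__type_by_id(btf, mid);
                if (strcmp(btf__str_by_offset(btf, t->name_off), name))
                        break;
                if (btf_kind(t) == kind)
                        return mid;
        }
        return -ENOENT;
}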

> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c

[...]

> +static int btf_sort_type_by_name(struct btf *btf)
> +{
> +        struct btf_type *bt;
> +        __u32 *new_type_offs = NULL, *new_type_offs_noname = NULL;
> +        __u32 *maps = NULL, *found_offs;
> +        void *new_types_data = NULL, *loc_data;
> +        int i, j, k, type_cnt, ret = 0, type_size;
> +        __u32 data_size;
> +
> +        if (btf_ensure_modifiable(btf))
> +                return libbpf_err(-ENOMEM);
> +
> +        type_cnt = btf->nr_types;
> +        data_size = btf->type_offs_cap * sizeof(*new_type_offs);
> +
> +        maps = (__u32 *)malloc(type_cnt * sizeof(__u32));
> +        if (!maps) {
> +                ret = -ENOMEM;
> +                goto err_out;
> +        }
> +
> +        new_type_offs = (__u32 *)malloc(data_size);
> +        if (!new_type_offs) {
> +                ret = -ENOMEM;
> +                goto err_out;
> +        }
> +
> +        new_type_offs_noname = (__u32 *)malloc(data_size);
> +        if (!new_type_offs_noname) {
> +                ret = -ENOMEM;
> +                goto err_out;
> +        }

What is the point of separating offsets into new_type_offs vs
new_type_offs_noname? It should be possible to use a single offsets
array and a comparison function that puts all named types before the
unnamed ones, e.g. something like the sketch below.
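A completely untested variant of the comparator; with it the whole
type_offs array can be sorted with a single qsort_r() call, and the
second buffer plus the memmove() go away:

static int btf_compare_type_name(const void *a, const void *b, void *priv)
{
        struct btf *btf = (struct btf *)priv;
        __u32 off_a = *(const __u32 *)a;
        __u32 off_b = *(const __u32 *)b;
        struct btf_type *ta, *tb;
        const char *na, *nb;

        ta = (struct btf_type *)(btf->types_data + off_a);
        tb = (struct btf_type *)(btf->types_data + off_b);
        na = btf__str_by_offset(btf, ta->name_off);
        nb = btf__str_by_offset(btf, tb->name_off);

        /* anonymous types sort after all named types */
        if (!na || !na[0])
                return (!nb || !nb[0]) ? 0 : 1;
        if (!nb || !nb[0])
                return -1;

        return strcmp(na, nb);
}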

> +
> +        new_types_data = malloc(btf->types_data_cap);
> +        if (!new_types_data) {
> +                ret = -ENOMEM;
> +                goto err_out;
> +        }
> +
> +        memset(new_type_offs, 0, data_size);
> +
> +        for (i = 0, j = 0, k = 0; i < type_cnt; i++) {
> +                const char *name;
> +
> +                bt = (struct btf_type *)(btf->types_data + btf->type_offs[i]);
> +                name = btf__str_by_offset(btf, bt->name_off);
> +                if (!name || !name[0])
> +                        new_type_offs_noname[k++] = btf->type_offs[i];
> +                else
> +                        new_type_offs[j++] = btf->type_offs[i];
> +        }
> +
> +        memmove(new_type_offs + j, new_type_offs_noname, sizeof(__u32) * k);
> +
> +        qsort_r(new_type_offs, j, sizeof(*new_type_offs),
> +                btf_compare_type_name, btf);
> +
> +        for (i = 0; i < type_cnt; i++) {
> +                found_offs = bsearch(&new_type_offs[i], btf->type_offs, type_cnt,
> +                                     sizeof(__u32), btf_compare_offs);
> +                if (!found_offs) {
> +                        ret = -EINVAL;
> +                        goto err_out;
> +                }
> +                maps[found_offs - btf->type_offs] = i;
> +        }
> +
> +        loc_data = new_types_data;
> +        for (i = 0; i < type_cnt; i++, loc_data += type_size) {
> +                bt = (struct btf_type *)(btf->types_data + new_type_offs[i]);
> +                type_size = btf_type_size(bt);
> +                if (type_size < 0) {
> +                        ret = type_size;
> +                        goto err_out;
> +                }
> +
> +                memcpy(loc_data, bt, type_size);
> +
> +                bt = (struct btf_type *)loc_data;
> +                switch (btf_kind(bt)) {

Please take a look at btf_dedup_remap_types(): it uses the newly added
iterator interface to enumerate all ID references in a type.
It could be used here to avoid enumerating every BTF kind.
Also, d->hypot_map could be used instead of `maps`.
And if so, I think it should be possible to put this pass before
btf_dedup_remap_types() and let that function do the remapping.
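To illustrate (untested): btf_type_visit_type_ids() from
libbpf_internal.h is the iterator in question, btf_get_mapped_type() is
the helper from this patch, and the context struct and callback names
below are made up:

struct btf_sort_ctx {
        struct btf *btf;
        __u32 *maps;
};

static int btf_remap_type_id(__u32 *type_id, void *ctx)
{
        struct btf_sort_ctx *s = ctx;

        /* rewrite one ID reference in place; the iterator calls this
         * for every type ID the current type refers to
         */
        *type_id = btf_get_mapped_type(s->btf, s->maps, *type_id);
        return 0;
}

With a `struct btf_sort_ctx ctx = { .btf = btf, .maps = maps };`
declared before the copy loop, the whole switch collapses to:

                ret = btf_type_visit_type_ids(bt, btf_remap_type_id, &ctx);
                if (ret)
                        goto err_out;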

Alternatively, it might make sense to merge this pass with
btf_dedup_compact_types() in order to minimize the number of operations,
e.g. as in my crude attempt:
https://github.com/eddyz87/bpf/tree/binsort-btf-dedup
(it fails with a similar selftests issue).

> +                case BTF_KIND_PTR:
> +                case BTF_KIND_CONST:
> +                case BTF_KIND_VOLATILE:
> +                case BTF_KIND_RESTRICT:
> +                case BTF_KIND_TYPEDEF:
> +                case BTF_KIND_TYPE_TAG:
> +                case BTF_KIND_FUNC:
> +                case BTF_KIND_VAR:
> +                case BTF_KIND_DECL_TAG:
> +                        bt->type = btf_get_mapped_type(btf, maps, bt->type);
> +                        break;

[...]