Re: [PATCH v2] libbpf: Add some details for BTF parsing failures

From: Andrii Nakryiko
Date: Tue Jan 23 2024 - 23:27:45 EST


On Tue, Jan 23, 2024 at 12:44 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> As CONFIG_DEBUG_INFO_BTF is off by default, the existing "failed to find
> valid kernel BTF" message makes diagnosing the kernel build issue
> somewhat cryptic. Add a little more detail in the hope of helping users.
>
> Before:
> ```
> libbpf: failed to find valid kernel BTF
> libbpf: Error loading vmlinux BTF: -3
> libbpf: failed to load object 'lock_contention_bpf'
> libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> ```
>
> After, with no access to /sys/kernel/btf/vmlinux:
> ```
> libbpf: Unable to access canonical vmlinux BTF from /sys/kernel/btf/vmlinux
> libbpf: Error loading vmlinux BTF: -3
> libbpf: failed to load object 'lock_contention_bpf'
> libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> ```
>
> After, with no BTF in /sys/kernel/btf/vmlinux:
> ```
> libbpf: Failed to load vmlinux BTF from /sys/kernel/btf/vmlinux, was CONFIG_DEBUG_INFO_BTF enabled?
> libbpf: Error loading vmlinux BTF: -3
> libbpf: failed to load object 'lock_contention_bpf'
> libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> ```
>
> Closes: https://lore.kernel.org/bpf/CAP-5=fU+DN_+Y=Y4gtELUsJxKNDDCOvJzPHvjUVaUoeFAzNnig@xxxxxxxxxxxxxx/
> Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
>
> ---
> v2. Try to address review comments from Andrii Nakryiko.
> ---
> tools/lib/bpf/btf.c | 49 ++++++++++++++++++++++++++++++++-------------
> 1 file changed, 35 insertions(+), 14 deletions(-)
>
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index ee95fd379d4d..d8a05dda0836 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -4920,16 +4920,25 @@ static int btf_dedup_remap_types(struct btf_dedup *d)
> return 0;
> }
>
> +static struct btf *btf__load_vmlinux_btf_path(const char *path)

I don't think we need this helper; it literally just calls btf__parse()
and pr_debug(), that's all.

> +{
> + struct btf *btf;
> + int err;
> +
> + btf = btf__parse(path, NULL);
> + err = libbpf_get_error(btf);

We should stop using libbpf_get_error(). In libbpf v1.0+ it's best to just do:

btf = btf__parse(path, NULL);
if (!btf) {
	err = -errno;
	pr_debug(...);
	return NULL;
}
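
To spell out the convention, here is a minimal self-contained sketch.
Note that fake_parse() and struct fake_btf are hypothetical stand-ins
invented for illustration (they are not libbpf APIs); the only point is
the NULL-plus-errno pattern on the caller side:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct btf; not the libbpf type. */
struct fake_btf {
	int dummy;
};

static struct fake_btf fake_btf_instance;

/*
 * Hypothetical stand-in for btf__parse(): like libbpf v1.0+ pointer-returning
 * APIs, it returns NULL and sets errno on failure.
 */
static struct fake_btf *fake_parse(const char *path)
{
	if (!path || strcmp(path, "/sys/kernel/btf/vmlinux") != 0) {
		errno = ENOENT;
		return NULL;
	}
	return &fake_btf_instance;
}

/* Caller side: check for NULL, then read errno right away. */
static struct fake_btf *load_or_report(const char *path, int *err_out)
{
	struct fake_btf *btf;

	btf = fake_parse(path);
	if (!btf) {
		/* capture errno before any other call can clobber it */
		*err_out = -errno;
		return NULL;
	}
	*err_out = 0;
	return btf;
}
```

The negative error code is taken from errno immediately after the failed
call, so no libbpf_get_error() is needed.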

> + pr_debug("loading kernel BTF '%s': %d\n", path, err);
> + return err ? NULL : btf;
> +}
> +
> /*
> * Probe few well-known locations for vmlinux kernel image and try to load BTF
> * data out of it to use for target BTF.
> */
> struct btf *btf__load_vmlinux_btf(void)
> {
> + /* fall back locations, trying to find vmlinux on disk */
> const char *locations[] = {
> - /* try canonical vmlinux BTF through sysfs first */
> - "/sys/kernel/btf/vmlinux",
> - /* fall back to trying to find vmlinux on disk otherwise */
> "/boot/vmlinux-%1$s",
> "/lib/modules/%1$s/vmlinux-%1$s",
> "/lib/modules/%1$s/build/vmlinux",
> @@ -4938,29 +4947,41 @@ struct btf *btf__load_vmlinux_btf(void)
> "/usr/lib/debug/boot/vmlinux-%1$s.debug",
> "/usr/lib/debug/lib/modules/%1$s/vmlinux",
> };
> - char path[PATH_MAX + 1];
> + const char *location;
> struct utsname buf;
> struct btf *btf;
> - int i, err;
> + int i;
>
> - uname(&buf);
> + /* try canonical vmlinux BTF through sysfs first */
> + location = "/sys/kernel/btf/vmlinux";
> + if (faccessat(AT_FDCWD, location, R_OK, AT_EACCESS) == 0) {
> + btf = btf__load_vmlinux_btf_path(location);
> + if (btf)
> + return btf;
> +
> + pr_warn("Failed to load vmlinux BTF from %s, was CONFIG_DEBUG_INFO_BTF enabled?\n",
> + location);

Mentioning CONFIG_DEBUG_INFO_BTF seems inappropriate here:
/sys/kernel/btf/vmlinux exists, we just failed to parse its data,
right? So it's not about CONFIG_DEBUG_INFO_BTF; there is just something
in the BTF data that we don't support. Just pr_warn("Failed to load
vmlinux BTF from %s: %d", location, err); should be good.

> + } else
> + pr_warn("Unable to access canonical vmlinux BTF from %s\n", location);

Here the question about CONFIG_DEBUG_INFO_BTF is more appropriate: if
/sys/kernel/btf/vmlinux is missing (on modern enough kernels), then
CONFIG_DEBUG_INFO_BTF is probably not enabled. But I'd emit this only
after trying all the fallback paths and not finding anything.

Also, a stylistic nit: if one branch of an if/else has {}, the other has
to have {} as well, even if it's just one line.

>
> + uname(&buf);
> for (i = 0; i < ARRAY_SIZE(locations); i++) {
> - snprintf(path, PATH_MAX, locations[i], buf.release);
> + char path[PATH_MAX + 1];
> +
> + snprintf(path, sizeof(path), locations[i], buf.release);
>
> + btf = btf__load_vmlinux_btf_path(path);
> if (faccessat(AT_FDCWD, path, R_OK, AT_EACCESS))
> continue;
>
> - btf = btf__parse(path, NULL);
> - err = libbpf_get_error(btf);
> - pr_debug("loading kernel BTF '%s': %d\n", path, err);
> - if (err)
> - continue;
> + btf = btf__load_vmlinux_btf_path(location);
> + if (btf)
> + return btf;
>
> - return btf;
> + pr_warn("Failed to load vmlinux BTF from %s, was CONFIG_DEBUG_INFO_BTF enabled?\n",

We should do better here as well. We should distinguish between "there
is a vmlinux image, but it has no BTF" vs "there is no vmlinux image" vs
"the vmlinux image is there, there is BTF, but we can't parse it". See
btf__parse(). We return -ENODATA if the ELF doesn't have BTF, that's the
first situation. We can probably use the faccessat() check for the second
situation. Everything else can be reported with pr_debug() and the
location (but still no CONFIG_DEBUG_INFO_BTF, it's meaningless for
fallback BTF locations).
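
Roughly, the three cases could be told apart as in the sketch below. The
enum and classify_vmlinux() are hypothetical names made up for
illustration, not existing libbpf code; it assumes the caller passes in
the faccessat() result and the negative errno from the parse step:

```c
#include <errno.h>

/* Hypothetical result codes, for illustration only. */
enum vmlinux_btf_result {
	VMLINUX_MISSING,     /* no readable vmlinux image at this path */
	VMLINUX_NO_BTF,      /* image exists, but carries no BTF data */
	VMLINUX_BTF_INVALID, /* BTF is present, but could not be parsed */
	VMLINUX_BTF_OK,      /* BTF loaded successfully */
};

/*
 * access_rc: return value of an faccessat()-style check (0 means readable).
 * parse_err: 0 on success, negative errno from the parse step otherwise.
 */
static enum vmlinux_btf_result classify_vmlinux(int access_rc, int parse_err)
{
	if (access_rc != 0)
		return VMLINUX_MISSING;
	if (parse_err == 0)
		return VMLINUX_BTF_OK;
	if (parse_err == -ENODATA)
		return VMLINUX_NO_BTF;
	return VMLINUX_BTF_INVALID;
}
```

Each case can then get its own message and log level, with none of the
fallback locations mentioning CONFIG_DEBUG_INFO_BTF.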

> + path);
> }
>
> - pr_warn("failed to find valid kernel BTF\n");

and then here we can probably warn that we failed to find any kernel
BTF, and suggest CONFIG_DEBUG_INFO_BTF
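
As a rough sketch of that overall flow: probe every location quietly and
emit a single warning only once all of them have failed. Everything here
(find_kernel_btf(), probe_fn, the two sample probes) is hypothetical and
stands in for the real loading code; fprintf() stands in for pr_warn():

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical probe callback: returns 0 if BTF could be loaded from path,
 * a negative errno otherwise.
 */
typedef int (*probe_fn)(const char *path);

/* Sample probe that only succeeds for the sysfs location. */
static int probe_sysfs_only(const char *path)
{
	return strcmp(path, "/sys/kernel/btf/vmlinux") == 0 ? 0 : -ENOENT;
}

/* Sample probe that never finds anything. */
static int probe_never(const char *path)
{
	(void)path;
	return -ENOENT;
}

/*
 * Try each candidate in order; warn once, and only after every candidate
 * has failed. On success *found points at the path that worked.
 */
static int find_kernel_btf(const char *const *paths, size_t n,
			   probe_fn probe, const char **found)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (probe(paths[i]) == 0) {
			*found = paths[i];
			return 0;
		}
	}

	*found = NULL;
	fprintf(stderr,
		"failed to find any kernel BTF, was CONFIG_DEBUG_INFO_BTF enabled?\n");
	return -ESRCH;
}
```

This keeps the CONFIG_DEBUG_INFO_BTF hint in one place, so it is printed
only when no location produced usable BTF.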

> return libbpf_err_ptr(-ESRCH);
> }
>
> --
> 2.43.0.429.g432eaa2c6b-goog
>