Re: [PATCH v2] libbpf: Add some details for BTF parsing failures

From: Ian Rogers
Date: Tue Jan 23 2024 - 23:37:23 EST


On Tue, Jan 23, 2024 at 8:25 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Tue, Jan 23, 2024 at 12:44 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > As CONFIG_DEBUG_INFO_BTF is default off the existing "failed to find
> > valid kernel BTF" message makes diagnosing the kernel build issue some
> > what cryptic. Add a little more detail with the hope of helping users.
> >
> > Before:
> > ```
> > libbpf: failed to find valid kernel BTF
> > libbpf: Error loading vmlinux BTF: -3
> > libbpf: failed to load object 'lock_contention_bpf'
> > libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> > ```
> >
> > After no access /sys/kernel/btf/vmlinux:
> > ```
> > libbpf: Unable to access canonical vmlinux BTF from /sys/kernel/btf/vmlinux
> > libbpf: Error loading vmlinux BTF: -3
> > libbpf: failed to load object 'lock_contention_bpf'
> > libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> > ```
> >
> > After no BTF /sys/kernel/btf/vmlinux:
> > ```
> > libbpf: Failed to load vmlinux BTF from /sys/kernel/btf/vmlinux, was CONFIG_DEBUG_INFO_BTF enabled?
> > libbpf: Error loading vmlinux BTF: -3
> > libbpf: failed to load object 'lock_contention_bpf'
> > libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> > ```
> >
> > Closes: https://lore.kernel.org/bpf/CAP-5=fU+DN_+Y=Y4gtELUsJxKNDDCOvJzPHvjUVaUoeFAzNnig@xxxxxxxxxxxxxx/
> > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> >
> > ---
> > v2. Try to address review comments from Andrii Nakryiko.
> > ---
> > tools/lib/bpf/btf.c | 49 ++++++++++++++++++++++++++++++++-------------
> > 1 file changed, 35 insertions(+), 14 deletions(-)
> >
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index ee95fd379d4d..d8a05dda0836 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -4920,16 +4920,25 @@ static int btf_dedup_remap_types(struct btf_dedup *d)
> > return 0;
> > }
> >
> > +static struct btf *btf__load_vmlinux_btf_path(const char *path)
>
> I don't think we need this helper, you literally call btf__parse() and
> pr_debug(), that's all
>
> > +{
> > + struct btf *btf;
> > + int err;
> > +
> > + btf = btf__parse(path, NULL);
> > + err = libbpf_get_error(btf);
>
> we should stop using libbpf_get_error, in libbpf v1.0+ it's best to do just
>
> btf = btf__parse(path, NULL);
> if (!btf) {
> err = -errno;
> pr_debug(...);
> return NULL;
> }
>
> > + pr_debug("loading kernel BTF '%s': %d\n", path, err);
> > + return err ? NULL : btf;
> > +}
> > +
> > /*
> > * Probe few well-known locations for vmlinux kernel image and try to load BTF
> > * data out of it to use for target BTF.
> > */
> > struct btf *btf__load_vmlinux_btf(void)
> > {
> > + /* fall back locations, trying to find vmlinux on disk */
> > const char *locations[] = {
> > - /* try canonical vmlinux BTF through sysfs first */
> > - "/sys/kernel/btf/vmlinux",
> > - /* fall back to trying to find vmlinux on disk otherwise */
> > "/boot/vmlinux-%1$s",
> > "/lib/modules/%1$s/vmlinux-%1$s",
> > "/lib/modules/%1$s/build/vmlinux",
> > @@ -4938,29 +4947,41 @@ struct btf *btf__load_vmlinux_btf(void)
> > "/usr/lib/debug/boot/vmlinux-%1$s.debug",
> > "/usr/lib/debug/lib/modules/%1$s/vmlinux",
> > };
> > - char path[PATH_MAX + 1];
> > + const char *location;
> > struct utsname buf;
> > struct btf *btf;
> > - int i, err;
> > + int i;
> >
> > - uname(&buf);
> > + /* try canonical vmlinux BTF through sysfs first */
> > + location = "/sys/kernel/btf/vmlinux";
> > + if (faccessat(AT_FDCWD, location, R_OK, AT_EACCESS) == 0) {
> > + btf = btf__load_vmlinux_btf_path(location);
> > + if (btf)
> > + return btf;
> > +
> > + pr_warn("Failed to load vmlinux BTF from %s, was CONFIG_DEBUG_INFO_BTF enabled?\n",
> > + location);
>
> Mentioning CONFIG_DEBUG_INFO_BTF seems inappropriate here,
> /sys/kernel/btf/vmlinux exists, we just failed to parse its data,
> right? So it's not about CONFIG_DEBUG_INFO_BTF, we just don't support
> something in BTF data. Just pr_warn("Failed to load vmlinux BTF from
> %s: %d", location, err); should be good

I think that assumes a lot about a user, they understand what BTF
means, they know it is controlled by a kernel config option, and that
the config option needs to be overridden (as it is defaulted off) for
BTF to work. Given this escaped Raspberry Pi OS the potential for this
mistake seems high - hence wanting to highlight the config option.

> > + } else
> > + pr_warn("Unable to access canonical vmlinux BTF from %s\n", location);
>
> here the question of CONFIG_DEBUG_INFO_BTF is more appropriate, if
> /sys/kernel/btf/vmlinux (on modern enough kernels) is missing, then
> CONFIG_DEBUG_INFO_BTF is missing, probably. But I'd emit this only
> after trying all the fallback paths and not finding anything.
>
> also stylistical nit: if one side of if has {}, the other has to have
> {} as well, even if it's just one line
>
> >
> > + uname(&buf);
> > for (i = 0; i < ARRAY_SIZE(locations); i++) {
> > - snprintf(path, PATH_MAX, locations[i], buf.release);
> > + char path[PATH_MAX + 1];
> > +
> > + snprintf(path, sizeof(path), locations[i], buf.release);
> >
> > + btf = btf__load_vmlinux_btf_path(path);
> > if (faccessat(AT_FDCWD, path, R_OK, AT_EACCESS))
> > continue;
> >
> > - btf = btf__parse(path, NULL);
> > - err = libbpf_get_error(btf);
> > - pr_debug("loading kernel BTF '%s': %d\n", path, err);
> > - if (err)
> > - continue;
> > + btf = btf__load_vmlinux_btf_path(location);
> > + if (btf)
> > + return btf;
> >
> > - return btf;
> > + pr_warn("Failed to load vmlinux BTF from %s, was CONFIG_DEBUG_INFO_BTF enabled?\n",
>
> we should do better here as well. We should distinguish between "there
> is vmlinux image, but it has no BTF" vs "there is no vmlinux image" vs
> "vmlinux image is there, there is BTF, but we can't parse it". See
> btf__parse(). We return -ENODATA if ELF doesn't have BTF, that's the
> first situation. We can probably use faccessat() check for second
> situation. Everything else can be reported as pr_debug() with location
> (but still no CONFIG_DEBUG_INFO_BTF, it's meaningless for fallback BTF
> locations)
>
> > + path);
> > }
> >
> > - pr_warn("failed to find valid kernel BTF\n");
>
> and then here we can probably warn that we failed to find any kernel
> BTF, and suggest CONFIG_DEBUG_INFO_BTF

Andrii, you've basically written this patch, can I pass this over to you?

Thanks,
Ian

> > return libbpf_err_ptr(-ESRCH);
> > }
> >
> > --
> > 2.43.0.429.g432eaa2c6b-goog
> >