Re: [PATCH v2] libbpf: Add some details for BTF parsing failures

From: Ian Rogers
Date: Wed Jan 24 2024 - 13:18:40 EST


On Wed, Jan 24, 2024 at 9:40 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Tue, Jan 23, 2024 at 8:37 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > On Tue, Jan 23, 2024 at 8:25 PM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > On Tue, Jan 23, 2024 at 12:44 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > > >
> > > > As CONFIG_DEBUG_INFO_BTF is default off the existing "failed to find
> > > > valid kernel BTF" message makes diagnosing the kernel build issue some
> > > > what cryptic. Add a little more detail with the hope of helping users.
> > > >
> > > > Before:
> > > > ```
> > > > libbpf: failed to find valid kernel BTF
> > > > libbpf: Error loading vmlinux BTF: -3
> > > > libbpf: failed to load object 'lock_contention_bpf'
> > > > libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> > > > ```
> > > >
> > > > After no access /sys/kernel/btf/vmlinux:
> > > > ```
> > > > libbpf: Unable to access canonical vmlinux BTF from /sys/kernel/btf/vmlinux
> > > > libbpf: Error loading vmlinux BTF: -3
> > > > libbpf: failed to load object 'lock_contention_bpf'
> > > > libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> > > > ```
> > > >
> > > > After no BTF /sys/kernel/btf/vmlinux:
> > > > ```
> > > > libbpf: Failed to load vmlinux BTF from /sys/kernel/btf/vmlinux, was CONFIG_DEBUG_INFO_BTF enabled?
> > > > libbpf: Error loading vmlinux BTF: -3
> > > > libbpf: failed to load object 'lock_contention_bpf'
> > > > libbpf: failed to load BPF skeleton 'lock_contention_bpf': -3
> > > > ```
> > > >
> > > > Closes: https://lore.kernel.org/bpf/CAP-5=fU+DN_+Y=Y4gtELUsJxKNDDCOvJzPHvjUVaUoeFAzNnig@xxxxxxxxxxxxxx/
> > > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > > >
> > > > ---
> > > > v2. Try to address review comments from Andrii Nakryiko.
> > > > ---
> > > > tools/lib/bpf/btf.c | 49 ++++++++++++++++++++++++++++++++-------------
> > > > 1 file changed, 35 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > > index ee95fd379d4d..d8a05dda0836 100644
> > > > --- a/tools/lib/bpf/btf.c
> > > > +++ b/tools/lib/bpf/btf.c
> > > > @@ -4920,16 +4920,25 @@ static int btf_dedup_remap_types(struct btf_dedup *d)
> > > > return 0;
> > > > }
> > > >
> > > > +static struct btf *btf__load_vmlinux_btf_path(const char *path)
> > >
> > > I don't think we need this helper, you literally call btf__parse() and
> > > pr_debug(), that's all
> > >
> > > > +{
> > > > + struct btf *btf;
> > > > + int err;
> > > > +
> > > > + btf = btf__parse(path, NULL);
> > > > + err = libbpf_get_error(btf);
> > >
> > > we should stop using libbpf_get_error, in libbpf v1.0+ it's best to do just
> > >
> > > btf = btf__parse(path, NULL);
> > > if (!btf) {
> > > err = -errno;
> > > pr_debug(...);
> > > return NULL;
> > > }
> > >
> > > > + pr_debug("loading kernel BTF '%s': %d\n", path, err);
> > > > + return err ? NULL : btf;
> > > > +}
> > > > +
> > > > /*
> > > > * Probe few well-known locations for vmlinux kernel image and try to load BTF
> > > > * data out of it to use for target BTF.
> > > > */
> > > > struct btf *btf__load_vmlinux_btf(void)
> > > > {
> > > > + /* fall back locations, trying to find vmlinux on disk */
> > > > const char *locations[] = {
> > > > - /* try canonical vmlinux BTF through sysfs first */
> > > > - "/sys/kernel/btf/vmlinux",
> > > > - /* fall back to trying to find vmlinux on disk otherwise */
> > > > "/boot/vmlinux-%1$s",
> > > > "/lib/modules/%1$s/vmlinux-%1$s",
> > > > "/lib/modules/%1$s/build/vmlinux",
> > > > @@ -4938,29 +4947,41 @@ struct btf *btf__load_vmlinux_btf(void)
> > > > "/usr/lib/debug/boot/vmlinux-%1$s.debug",
> > > > "/usr/lib/debug/lib/modules/%1$s/vmlinux",
> > > > };
> > > > - char path[PATH_MAX + 1];
> > > > + const char *location;
> > > > struct utsname buf;
> > > > struct btf *btf;
> > > > - int i, err;
> > > > + int i;
> > > >
> > > > - uname(&buf);
> > > > + /* try canonical vmlinux BTF through sysfs first */
> > > > + location = "/sys/kernel/btf/vmlinux";
> > > > + if (faccessat(AT_FDCWD, location, R_OK, AT_EACCESS) == 0) {
> > > > + btf = btf__load_vmlinux_btf_path(location);
> > > > + if (btf)
> > > > + return btf;
> > > > +
> > > > + pr_warn("Failed to load vmlinux BTF from %s, was CONFIG_DEBUG_INFO_BTF enabled?\n",
> > > > + location);
> > >
> > > Mentioning CONFIG_DEBUG_INFO_BTF seems inappropriate here,
> > > /sys/kernel/btf/vmlinux exists, we just failed to parse its data,
> > > right? So it's not about CONFIG_DEBUG_INFO_BTF, we just don't support
> > > something in BTF data. Just pr_warn("Failed to load vmlinux BTF from
> > > %s: %d", location, err); should be good
> >
> > I think that assumes a lot about a user, they understand what BTF
> > means, they know it is controlled by a kernel config option, and that
> > the config option needs to be overridden (as it is defaulted off) for
> > BTF to work. Given this escaped Raspberry Pi OS the potential for this
> > mistake seems high - hence wanting to highlight the config option.
>
> But there is nothing wrong with CONFIG_DEBUG_INFO_BTF, it is enabled,
> and hence there is /sys/kernel/btf/vmlinux on the system. With
> CONFIG_DEBUG_INFO_BTF suggestion you'll just lead users astray. What
> am I missing?

Okay, so can we warn about CONFIG_DEBUG_INFO_BTF if
/sys/kernel/btf/vmlinux isn't accessible? I'd like to clear up
confusion over permissions, generally not an issue as these tools tend
to get run as root, and a missing kernel configuration - the issue
with Raspberry Pi OS.

Thanks,
Ian

> >
> > > > + } else
> > > > + pr_warn("Unable to access canonical vmlinux BTF from %s\n", location);
> > >
> > > here the question of CONFIG_DEBUG_INFO_BTF is more appropriate, if
> > > /sys/kernel/btf/vmlinux (on modern enough kernels) is missing, then
> > > CONFIG_DEBUG_INFO_BTF is missing, probably. But I'd emit this only
> > > after trying all the fallback paths and not finding anything.
> > >
> > > also stylistical nit: if one side of if has {}, the other has to have
> > > {} as well, even if it's just one line
> > >
> > > >
> > > > + uname(&buf);
> > > > for (i = 0; i < ARRAY_SIZE(locations); i++) {
> > > > - snprintf(path, PATH_MAX, locations[i], buf.release);
> > > > + char path[PATH_MAX + 1];
> > > > +
> > > > + snprintf(path, sizeof(path), locations[i], buf.release);
> > > >
> > > > + btf = btf__load_vmlinux_btf_path(path);
> > > > if (faccessat(AT_FDCWD, path, R_OK, AT_EACCESS))
> > > > continue;
> > > >
> > > > - btf = btf__parse(path, NULL);
> > > > - err = libbpf_get_error(btf);
> > > > - pr_debug("loading kernel BTF '%s': %d\n", path, err);
> > > > - if (err)
> > > > - continue;
> > > > + btf = btf__load_vmlinux_btf_path(location);
> > > > + if (btf)
> > > > + return btf;
> > > >
> > > > - return btf;
> > > > + pr_warn("Failed to load vmlinux BTF from %s, was CONFIG_DEBUG_INFO_BTF enabled?\n",
> > >
> > > we should do better here as well. We should distinguish between "there
> > > is vmlinux image, but it has no BTF" vs "there is no vmlinux image" vs
> > > "vmlinux image is there, there is BTF, but we can't parse it". See
> > > btf__parse(). We return -ENODATA if ELF doesn't have BTF, that's the
> > > first situation. We can probably use faccessat() check for second
> > > situation. Everything else can be reported as pr_debug() with location
> > > (but still no CONFIG_DEBUG_INFO_BTF, it's meaningless for fallback BTF
> > > locations)
> > >
> > > > + path);
> > > > }
> > > >
> > > > - pr_warn("failed to find valid kernel BTF\n");
> > >
> > > and then here we can probably warn that we failed to find any kernel
> > > BTF, and suggest CONFIG_DEBUG_INFO_BTF
> >
> > Andrii, you've basically written this patch, can I pass this over to you?
>
> I think it would be great if you can finish thi, thanks.
>
> >
> > Thanks,
> > Ian
> >
> > > > return libbpf_err_ptr(-ESRCH);
> > > > }
> > > >
> > > > --
> > > > 2.43.0.429.g432eaa2c6b-goog
> > > >