Re: [PATCH bpf-next v2] bpf: Add bpf_check_attach_target_with_klog method to output failure logs to kernel

From: Leon Hwang
Date: Tue Jul 30 2024 - 23:32:04 EST




On 31/7/24 01:28, Andrii Nakryiko wrote:
> On Mon, Jul 29, 2024 at 8:32 PM Leon Hwang <hffilwlqm@xxxxxxxxx> wrote:
>>
>>
>>
>> On 30/7/24 05:01, Andrii Nakryiko wrote:
>>> On Fri, Jul 26, 2024 at 9:04 PM Leon Hwang <hffilwlqm@xxxxxxxxx> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/7/27 08:12, Andrii Nakryiko wrote:
>>>>> On Thu, Jul 25, 2024 at 7:57 PM Leon Hwang <hffilwlqm@xxxxxxxxx> wrote:
>>>>>>
>>>>>>
>>>>>>
>>
>> [...]
>>
>>>>>>
>>>>>> Is it OK to add a tracepoint here? I think tracepoint is more generic
>>>>>> than retsnoop-like way.
>>>>>
>>>>> I personally don't see a problem with adding tracepoint, but how would
>>>>> it look like, given we are talking about vararg printf-style function
>>>>> calls? I'm not sure how that should be represented in such a way as to
>>>>> make it compatible with tracepoints and not cause any runtime
>>>>> overhead.
>>>>
>>>> The tracepoint is not about vararg printf-style function calls.
>>>>
>>>> It is to trace the reason why it fails to bpf_check_attach_target() at
>>>> attach time.
>>>>
>>>
>>> Oh, that changes things. I don't think we can keep adding extra
>>> tracepoints for various potential reasons that BPF prog might be
>>> failing to verify.
>>>
>>> But there is usually no need either. This particular code already
>>> supports emitting extra information into verifier log, you just have
>>> to provide that. This is done by libbpf automatically, can't your
>>> library of choice do the same (if BPF program failed).
>>>
>>> Why go to all this trouble if we already have a facility to debug
>>> issues like this. Note every issue is logged into verifier log, but in
>>> this case it is.
>>>
>>
>> Yeah, it is unnecessary to add tracepoint here, as we are able to trace
>> the log message in bpf_log() arguments with retsnoop.
>
> My point was that you don't even need retsnoop, you can just ask for
> verifier log directly, that's the main way to understand and debug BPF
> program verification/load failures.
>

Nope. It is not about BPF program verification/load failures. It is
about freplace program attach failures instead.

As for freplace program, it can attach to a different target from the
target at load time, since commit 4a1e7c0c63e0 ("bpf: Support attaching
freplace programs to multiple attach points").

This patch tries to provide a better way to figure out the reason why a
freplace program fails to attach.

For example:

tcp_connect.c:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

static __noinline int
stub_handler_static(void)
{
bpf_printk("freplace, stub handler static\n");

return 0;
}

__noinline int
stub_handler(void)
{
bpf_printk("freplace, stub handler\n");

return 0;
}

SEC("kprobe/tcp_connect")
int k_tcp_connect(struct pt_regs *ctx)
{
stub_handler_static();

return stub_handler();
}

char __license[] SEC("license") = "GPL";

freplace.c:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("freplace/stub_handler")
int freplace_handler(void)
{
bpf_printk("freplace, replaced handler\n");

return 0;
}

char __license[] SEC("license") = "GPL";

And, here's pseudo C code snippet with libbpf to show attach failure:

tcp_skel = tcp_connect__open_and_load();
prog_fd = tcp_skel->progs.k_tcp_connect;
freplace_skel = freplace__open();
freplace_prog = freplace_skel->progs.freplace_handler;
err = bpf_program__set_attach_target(freplace_prog, prog_fd,
"stub_handler");
err = freplace__load(freplace_skel);
freplace_link = bpf_program__attach_freplace(freplace_prog, prog_fd,
"stub_handler_static");

freplace_link will be -EINVAL because stub_handler_static() is not a
global function, as we have figured out.

With this patch, "stub_handler_static() is not a global function" will
be printed in dmesg. But we should not pollute dmesg with such message.

If there's the tracepoint that I mentioned previously,
"stub_handler_static() is not a global function" can be retrieved by the
tracepoint.

Thanks,
Leon