Re: [PATCHv2 bpf 3/3] bpf: Force cookies array to follow symbols sorting

From: Jiri Olsa
Date: Wed Jun 08 2022 - 04:41:51 EST


On Tue, Jun 07, 2022 at 09:10:32PM -0700, Alexei Starovoitov wrote:
> On Tue, Jun 7, 2022 at 12:56 PM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >
> > On Tue, Jun 07, 2022 at 11:40:47AM -0700, Alexei Starovoitov wrote:
> > > On Mon, Jun 6, 2022 at 11:48 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > > >
> > > > When user specifies symbols and cookies for kprobe_multi link
> > > > interface it's very likely the cookies will be misplaced and
> > > > returned to wrong functions (via get_attach_cookie helper).
> > > >
> > > > The reason is that to resolve the provided functions we sort
> > > > them before passing them to ftrace_lookup_symbols, but we do
> > > > not do the same sort on the cookie values.
> > > >
> > > > Fixing this by using sort_r function with custom swap callback
> > > > that swaps cookie values as well.
> > > >
> > > > Fixes: 0236fec57a15 ("bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link")
> > > > Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> > >
> > > It looks good, but something in this patch is causing a regression:
> > > ./test_progs -t kprobe_multi
> > > test_kprobe_multi_test:PASS:load_kallsyms 0 nsec
> > > #80/1 kprobe_multi_test/skel_api:OK
> > > #80/2 kprobe_multi_test/link_api_addrs:OK
> > > #80/3 kprobe_multi_test/link_api_syms:OK
> > > #80/4 kprobe_multi_test/attach_api_pattern:OK
> > > #80/5 kprobe_multi_test/attach_api_addrs:OK
> > > #80/6 kprobe_multi_test/attach_api_syms:OK
> > > #80/7 kprobe_multi_test/attach_api_fails:OK
> > > test_bench_attach:PASS:get_syms 0 nsec
> > > test_bench_attach:PASS:kprobe_multi_empty__open_and_load 0 nsec
> > > libbpf: prog 'test_kprobe_empty': failed to attach: No such process
> > > test_bench_attach:FAIL:bpf_program__attach_kprobe_multi_opts
> > > unexpected error: -3
> > > #80/8 kprobe_multi_test/bench_attach:FAIL
> > > #80 kprobe_multi_test:FAIL
> >
> > looks like kallsyms search failed to find some symbol,
> > but I can't reproduce with:
> >
> > ./vmtest.sh -- ./test_progs -t kprobe_multi
> >
> > can you share .config you used?
>
> I don't think it's config related.
> Patch 2 is doing:
>
> - if (!bsearch(&name, args->syms, args->cnt, sizeof(*args->syms), symbols_cmp))
> + sym = bsearch(&name, args->syms, args->cnt, sizeof(*args->syms), symbols_cmp);
> + if (!sym)
> + return 0;
> +
> + idx = sym - args->syms;
> + if (args->addrs[idx])
> return 0;
>
> addr = ftrace_location(addr);
> if (!addr)
> return 0;
>
> - args->addrs[args->found++] = addr;
> + args->addrs[idx] = addr;
> + args->found++;
> return args->found == args->cnt ? 1 : 0;
>
> There are plenty of functions with the same name
> in available_filter_functions.
> So
> if (args->addrs[idx])
> return 0;
> triggers for a lot of them.
> At the end args->found != args->cnt.

there's code in get_syms (prog_tests/kprobe_multi_test.c)
that filters out duplicates

>
> Here is trivial debug patch:
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 601ccf1b2f09..c567cf56cb57 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -8037,8 +8037,10 @@ static int kallsyms_callback(void *data, const
> char *name,
> return 0;
>
> idx = sym - args->syms;
> - if (args->addrs[idx])
> + if (args->addrs[idx]) {
> + printk("idx %x name %s\n", idx, name);
> return 0;
> + }
>
> addr = ftrace_location(addr);
> if (!addr)
> @@ -8078,6 +8080,7 @@ int ftrace_lookup_symbols(const char
> **sorted_syms, size_t cnt, unsigned long *a
> err = kallsyms_on_each_symbol(kallsyms_callback, &args);
> if (err < 0)
> return err;
> + printk("found %zd cnt %zd\n", args.found, args.cnt);
> return args.found == args.cnt ? 0 : -ESRCH;
> }
>
> [ 13.096160] idx a500 name unregister_vclock
> [ 13.096930] idx 82fb name pt_regs_offset
> [ 13.106969] idx 92be name set_root
> [ 13.107290] idx 4414 name event_function
> [ 13.112570] idx 7d1d name phy_init
> [ 13.114459] idx 7d13 name phy_exit
> [ 13.114777] idx ab91 name watchdog
> [ 13.115730] found 46921 cnt 47036
>
> I don't understand how it works for you at all.
> It passes in BPF CI only because we don't run
> kprobe_multi_test/bench_attach there (yet).

reproduced after I updated the tree today.. not sure why I did
not see that before, going to check

jirka