Re: [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use
From: Kris Van Hees
Date: Mon Jun 17 2019 - 21:31:18 EST
On Thu, May 23, 2019 at 01:28:44PM -0700, Alexei Starovoitov wrote:
<< stuff skipped because it is not relevant to the technical discussion... >>
> > > In particular you brought up a good point that there is a use case
> > > for sharing a piece of bpf program between kprobe and tracepoint events.
> > > The better way to do that is via bpf2bpf call.
> > > Example:
> > > void bpf_subprog(arbitrary args)
> > > {
> > > }
> > >
> > > SEC("kprobe/__set_task_comm")
> > > int bpf_prog_kprobe(struct pt_regs *ctx)
> > > {
> > > bpf_subprog(...);
> > > }
> > >
> > > SEC("tracepoint/sched/sched_switch")
> > > int bpf_prog_tracepoint(struct sched_switch_args *ctx)
> > > {
> > > bpf_subprog(...);
> > > }
> > >
> > > Such configuration is not supported by the verifier yet.
> > > We've been discussing it for some time, but no work has started,
> > > since there was no concrete use case.
> > > If you can work on adding support for it everyone will benefit.
> > >
> > > Could you please consider doing that as a step forward?
> >
> > This definitely looks to be an interesting addition and I am happy to look into
> > that further. I have a few questions that I hope you can shed light on...
> >
> > 1. What context would bpf_subprog execute with? If it can be called from
> > multiple different prog types, would it see whichever context the caller
> > is executing with? Or would you envision bpf_subprog to not be allowed to
> > access the execution context because it cannot know which one is in use?
>
> bpf_subprog() won't be able to access 'ctx' pointer _if_ it's ambiguous.
> The verifier already smart enough to track all the data flow, so it's fine to
> pass 'struct pt_regs *ctx' as long as it's accessed safely.
> For example:
> void bpf_subprog(int kind, struct pt_regs *ctx1, struct sched_switch_args *ctx2)
> {
> if (kind == 1)
> bpf_printk("%d", ctx1->pc);
> if (kind == 2)
> bpf_printk("%d", ctx2->next_pid);
> }
>
> SEC("kprobe/__set_task_comm")
> int bpf_prog_kprobe(struct pt_regs *ctx)
> {
> bpf_subprog(1, ctx, NULL);
> }
>
> SEC("tracepoint/sched/sched_switch")
> int bpf_prog_tracepoint(struct sched_switch_args *ctx)
> {
> bpf_subprog(2, NULL, ctx);
> }
>
> The verifier should be able to prove that the above is correct.
> It can do so already if s/ctx1/map_value1/, s/ctx2/map_value2/
> What's missing is an ability to have more than one 'starting' or 'root caller'
> program.
>
> Now replace SEC("tracepoint/sched/sched_switch") with SEC("cgroup/ingress")
> and it's becoming clear that BPF_PROG_TYPE_PROBE approach is not good enough, right?
> Folks are already sharing the bpf progs between kprobe and networking.
> Currently it's done via code duplication and actual sharing happens via maps.
> That's not ideal, hence we've been discussing 'shared library' approach for
> quite some time. We need a way to support common bpf functions that can be called
> from networking and from tracing programs.
>
> > 2. Given that BPF programs are loaded with a specification of the prog type,
> > how would one load a code construct as the one you outline above? How can
> > you load a BPF function and have it be used as subprog from programs that
> > are loaded separately? I.e. in the sample above, if bpf_subprog is loaded
> > as part of loading bpf_prog_kprobe (prog type KPROBE), how can it be
> > referenced from bpf_prog_tracepoint (prog type TRACEPOINT) which would be
> > loaded separately?
>
> The api to support shared libraries was discussed, but not yet implemented.
> We've discussed 'FD + name' approach.
> FD identifies a loaded program (which is root program + a set of subprogs)
> and other programs can be loaded at any time later. The BPF_CALL instructions
> in such later program would refer to older subprogs via FD + name.
> Note that both tracing and networking progs can be part of single elf file.
> libbpf has to be smart to load progs into kernel step by step
> and reusing subprogs that are already loaded.
>
> Note that libbpf work for such feature can begin _without_ kernel changes.
> libbpf can pass bpf_prog_kprobe+bpf_subprog as a single program first,
> then pass bpf_prog_tracepoint+bpf_subprog second (as a separate program).
> The bpf_subprog will be duplicated and JITed twice, but sharing will happen
> because data structures (maps, global and static data) will be shared.
> This way the support for 'pseudo shared libraries' can begin.
> (later accompanied by FD+name kernel support)
As far as I can determine, the current libbpd implementation is already able
to do the duplication of the called function, even when the ELF object contains
programs of differemt program types. I.e. the example you give at the top
of the email actually seems to work already. Right?
In that case, I am a bit unsure what more can be done on the side of libbpf
without needing kernel changes?
> There are other things we discsused. Ideally the body of bpf_subprog()
> wouldn't need to be kept around for future verification when this bpf
> function is called by a different program. The idea was to
> use BTF and similar mechanism to ongoing 'bounded loop' work.
> So the verifier can analyze bpf_subprog() once and reuse that knowledge
> for dynamic linking with progs that will be loaded later.
> This is more long term work.
> A simple short term would be to verify the full call chain every time
> the subprog (bpf function) is reused.