Re: [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk

From: Alexei Starovoitov
Date: Fri Aug 14 2020 - 13:01:32 EST

Next message: Linus Torvalds: "Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)"
Previous message: Linus Torvalds: "Re: [mm] 3e38e0aaca: WARNING:at_mm/memcontrol.c:#mem_cgroup_alloc"
In reply to: Alan Maguire: "Re: [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Aug 14, 2020 at 02:06:37PM +0100, Alan Maguire wrote:
> On Wed, 12 Aug 2020, Alexei Starovoitov wrote:
>
> > On Thu, Aug 06, 2020 at 03:42:23PM +0100, Alan Maguire wrote:
> > >
> > > The bpf_trace_printk tracepoint is augmented with a "trace_id"
> > > field; it is used to allow tracepoint filtering as typed display
> > > information can easily be interspersed with other tracing data,
> > > making it hard to read. Specifying a trace_id will allow users
> > > to selectively trace data, eliminating noise.
> >
> > Since trace_id is not seen in trace_pipe, how do you expect users
> > to filter by it?
>
> Sorry should have specified this. The approach is to use trace
> instances and filtering such that we only see events associated
> with a specific trace_id. There's no need for the trace event to
> actually display the trace_id - it's still usable as a filter.
> The steps involved are:
>
> 1. create a trace instance within which we can specify a fresh
> set of trace event enablings, filters etc.
>
> mkdir /sys/kernel/debug/tracing/instances/traceid100
>
> 2. enable the filter for the specific trace id
>
> echo "trace_id == 100" >
> /sys/kernel/debug/tracing/instances/traceid100/events/bpf_trace/bpf_trace_printk/filter
>
> 3. enable the trace event
>
> echo 1 >
> /sys/kernel/debug/tracing/instances/events/bpf_trace/bpf_trace_printk/enable
>
> 4. ensure the BPF program uses a trace_id 100 when calling bpf_trace_btf()

ouch.
I think you interpreted the acceptance of the
commit 7fb20f9e901e ("bpf, doc: Remove references to warning message when using bpf_trace_printk()")
in the wrong way.

Everything that doc had said is still valid. In particular:
-A: This is done to nudge program authors into better interfaces when
-programs need to pass data to user space. Like bpf_perf_event_output()
-can be used to efficiently stream data via perf ring buffer.
-BPF maps can be used for asynchronous data sharing between kernel
-and user space. bpf_trace_printk() should only be used for debugging.

bpf_trace_printk is for debugging only. _debugging of bpf programs themselves_.
What you're describing above is logging and tracing. It's not debugging of programs.
perf buffer, ring buffer, and seq_file interfaces are the right
interfaces for tracing, logging, and kernel debugging.

> > It also feels like workaround. May be let bpf prog print the whole
> > struct in one go with multiple new lines and call
> > trace_bpf_trace_printk(buf) once?
>
> We can do that absolutely, but I'd be interested to get your take
> on the filtering mechanism before taking that approach. I'll add
> a description of the above mechanism to the cover letter and
> patch to be clearer next time too.

I think patch 3 is no go, because it takes bpf_trace_printk in
the wrong direction.
Instead please refactor it to use string buffer or seq_file as an output.
If the user happen to use bpf_trace_printk("%s", buf);
after that to print that string buffer to trace_pipe that's user's choice.
I can see such use case when program author wants to debug
their bpf program. That's fine. But for kernel debugging, on demand and
"always on" logging and tracing the documentation should point
to sustainable interfaces that don't interfere with each other,
can be run in parallel by multiple users, etc.

> >
> > Also please add interface into bpf_seq_printf.
> > BTF enabled struct prints is useful for iterators too
> > and generalization you've done in this patch pretty much takes it there.
> >
>
> Sure, I'll try and tackle that next time.
>
> > > +#define BTF_SHOW_COMPACT (1ULL << 0)
> > > +#define BTF_SHOW_NONAME (1ULL << 1)
> > > +#define BTF_SHOW_PTR_RAW (1ULL << 2)
> > > +#define BTF_SHOW_ZERO (1ULL << 3)
> > > +#define BTF_SHOW_NONEWLINE (1ULL << 32)
> > > +#define BTF_SHOW_UNSAFE (1ULL << 33)
> >
> > I could have missed it earlier, but what is the motivation to leave the gap
> > in bits? Just do bit 4 and 5 ?
> >
>
> Patch 3 uses the first 4 as flags to bpf_trace_btf();
> the final two are not supported for the helper as flag values
> so I wanted to leave some space for additional bpf_trace_btf() flags.
> BTF_SHOW_NONEWLINE is always used for bpf_trace_btf(), since the
> tracing adds a newline for us and we don't want to double up on newlines,
> so it's ORed in as an implicit argument for the bpf_trace_btf() case.
> BTF_SHOW_UNSAFE isn't allowed within BPF so it's not available as
> a flag for the helper.

I see. I think such approach is more error prone than explicit filtering.
It's better to check for NONEWLINE and UNSAFE explicitly in those places
instead of playing obscure game of upper bits.
Also the choice of NONEWLINE vs NEWLINE should be available from the program
when it wants to print it into a string or into a seq_file.

Next message: Linus Torvalds: "Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)"
Previous message: Linus Torvalds: "Re: [mm] 3e38e0aaca: WARNING:at_mm/memcontrol.c:#mem_cgroup_alloc"
In reply to: Alan Maguire: "Re: [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace_printk"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]