Re: [PATCH] kernel/trace: Add TRACING_ALLOW_PRINTK config option

From: Steven Rostedt
Date: Sun Jun 28 2020 - 14:46:23 EST


On Sun, 28 Jun 2020 10:27:00 -0700
Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:

> On Fri, Jun 26, 2020 at 06:14:55PM -0400, Steven Rostedt wrote:
> > On Wed, 24 Jun 2020 20:59:13 -0700
> > Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > > > >
> > > > > Nack.
> >
> > I nack your nack ;-)
>
> ok. let's take it up to Linus to decide.

I'm fine with that.

>
> >
> > > > > The message is bogus. It's used in production kernels.
> > > > > bpf_trace_printk() calls it.
> > > >
> > > > Interesting. BTW, the same information (trace_printk is for debugging
> > > > only) is repeated all over the place, including where bpf_trace_printk
> > > > is documented:
> > > > https://elixir.bootlin.com/linux/latest/source/include/linux/kernel.h#L757
> > > > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/bpf.h#L706
> > > > https://elixir.bootlin.com/linux/latest/source/kernel/trace/trace.c#L3157
> > > >
> > > > Steven added that warning (2184db46e425c ("tracing: Print nasty banner
> > > > when trace_printk() is in use")), so maybe he can confirm if it's
> > > > still relevant.
> > >
> > > The banner is nasty and it's actively causing harm.
> >
> > And it's doing exactly what it was intended on doing!
>
> I disagree. The message is _lying_ about the state of the kernel.
> It's not a debug kernel and it's absolutely fine for production.

No it is not!

It causes the trace buffer to be filled with crap that can not be
easily disabled. That's the reason I only allowed trace_printk() for
debug kernels. And the only way to prevent people from sticking it in
their code and making an API out of it was for this banner.

I refuse to remove that banner. It's my API!

> >
> > Now I do have an answer for you that I believe is a great compromise.
> >
> > There's something you can call (and even call it from a module). It's
> > called "trace_array_vprintk()". But has one caveat, and that is, you
> > can not write to the main top level trace buffer with it (I have
> > patches for the next merge window to enforce that). And that's what
> > I've been trying to avoid trace_printk() from doing, as that's what it
> > does by default. It writes to /sys/kernel/tracing/trace.
> >
> > Now what you can do, is have bpf create
> > a /sys/kernel/tracing/instances/bpf_trace/ instance, and use
> > trace_array_printk(), to print into that, and you will never have to
> > see that warning again! It shows up in your own
> > tracefs/instances/bpf_trace/trace file!
> >
> > If you need more details, let me know, and I can give you all you need
> > to know to create you very own trace instance (that can enable events,
> > kprobe events, uprobe events, function tracing, and soon function graph
> > tracing). And the bonus, you get trace_array_vprintk() and no more
> > complaining. :-) :-) :-)
>
> We added a bunch of code to libbcc in the past to support instances,
> but eventually removed it all due to memory overhead per instance.
> If I recall it was ~8Mbyte per instance. That was couple years ago.

I'd like to see where that 8 MB per instance came from. You can control
the size of the instance buffers. If size is still an issue, I'll be
happy to work with you to fix it.


>
> By now everyone has learned to use bpf_trace_printk() and expects
> to see the output in /sys/kernel/debug/tracing/trace.
> It's documented in uapi/bpf.h and various docs.

Re-teach them, or are you finally admitting that the tracing system is
a permanent API? This is the reason people are refusing to add trace
points into their subsystems. Because user space may make it required.

I see no reason why you can't create a dedicated BPF tracing instance
(you only need one) to add all your trace_array_printk()s to.

-- Steve