Re: [PATCH] tracing: Add disable-filter-buf option

From: Steven Rostedt
Date: Sun Dec 17 2023 - 18:57:15 EST


On Sun, 17 Dec 2023 17:10:45 +0900
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:
> > >> It exposes the following details which IMHO should be hidden or
> > >> configurable in a way that allows moving to a whole new mechanism
> > >> which will have significantly different characteristics in the
> > >> future:
> > >>
> > >> It exposes that:
> > >>
> > >> - filtering uses a copy to a temporary buffer, and
> > >> - that this copy is enabled by default.
> > >>
> > >> Once exposed, those design constraints become immutable due to ABI.
> > >
> > > No it is not. There is no such thing as "immutable ABI". The rule is
> > > "don't break user space". If this functionality in the kernel goes away,
> > > the knob could become a nop, and I doubt any user space will break
> > > because of it.
> > >
> > > That is, the only burden is keeping this option exposed. But it could
> > > be just like that light switch that has nothing connected to it. It's
> > > still there, but does nothing if you switch it. This knob can act the
> > > same way. This does not in any way prevent future innovation.
> >
> > I am not comfortable with exposing internal ring buffer implementation
> > details to userspace which may or may not be deprecated as no-ops
> > in the future. This will lead to useless clutter.
>
> Hmm, but this may change the ring buffer consumption rate if the filter
> is enabled. The ring buffer may be filled more quickly than the user expected.

Which it has been since 0fc1b09ff1ff4 ("tracing: Use temp buffer when
filtering events"), and before that commit, things were a bit slower.
That commit sped things up. But, even with that commit, there's no
guarantee that you will get to use the temp buffer; you may still end up
writing directly into the ring buffer.
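
To make the two paths concrete, here is a rough toy model in C. It is not
the actual tracing code: every name in it (toy_cpu, toy_record, the buffer
sizes) is made up for illustration, and the "temp buffer is busy" case
just stands in for any situation where the temp buffer can't be used,
such as an event that interrupts another event.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_path { TOY_TEMP, TOY_DIRECT };

struct toy_cpu {
	bool temp_busy;           /* temp buffer already claimed by another writer */
	unsigned char temp[64];   /* per-CPU scratch space for filtering */
	unsigned char ring[256];  /* stand-in for the ring buffer */
	size_t ring_used;         /* committed bytes in ring[] */
};

/*
 * Record one event of 'len' bytes. 'match' is the filter's verdict.
 * Returns the path taken. Events that do not fit are silently dropped.
 */
static enum toy_path toy_record(struct toy_cpu *cpu, const void *data,
				size_t len, bool match)
{
	bool fits = len <= sizeof(cpu->ring) - cpu->ring_used;

	if (!cpu->temp_busy && len <= sizeof(cpu->temp)) {
		/* Temp path: filter on the copy, touch the ring only on a match. */
		cpu->temp_busy = true;
		memcpy(cpu->temp, data, len);
		if (match && fits) {
			memcpy(cpu->ring + cpu->ring_used, cpu->temp, len);
			cpu->ring_used += len;
		}
		cpu->temp_busy = false;
		return TOY_TEMP;
	}

	/* Direct path: write into the ring first, then commit or discard. */
	if (fits) {
		memcpy(cpu->ring + cpu->ring_used, data, len);
		if (match)
			cpu->ring_used += len;  /* commit */
		/* no match: discard by not advancing ring_used */
	}
	return TOY_DIRECT;
}
```

In this model a non-matching event on the temp path never touches the
ring, while the direct path always writes into it first and only then
decides whether to keep the event.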

And I found that currently histograms (and synthetic events!) also write
directly into the buffer :-p


> Thus if the user specifies a rare condition, most of the events on the ring
> buffer are garbage. And the user will know the buffer size *seems*
> smaller than the setting.

Not sure what you mean by that. The event on the buffer is removed
unless another event sneaks in and makes it impossible to reset the
next write location before the discarded event.
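
As a rough sketch of that rule (again a toy model, not the ring buffer
code; toy_ring, toy_reserve and toy_discard are hypothetical names, and
counting the unreclaimed bytes as "padding" is an assumption of the
model): a discard only gets its space back when the discarded event is
still the last thing that was reserved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_ring {
	size_t write;    /* offset of the next write */
	size_t size;     /* total capacity in bytes */
	size_t padding;  /* bytes lost to discards that could not be undone */
};

/* Reserve 'len' bytes; returns the start offset, or (size_t)-1 if full. */
static size_t toy_reserve(struct toy_ring *rb, size_t len)
{
	size_t start = rb->write;

	if (len > rb->size - rb->write)
		return (size_t)-1;
	rb->write += len;
	return start;
}

/*
 * Discard the event at [start, start + len). Returns true if the space
 * was reclaimed, false if a later event got in and the space is wasted.
 */
static bool toy_discard(struct toy_ring *rb, size_t start, size_t len)
{
	if (rb->write == start + len) {
		rb->write = start;  /* nothing came after: reset the write location */
		return true;
	}
	rb->padding += len;         /* another event sneaked in: space stays used */
	return false;
}
```

So the common case costs nothing in buffer space; only the discards that
lose the race with another event leave dead space behind.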

> I think copying overhead will be a secondary effect, the biggest noticeable
> difference is how many events are recorded in the ring buffer. Thus, what
> about naming the option as "filter-on-buffer"?

I'm not sure I understand that. How about just calling it "filter_direct",
which means to write directly on the buffer, and default that off.

>
> If we introduce filtering on input directly, at that point we will use
> it if "filter-on-buffer = no", because this is also not noticeable from
> users.

The "filter on input" will be a different interface, as the current
filter is only on the output of TRACE_EVENT() fields. The input
parameters aren't exposed at all, and may never be, as that would make
peterz and others keep all tracepoints out of their subsystems.

As my main motivation for this was to create a kselftest that can
stress test the filtering directly into the ring buffer (like it used
to, and like it does for interrupting events and histograms and
synthetic events), we can still add tests to make sure that part works.

I'm fine slapping on a Kconfig of CONFIG_TRACE_FORCE_DIRECT_KNOB and
placing the documentation in the help content, saying it adds a knob to
allow kselftest to stress test the direct-to-ring-buffer filtering.
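
Something along these lines, purely as a sketch: only the config name
comes from the above, while the prompt text, the "depends on TRACING"
line and the help wording are assumptions and not from any posted patch.

```kconfig
config TRACE_FORCE_DIRECT_KNOB
	bool "Add a knob to force filtering directly in the ring buffer"
	depends on TRACING
	help
	  Adds a tracing option that makes event filtering write directly
	  into the ring buffer instead of going through the temp buffer.
	  This exists so that kselftests can stress test the direct path.

	  If unsure, say N.
```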

-- Steve