Re: [PATCH 1/1] tracing: fix a WARN from trace_event_dyn_put_ref
From: Krister Johansen
Date: Tue Aug 16 2022 - 14:58:50 EST
Hi Steven,
Thanks for the feedback.
On Tue, Aug 16, 2022 at 12:25:59PM -0400, Steven Rostedt wrote:
>
> [ Added the authors of this code to the Cc ]
Oops, that was an unintentional omission.
> On Fri, 12 Aug 2022 17:02:20 -0700
> Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx> wrote:
>
> > The code in perf_trace_init takes a reference on a trace_event_call that is
> > looked up as part of the function call. If perf_trace_event_int fails,
> > however, perf_trace_event_unreg can decrement that refcount from underneath
> > perf_trace_init. This means that in some failure cases, perf_trace_init
> > can trigger the WARN in trace_dynevent.c which attempts to guard against
> > zero reference counts going negative.
> >
> > The author can reproduce this problem readily by running perf record in a
> > loop against a series of uprobes with no other users. Killing the record
> > process before it can finish its setup is enough to trigger this warn
> > within a few seconds.
> >
> > This patch leaves the behavior in perf_trace_event_unreg unchanged, but
> > moves most of the code in that function to perf_trace_event_cleanup. The
> > unreg function retains the ability to drop the refcount on the tp_event,
> > but cleanup does not. This modification is based upon the observation that
> > all of the other callers of perf_trace_event_init don't bother with
> > manipulating a reference count on the tp_events that they create. For
> > those callers, the trace_event_put_ref was already a no-op.
<...>
> > @@ -207,7 +213,7 @@ static int perf_trace_event_init(struct trace_event_call *tp_event,
> >
> > ret = perf_trace_event_open(p_event);
> > if (ret) {
> > - perf_trace_event_unreg(p_event);
> > + perf_trace_event_cleanup(p_event);
>
> The only problem I have with this patch is the loss of symmetry. Where
> perf_trace_event_reg() returns successful, so unreg() should be the
> function you call on failure.
>
> Since perf_trace_event_reg() is only called from perf_trace_event_init()
> let's move the perf_trace_event_open() into the perf_trace_event_reg() and
> have the unreg still do the clean up on failure.
>
> This way we keep the symmetry between *_reg() and *_unreg().
>
> And then the init function will not have to do any clean up, and can just
> end with:
>
> return perf_trace_event_reg(tp_event, p_event);
I went to implement this suggestion. In the process, I found that I was
potentially creating more asymmetry. There are a number of places in
the teardown path where the code does something like this:
perf_trace_event_close(p_event);
perf_trace_event_unreg(p_event);
In this situation, it seems like I should move the close into the unreg
as well. However, if I do that, then unreg can't be called from reg
until open has been successful.
The original problem that I set out to solve was to prevent
perf_trace_event_unreg() from decrementing a refcount in
perf_trace_event_init(). If I pull unreg() into the failure path in
reg then I think I still need to keep that refcount decrement present
for the users that are calling unreg from outside of reg().
After looking at the failure handling in perf_trace_event_reg(),
it actually shares a lot of code with perf_trace_event_unreg(). Unifying
the two would be nice. If unreg is called from reg, that also makes
it difficult for unreg to be used separately.
If the primary objection is around the naming, I have a couple of
possible ideas:
1. Rename perf_trace_event_cleanup to perf_trace_event_unreg_noref and
use it in perf_trace_event_init. This keeps some of the reg/unreg
symmetry, but makes it clear that its the unreg that won't touch the
tp_event refcount.
2. Introduce a function that's complementary to perf_trace_event_init
and name it perf_trace_event_destroy. Modify the logic to be something
like the following:
perf_trace_event_init() {
trace_event_try_get_ref(tp_event);
perf_trace_event_perm();
perf_trace_event_reg();
perf_trace_event_open();
}
perf_trace_event_destroy() {
perf_trace_event_close();
perf_trace_event_unreg();
trace_event_put_ref();
}
And then modify the current callers of the close/unreg pair to use
destroy.
What do you think, would either of these be a solution that you would
find acceptable?
-K