Re: [RFC 0/3] Revert SRCU from tracepoint infrastructure

From: Joel Fernandes
Date: Mon Feb 10 2020 - 11:59:50 EST


Hi Mathieu,
Nice to hear from you. I replied below:

On Sat, Feb 08, 2020 at 11:31:25AM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 7, 2020, at 3:56 PM, Joel Fernandes, Google joel@xxxxxxxxxxxxxxxxx wrote:
>
> > Hi,
> > These patches remove SRCU usage from tracepoints. The reason for proposing the
> > reverts is because the whole point of SRCU was to avoid having to call
> > rcu_irq_enter_irqson(). However this was added back in 865e63b04e9b2 ("tracing:
> > Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints") because perf
> > was breaking..
>
> I think the original patch re-enabling the rcu_irq_enter/exit_irqson() is a
> tracepoint band-aid over what should actually been fixed within perf instead.
>
> Perf should not do rcu_read_lock/unlock()/synchronize_rcu(), but rather use
> tracepoint_synchronize_unregister() to match the read-side provided by
> tracepoints.

It feels like here you are kind of forcing tracepoint callbacks on what to
do. Why should we limit what tracepoint callbacks want to do? Further if the
callback indirectly calls some other kernel API that does use rcu_read_lock(), then it
is trouble again. I would rather make callbacks more robust, than having us
force down unwritten / undocumented rules onto them. BPF in their callbacks
also use rcu_read_lock from what I remember (but I'll have to double check).

> If perf can then just rely on the underlying synchronization provided by each
> instrumentation providers (tracepoint, kprobe, ...) and not explicitly add its own
> unneeded synchronization on top (e.g. rcu_read_lock/unlock), then it should simplify
> all this.

I kind of got lost here. The SRCU synchronization in current code is for
tracepoint_srcu which is for the tracepoint function table. Perf can't rely
on _that_ "underlying" synchronization. That is for a completely different
SRCU domain.

I think what you're proposing is:
1. Perf use its own SRCU domain and synchronize on that.
2. We remove rcu_irq_enter_irqson() for *rcuidle cases and just use SRCU in
all callbacks.

Is that right?

I think Peter said he does now want / like a separate SRCU domain within Perf
so that sounds like settled ;-)

Further what if a tracepoint callback calls into some code that does
preempt_disable() and exepects that to be an RCU read-side section? That will
get hosed too since RCU is not watching.

I would say RCU _has_ to watch callback code to be fair to them. Anything
else is a cop out IMO.

> > Since SRCU is not providing any benefit because of 865e63b04e9b2 anyway, let us
> > revert SRCU tracepoint code to maintain the sanity of potential
> > tracepoint callback registerers.
>
> Introducing SRCU was done to simplify handling of rcuidle, thus removing some
> significant overhead that has been noticed due to use of rcu_irq_enter/exit_irqson().

But rcu_irq_enter() was added right back thus nulling that benefit.

> There is another longer-term goal in adding SRCU-synchronized tracepoints: adding
> the ability to create tracepoint probes which will be allowed to handle page
> faults properly. This is very relevant for the syscall tracepoints reading the
> system call parameters from user-space. Currently, we are stuck with sub par
> hacks such as filling the missing data with zeroes. Usually nobody notices because
> most syscall arguments are typically hot in the page cache, but it is still fragile.
>
> I'd very much prefer see commits moving syscall tracepoints to use of SRCU
> (without preempt disable around the tracepoint calls) rather than a commit removing
> tracepoint SRCU use because of something that needs to be fixed within perf.

But such SRCU implementation / usage has to be done within the callback
itself (for syscalls in this case), that has nothing to do with removing SRCU
for tracepoint_srcu (the table). Perhaps for the syscall case, we can add a
new trace_ API specifically for SRCU that does the rcu_irq_enters_on() call
but does not do preempt_disable_notrace() before calling callbacks, thus
allowing the callback to handle page faults? And such new trace_ API can call
srcu_read_{,un}lock() on an SRCU domain specific to the tracepoint,
before/after the callback invocation.

thanks,

- Joel


> Thanks,
>
> Mathieu
>
>
> >
> > Joel Fernandes (Google) (3):
> > Revert "tracepoint: Use __idx instead of idx in DO_TRACE macro to make
> > it unique"
> > Revert "tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle
> > tracepoints"
> > Revert "tracepoint: Make rcuidle tracepoint callers use SRCU"
> >
> > include/linux/tracepoint.h | 40 ++++++--------------------------------
> > kernel/tracepoint.c | 10 +---------
> > 2 files changed, 7 insertions(+), 43 deletions(-)
> >
> > --
> > 2.25.0.341.g760bfbb309-goog
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com