Re: [PATCH v3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast

From: Paul E. McKenney

Date: Thu Dec 11 2025 - 19:47:35 EST


On Fri, Dec 12, 2025 at 09:12:07AM +0900, Joel Fernandes wrote:
>
>
> On 12/11/2025 3:23 PM, Paul E. McKenney wrote:
> > On Thu, Dec 11, 2025 at 08:02:15PM +0000, Joel Fernandes wrote:
> >>
> >>
> >>> On Dec 8, 2025, at 1:20 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >>>
> >>> The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
> >>> to protect invocation of __DO_TRACE_CALL() means that BPF programs
> >>> attached to tracepoints are non-preemptible. This is unhelpful in
> >>> real-time systems, whose users apparently wish to use BPF while also
> >>> achieving low latencies. (Who knew?)
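> >>>
> >>> For reference, the current arrangement expands to roughly the
> >>> following (a simplified sketch of include/linux/tracepoint.h, with
> >>> the static-call plumbing and unrelated details elided):
> >>>
> >>>	static inline void trace_##name(proto)
> >>>	{
> >>>		if (static_branch_unlikely(&__tracepoint_##name.key)) {
> >>>			if (cond) {
> >>>				/* BPF and other probes run with preemption off. */
> >>>				guard(preempt_notrace)();
> >>>				__DO_TRACE_CALL(name, TP_ARGS(args));
> >>>			}
> >>>		}
> >>>	}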
> >>>
> >>> One option would be to use preemptible RCU, but this introduces
> >>> many opportunities for infinite recursion, which many consider to
> >>> be counterproductive, especially given the relatively small stacks
> >>> provided by the Linux kernel. These opportunities could be shut down
> >>> by sufficiently energetic duplication of code, but this sort of thing
> >>> is considered impolite in some circles.
> >>>
> >>> Therefore, use the shiny new SRCU-fast API, which provides somewhat faster
> >>> readers than those of preemptible RCU, at least on Paul E. McKenney's
> >>> laptop, where task_struct access is more expensive than access to per-CPU
> >>> variables. And SRCU-fast provides way faster readers than does SRCU,
> >>> courtesy of being able to avoid the read-side use of smp_mb(). Also,
> >>> it is quite straightforward to create srcu_read_{,un}lock_fast_notrace()
> >>> functions.
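> >>>
> >>> A minimal sketch of those functions, assuming that the existing
> >>> __srcu_read_{,un}lock_fast() fast paths serve as the
> >>> non-instrumented core:
> >>>
> >>>	static inline notrace struct srcu_ctr __percpu *
> >>>	srcu_read_lock_fast_notrace(struct srcu_struct *ssp)
> >>>	{
> >>>		/* Fast path only, with no tracing instrumentation. */
> >>>		return __srcu_read_lock_fast(ssp);
> >>>	}
> >>>
> >>>	static inline notrace void
> >>>	srcu_read_unlock_fast_notrace(struct srcu_struct *ssp,
> >>>				      struct srcu_ctr __percpu *scp)
> >>>	{
> >>>		__srcu_read_unlock_fast(ssp, scp);
> >>>	}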
> >>>
> >>> While in the area, note that SRCU now supports early boot call_srcu().
> >>> Therefore, omit the checks by which rcu_free_old_probes() avoided
> >>> such use before this commit was applied:
> >>>
> >>> e53244e2c893 ("tracepoint: Remove SRCU protection")
> >>>
> >>> The current commit can be thought of as an approximate revert of that
> >>> commit, with some compensating additions of preemption disabling.
> >>> This preemption disabling uses guard(preempt_notrace)().
> >>>
> >>> However, Yonghong Song points out that BPF assumes that non-sleepable
> >>> BPF programs will remain on the same CPU, which means that migration
> >>> must be disabled whenever preemption remains enabled. In addition,
> >>> non-RT kernels have performance expectations that would be violated by
> >>> allowing the BPF programs to be preempted.
> >>>
> >>> Therefore, continue to disable preemption in non-RT kernels. For RT
> >>> kernels, protect the BPF program with both SRCU and migration
> >>> disabling, and even then only if preemption is not already disabled.
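> >>>
> >>> The resulting protection scheme can be sketched as follows (not the
> >>> actual macro expansion; tracepoint_srcu is an illustrative name for
> >>> the tracepoint-wide srcu_struct):
> >>>
> >>>	#ifdef CONFIG_PREEMPT_RT
> >>>		if (!preemptible()) {
> >>>			/* Preemption already off, no extra protection needed. */
> >>>			__DO_TRACE_CALL(name, TP_ARGS(args));
> >>>		} else {
> >>>			struct srcu_ctr __percpu *scp;
> >>>
> >>>			scp = srcu_read_lock_fast_notrace(&tracepoint_srcu);
> >>>			/* Keep non-sleepable BPF programs on this CPU. */
> >>>			migrate_disable();
> >>>			__DO_TRACE_CALL(name, TP_ARGS(args));
> >>>			migrate_enable();
> >>>			srcu_read_unlock_fast_notrace(&tracepoint_srcu, scp);
> >>>		}
> >>>	#else
> >>>		/* Non-RT: preserve existing performance expectations. */
> >>>		guard(preempt_notrace)();
> >>>		__DO_TRACE_CALL(name, TP_ARGS(args));
> >>>	#endif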
> >>
> >> Hi Paul,
> >>
> >> Is there a reason not to let non-RT kernels also benefit from SRCU-fast
> >> tracepoints for BPF? This could be a follow-up patch if needed.
> >
> > Because in some cases the non-RT benefit is suspected to be negative
> > due to increasing the probability of preemption in awkward places.
>
> Since you said "suspected," I am guessing there is no concrete data to
> substantiate that specifically for BPF programs, but correct me if I
> missed something. Assuming you are referring to latency tradeoffs due to
> preemption: Android is not PREEMPT_RT but is expected to be low-latency
> in general as well. So is this decision the right one for Android, too,
> considering that (I heard) it uses BPF? Just an open-ended question.
>
> There is also the issue of having two different code paths, PREEMPT_RT
> versus otherwise, which complicates the tracing side, so there had better
> be a good reason for it, I guess.

You are advocating a change in behavior for non-RT workloads. Why do
you believe that this change would be OK for those workloads?

Thanx, Paul