Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up

From: Paul E. McKenney
Date: Thu Dec 15 2022 - 14:02:10 EST


On Thu, Dec 15, 2022 at 01:51:02PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 09:02:56 -0800
> "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:
>
> > On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > > On Wed, 14 Dec 2022 12:03:33 -0800
> > > "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:
> > >
> > > > > > Avoid calling the synchronization function when system_state is
> > > > > > SYSTEM_BOOTING.
> > > > >
> > > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > > Then, it will prevent similar warnings if we expand boot time feature.
> > > >
> > > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > > Just in case there are ever additional issues of this sort?
> > >
> > > Adding more tracing command line parameters is triggering this more. I now
> > > hit:
> >
> > Fair point, and apologies for the hassle.
> >
> > Any chance of getting an official "it is now late enough in boot to
> > safely invoke lockdep" API? ;-)
>
> lockdep API isn't the problem, it's that we are still in the early boot stage
> where interrupts are disabled, and you can't enable them. Lockdep works
> just fine there, and is reporting interrupts being disabled correctly. The
> backtrace reported *does* have interrupts disabled.
>
> The thing is, because we are still running on a single CPU with interrupts
> disabled, there is no need for synchronization. Even grabbing a mutex is
> safe because there's guaranteed to be no contention (unless it's not
> released). This is why a lot of warnings are suppressed if system_state is
> SYSTEM_BOOTING.

Agreed, and my second attempt is a bit more surgical. (Please see below
for a more official version of it.)

> > In the meantime, does the (untested and quite crude) patch at the end
> > of this message help?
>
> I'll go and test it, but I'm guessing it will work fine. You could also test
> against system_state != SYSTEM_BOOTING, as that gets cleared just before
> kernel_init() can continue (it waits for the complete() that is called
> after system_state is set to SYSTEM_SCHEDULING). Which happens shortly
> after rcu_scheduler_starting().
>
> I wonder if you could even replace RCU_SCHEDULER_RUNNING with
> system_state != SYSTEM_BOOTING, and remove the rcu_scheduler_starting()
> call.

In this particular case, agreed, I could use system_state. But there are
other cases that must change behavior as soon as preemption can happen,
which is upon return from that call to user_mode_thread(). The update to
system_state doesn't happen until much later. So I don't get to remove
that rcu_scheduler_starting() call.

Which cases?

Here is one:

o	The newly spawned init process does something that uses RCU,
	but is preempted while holding rcu_read_lock().

o	The boot thread, which did the preempting, waits for a grace
	period. If we use rcu_scheduler_active, all is well because
	synchronize_rcu() will do a real run-time grace period, thus
	waiting for that reader.

But system_state has not yet been updated, so if synchronize_rcu()
were instead to pay attention to that one, there might be a
tragically too-short RCU grace period.

Thoughts?

Thanx, Paul

------------------------------------------------------------------------

commit 876c5ac113fa66a64fa241e69d9a2251b8daa5ee
Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
Date: Thu Dec 15 09:26:09 2022 -0800

rcu: Don't assert interrupts enabled too early in boot

The rcu_poll_gp_seq_start_unlocked() and rcu_poll_gp_seq_end_unlocked()
functions both check that interrupts are enabled, as they normally should
be when waiting for an RCU grace period. Except that it is legal to wait
for grace periods during early boot, before interrupts have been enabled
for the first time, and polling for grace periods is required to work
during this time. This can result in false-positive lockdep splats in
the presence of boot-time-initiated tracing.

This commit therefore conditions those interrupts-enabled checks on
rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by
which time interrupts have been enabled.

Reported-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..f627888715dca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1314,7 +1314,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1330,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1353,7 +1353,8 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
@@ -1369,7 +1370,8 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);