Re: [PATCH tip/core/rcu 1/4] rcu: Eliminate BUG_ON() for sync.c

From: Paul E. McKenney
Date: Sun Nov 11 2018 - 21:20:18 EST


On Sun, Nov 11, 2018 at 09:07:04PM -0500, Steven Rostedt wrote:
> On Sun, 11 Nov 2018 11:32:14 -0800
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxx> wrote:
>
> > The sync.c file has a number of calls to BUG_ON(), which panics the
> > kernel, which is not a good strategy for devices (like embedded) that
> > don't have a way to capture console output. This commit therefore
> > changes these BUG_ON() calls to WARN_ON_ONCE(), but does so quite naively.
> >
> > Reported-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
> > Acked-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > ---
> > kernel/rcu/sync.c | 13 ++++++-------
> > 1 file changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> > index 3f943efcf61c..a6ba446a9693 100644
> > --- a/kernel/rcu/sync.c
> > +++ b/kernel/rcu/sync.c
> > @@ -125,8 +125,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> >  		rsp->gp_state = GP_PENDING;
> >  	spin_unlock_irq(&rsp->rss_lock);
> >
> > -	BUG_ON(need_wait && need_sync);
> > -
> > +	WARN_ON_ONCE(need_wait && need_sync);
> >  	if (need_sync) {
> >  		gp_ops[rsp->gp_type].sync();
> >  		rsp->gp_state = GP_PASSED;
> > @@ -139,7 +138,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> >  		 * Nobody has yet been allowed the 'fast' path and thus we can
> >  		 * avoid doing any sync(). The callback will get 'dropped'.
> >  		 */
> > -		BUG_ON(rsp->gp_state != GP_PASSED);
> > +		WARN_ON_ONCE(rsp->gp_state != GP_PASSED);
> >  	}
> >  }
> >
> > @@ -166,8 +165,8 @@ static void rcu_sync_func(struct rcu_head *rhp)
> >  	struct rcu_sync *rsp = container_of(rhp, struct rcu_sync, cb_head);
> >  	unsigned long flags;
> >
> > -	BUG_ON(rsp->gp_state != GP_PASSED);
> > -	BUG_ON(rsp->cb_state == CB_IDLE);
> > +	WARN_ON_ONCE(rsp->gp_state != GP_PASSED);
> > +	WARN_ON_ONCE(rsp->cb_state == CB_IDLE);
> >
> >  	spin_lock_irqsave(&rsp->rss_lock, flags);
> >  	if (rsp->gp_count) {
> > @@ -225,7 +224,7 @@ void rcu_sync_dtor(struct rcu_sync *rsp)
> >  {
> >  	int cb_state;
> >
> > -	BUG_ON(rsp->gp_count);
> > +	WARN_ON_ONCE(rsp->gp_count);
> >
> >  	spin_lock_irq(&rsp->rss_lock);
> >  	if (rsp->cb_state == CB_REPLAY)
> > @@ -235,6 +234,6 @@ void rcu_sync_dtor(struct rcu_sync *rsp)
> >
> >  	if (cb_state != CB_IDLE) {
> >  		gp_ops[rsp->gp_type].wait();
> > -		BUG_ON(rsp->cb_state != CB_IDLE);
> > +		WARN_ON_ONCE(rsp->cb_state != CB_IDLE);
> >  	}
> >  }
>
> I take it that if any of these WARN_ON_ONCE() triggers, they won't cause
> immediate catastrophe, and/or there's no gentle way out like you have
> with the other patches exiting the function when one is hit.

Oleg was actually OK with removing them entirely:

"I added these BUG_ON's for documentation when I was prototyping
this code, perhaps we can simply remove them."

And they are "cannot happen" types of things (famous last words).
Oleg also has another approach that could rip-and-replace the current
implementation, which would render these WARN*()s moot.
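
For anyone following the "gentle way out" question above, the difference
between warning and continuing versus warning and bailing out can be
sketched in userspace C. This is only an illustration under stated
assumptions, not code from kernel/rcu/sync.c: struct toy_sync,
toy_warn_once(), toy_dtor_naive(), and toy_dtor_bail() are all hypothetical
names, and toy_warn_once() merely mimics the way WARN_ON_ONCE() reports a
true condition once and evaluates to that condition.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct rcu_sync; only what the sketch needs. */
struct toy_sync {
	int gp_count;
	bool torn_down;
};

/*
 * Userspace imitation of WARN_ON_ONCE(): print a warning the first time
 * the condition is true at a given call site (tracked through *warned)
 * and return the condition so that callers can test it.
 */
static bool toy_warn_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING (once): %s\n", what);
	}
	return cond;
}

/* Naive conversion: warn about the bad state, then carry on regardless. */
int toy_dtor_naive(struct toy_sync *s)
{
	static bool warned;

	toy_warn_once(s->gp_count != 0, &warned, "gp_count != 0");
	s->torn_down = true;
	return 0;
}

/* Early-exit variant: warn and return without touching the object. */
int toy_dtor_bail(struct toy_sync *s)
{
	static bool warned;

	if (toy_warn_once(s->gp_count != 0, &warned, "gp_count != 0"))
		return -1;
	s->torn_down = true;
	return 0;
}
```

With gp_count nonzero, toy_dtor_naive() warns and still tears the object
down, which mirrors the naive conversion in this patch. toy_dtor_bail()
warns and returns -1 with the object left alone, which is the early-exit
style Steven refers to. Whether an early exit is actually safe depends on
what the caller does next, which the sketch does not model.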

Thanx, Paul