Re: INFO: rcu detected stall in smp_call_function

From: Paul E. McKenney
Date: Wed Aug 26 2020 - 10:07:47 EST


On Wed, Aug 26, 2020 at 11:51:44AM +0200, peterz@xxxxxxxxxxxxx wrote:
> On Tue, Aug 25, 2020 at 08:48:41AM -0700, Paul E. McKenney wrote:
>
> > > Paul, I wanted to use this function, but found it has very weird
> > > semantics.
> > >
> > > Why do you need it to (remotely) call @func when p is current? The user
> > > in rcu_print_task_stall() explicitly bails in this case, and the other
> > > in rcu_wait_for_one_reader() will attempt an IPI.
> >
> > Good question. Let me look at the invocations:
> >
> > o trc_wait_for_one_reader() bails on current before
> > invoking try_invoke_on_locked_down_task():
> >
> > if (t == current) {
> > t->trc_reader_checked = true;
> > trc_del_holdout(t);
> > WARN_ON_ONCE(t->trc_reader_nesting);
> > return;
> > }
> >
> > o rcu_print_task_stall() might well invoke on the current task,
> > low though the probability of this happening might be. (The task
> > has to be preempted within an RCU read-side critical section
> > and resume in time for the scheduling-clock irq that will report
> > the RCU CPU stall to interrupt it.)
> >
> > And you are right, no point in an IPI in this case.
> >
> > > Would it be possible to change this function to:
> > >
> > > - blocked task: call @func with p->pi_lock held
> > > - queued, !running task: call @func with rq->lock held
> > > - running task: fail.
> > >
> > > ?
> >
> > Why not a direct call in the current-task case, perhaps as follows,
> > including your change above? This would allow the RCU CPU stall
> > case to work naturally and without the IPI.
> >
> > Would that work for your use case?
>
> It would in fact, but at this point I'd almost be inclined to stick the
> IPI in as well. But small steps I suppose. So yes.

If interrupts are disabled, won't a self-IPI deadlock?

But good point that the current-task case could be the only case invoking
the function with interrupts enabled, which now that you mention it does
sound like an accident waiting to happen. So how about the following
instead?

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8471a0f..f8ed04c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2997,7 +2997,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* state while invoking @func(@arg). This function can use ->on_rq and
* task_curr() to work out what the state is, if required. Given that
* @func can be invoked with a runqueue lock held, it had better be quite
- * lightweight.
+ * lightweight. Note that the current task is implicitly locked down.
*
* Returns:
* @false if the task slipped out from under the locks.
@@ -3006,12 +3006,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
*/
bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
{
- bool ret = false;
struct rq_flags rf;
+ bool ret = false;
struct rq *rq;

- lockdep_assert_irqs_enabled();
- raw_spin_lock_irq(&p->pi_lock);
+ if (p == current) {
+ local_irq_save(rf.flags);
+ ret = func(p, arg);
+ local_irq_restore(rf.flags);
+ return ret;
+ }
+ raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
if (p->on_rq) {
rq = __task_rq_lock(p, &rf);
if (task_rq(p) == rq)
@@ -3028,7 +3033,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t
ret = func(p, arg);
}
}
- raw_spin_unlock_irq(&p->pi_lock);
+ raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
return ret;
}