Re: Query regarding synchronize_sched_expedited and resched_cpu
From: Paul E. McKenney
Date: Sat Sep 16 2017 - 21:00:36 EST
On Fri, Sep 15, 2017 at 04:44:38PM +0530, Neeraj Upadhyay wrote:
> We have one query regarding the behavior of the RCU expedited grace period,
> for the scenario where resched_cpu() in sync_sched_exp_handler() fails to
> acquire the rq lock and returns without setting need_resched. In this
> case, how do we ensure that the CPU notifies RCU about the
> end of the sched grace period (schedule() -> __schedule() ->
> rcu_note_context_switch(cpu) -> rcu_sched_qs()), for cases where the tick
> is stopped on that CPU? Is it implied from the rq lock acquisition
> failure that the owner of the rq lock will enforce a context switch?
> For which scenarios in RCU paths (as the function is used only in RCU
> code) do we need the trylock check in resched_cpu()?
> void resched_cpu(int cpu)
> {
> 	struct rq *rq = cpu_rq(cpu);
> 	unsigned long flags;
>
> 	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
> 		return;
> 	resched_curr(rq);
> 	raw_spin_unlock_irqrestore(&rq->lock, flags);
> }
> This issue was observed in the scenario below, where one of the CPUs (CPU1)
> started synchronize_sched_expedited and sent IPI to CPU5, which is in
> the idle path but handled sync_sched_exp_handler() IPI before
> As resched_cpu() failed to acquire the rq lock, need_resched was not set,
> and the CPU went idle, resulting in an expedited stall being reported
> by CPU1.
> Below is the scenario:
> • CPU1 is waiting for the expedited wait to complete:
> rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5
> IPI sent to CPU5
> ret = swait_event_timeout(
> expmask = 0x20, and CPU5 is in the idle path (in cpuidle_enter())
> • CPU5 handles the IPI and fails to acquire the rq lock:
> Handles IPI
> resched_cpu() returns after failing to trylock rq->lock
> need_resched is not set
> • CPU5 calls rcu_idle_enter() and, as need_resched is not set, goes
> idle (schedule() is not called).
> • CPU1 reports an RCU stall.
Good catch and good detective work!!!
I will be working on a fix this week, hopefully involving resched_cpu()
getting a return value so that I can track who needs a later retry.
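
For illustration only, here is a rough userspace model of that idea, not
kernel code and not the eventual fix. Everything in it is hypothetical:
model_rq, model_resched_cpu() and model_resched_mask() are made-up names,
and a C11 atomic_flag stands in for rq->lock. The point is just that a
trylock failure becomes visible to the caller, which can then keep a mask
of CPUs to retry later.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for a per-CPU runqueue; not the kernel's struct rq. */
struct model_rq {
	atomic_flag lock;	/* models rq->lock, trylock only */
	bool need_resched;	/* models the need_resched flag */
};

static struct model_rq model_rqs[MODEL_NR_CPUS];

/* Put every modelled runqueue into the unlocked, no-resched state. */
static void model_init(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		atomic_flag_clear(&model_rqs[cpu].lock);
		model_rqs[cpu].need_resched = false;
	}
}

/*
 * Assumed variant of resched_cpu(): returns true if need_resched was set,
 * false if the lock was contended and nothing was done.
 */
static bool model_resched_cpu(int cpu)
{
	struct model_rq *rq = &model_rqs[cpu];

	/* test_and_set returns the old value: true means already locked. */
	if (atomic_flag_test_and_set(&rq->lock))
		return false;
	rq->need_resched = true;
	atomic_flag_clear(&rq->lock);
	return true;
}

/* Try every CPU in mask; return the subset that still needs a retry. */
static uint32_t model_resched_mask(uint32_t mask)
{
	uint32_t retry = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(mask & (UINT32_C(1) << cpu)))
			continue;
		if (!model_resched_cpu(cpu))
			retry |= UINT32_C(1) << cpu;
	}
	return retry;
}
```

With expmask = 0x20 as in the report, a caller would see 0x20 come back
from model_resched_mask() while CPU5's lock is held elsewhere, and 0 once
the lock is free and need_resched has been set.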