On Fri, Sep 15, 2017 at 04:44:38PM +0530, Neeraj Upadhyay wrote:
Hi Paul, how about replacing raw_spin_trylock_irqsave with
Hi,
Good catch and good detective work!!!
We have one query regarding the behavior of the RCU expedited grace period
in the scenario where resched_cpu() in sync_sched_exp_handler() fails to
acquire the rq lock and returns without setting need_resched. In this
case, how do we ensure that the CPU notifies RCU about the
end of the sched grace period (schedule() -> __schedule() ->
rcu_note_context_switch(cpu) -> rcu_sched_qs()) when the tick
is stopped on that CPU? Is it implied by the rq lock acquisition
failure that the owner of the rq lock will enforce a context switch?
For which scenarios in the RCU paths (as the function is used only in RCU
code) do we need the trylock check in resched_cpu()?
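void resched_cpu(int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long flags;
	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
		return;	/* lock contended: need_resched is NOT set */
	/* lock acquired: set need_resched on rq's current task, then unlock */
}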
This issue was observed in the scenario below, where one of the CPUs (CPU1)
started synchronize_sched_expedited() and sent an IPI to CPU5, which was in
the idle path but handled sync_sched_exp_handler() IPI before
Because resched_cpu() failed to acquire the rq lock, need_resched was not set
and the CPU went idle, resulting in an expedited stall being reported.
Below is the scenario:
- CPU1 is waiting for the expedited wait to complete:
    rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5
    IPI sent to CPU5
    ret = swait_event_timeout(
    expmask = 0x20, and CPU5 is in the idle path (in cpuidle_enter())
- CPU5 handles the IPI and fails to acquire the rq lock.
    returns after the trylock on rq->lock fails
    need_resched is not set
- CPU5 calls rcu_idle_enter() and, as need_resched is not set, goes
  idle (schedule() is not called).
- CPU1 reports an RCU stall.
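The sequence above can be modeled in a few lines of user-space C. This is
only a sketch of the failure mode, not kernel code: struct model_cpu and the
model_* functions are hypothetical names, the rq lock is reduced to a
boolean, and the whole idle path is collapsed into a single check of
need_resched.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; none of these names exist in the kernel. */
struct model_cpu {
	bool rq_locked;		/* rq->lock is held by someone else */
	bool need_resched;	/* set only if the handler got the lock */
	bool qs_reported;	/* quiescent state reported to RCU */
};

/* Models the IPI handler calling a trylock-based resched_cpu(). */
static void model_exp_handler(struct model_cpu *c)
{
	if (c->rq_locked)
		return;		/* trylock fails: need_resched stays clear */
	c->need_resched = true;
}

/* Models the idle entry that follows the IPI. */
static void model_idle_entry(struct model_cpu *c)
{
	if (c->need_resched) {
		/* schedule() -> ... -> rcu_sched_qs() would run here. */
		c->need_resched = false;
		c->qs_reported = true;
	}
	/* Otherwise the CPU idles and nothing reports the quiescent state. */
}

/* Returns true if the expedited grace period would stall on this CPU. */
static bool model_exp_stalls(bool rq_locked_at_ipi)
{
	struct model_cpu cpu5 = { .rq_locked = rq_locked_at_ipi };

	model_exp_handler(&cpu5);
	cpu5.rq_locked = false;	/* the lock holder releases, too late */
	model_idle_entry(&cpu5);
	return !cpu5.qs_reported;
}
```

In this model, model_exp_stalls(true) corresponds to the reported stall and
model_exp_stalls(false) to the normal case where the handler gets the lock.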
I will be working on a fix this week, hopefully involving resched_cpu()
getting a return value so that I can track who needs a later retry.
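
One possible shape for such a return value, sketched as a user-space model
and not as the actual patch: the resched function reports whether it managed
to set need_resched, and the caller collects the CPUs that need a later
retry. All model_* names, MODEL_NR_CPUS, and the retry mask are assumptions
made for illustration; the real change may look quite different.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8		/* hypothetical CPU count for the model */

/* Hypothetical stand-in for struct rq: a lock bit plus need_resched. */
struct model_rq {
	atomic_bool locked;
	bool need_resched;
};

static struct model_rq model_rqs[MODEL_NR_CPUS];

/* Try to take the lock once; returns true on success. */
static bool model_trylock(struct model_rq *rq)
{
	return !atomic_exchange(&rq->locked, true);
}

static void model_unlock(struct model_rq *rq)
{
	atomic_store(&rq->locked, false);
}

/* Like resched_cpu(), but tells the caller whether need_resched was set. */
static bool model_resched_cpu(int cpu)
{
	struct model_rq *rq;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	rq = &model_rqs[cpu];
	if (!model_trylock(rq))
		return false;	/* contended: caller should retry later */
	rq->need_resched = true;
	model_unlock(rq);
	return true;
}

/* Try every CPU in mask; return the CPUs that still need a retry. */
static unsigned long model_resched_mask(unsigned long mask)
{
	unsigned long retry = 0;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(mask & (1UL << cpu)))
			continue;
		if (!model_resched_cpu(cpu))
			retry |= 1UL << cpu;
	}
	return retry;
}
```

With CPU5's lock held, model_resched_mask(0x20) returns 0x20, so the caller
knows to come back to CPU5. Once the lock is free, a second call returns 0
and need_resched is set.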