Re: [PATCH 04/19] sched: Prepare for Core-wide rq->lock
From: Josh Don
Date: Tue Apr 27 2021 - 19:32:15 EST
On Mon, Apr 26, 2021 at 3:21 PM Josh Don <joshdon@xxxxxxxxxx> wrote:
>
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f732642e3e09..1a81e9cc9e5d 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -290,6 +290,10 @@ static void sched_core_assert_empty(void)
> > static void __sched_core_enable(void)
> > {
> > static_branch_enable(&__sched_core_enabled);
> > + /*
> > + * Ensure raw_spin_rq_*lock*() have completed before flipping.
> > + */
> > + synchronize_sched();
>
> synchronize_rcu()
>
> > __sched_core_flip(true);
> > sched_core_assert_empty();
> > }
> > @@ -449,16 +453,22 @@ void raw_spin_rq_lock_nested(struct rq *rq, int subclass)
> > {
> > raw_spinlock_t *lock;
> >
> > + preempt_disable();
> > if (sched_core_disabled()) {
> > raw_spin_lock_nested(&rq->__lock, subclass);
> > + /* preempt *MUST* still be disabled here */
> > + preempt_enable_no_resched();
> > return;
> > }
>
> This approach looks good to me. I'm guessing you went this route
> instead of doing the re-check after locking in order to optimize the
> disabled case?
>
> Recommend a comment that the preempt_disable() here pairs with the
> synchronize_rcu() in __sched_core_enable().
>
> >
> > for (;;) {
> > lock = __rq_lockp(rq);
> > raw_spin_lock_nested(lock, subclass);
> > - if (likely(lock == __rq_lockp(rq)))
> > + if (likely(lock == __rq_lockp(rq))) {
> > + /* preempt *MUST* still be disabled here */
> > + preempt_enable_no_resched();
> > return;
> > + }
> > raw_spin_unlock(lock);
> > }
> > }
Also, did you mean to have a preempt_enable_no_resched() rather than
prempt_enable() in raw_spin_rq_trylock?
I went over the rq_lockp stuff again after Don's reported lockup. Most
uses are safe due to already holding an rq lock. However,
double_rq_unlock() is prone to race:
double_rq_unlock(rq1, rq2):
/* Initial state: core sched enabled, and rq1 and rq2 are smt
siblings. So, double_rq_lock(rq1, rq2) only took a single rq lock */
raw_spin_rq_unlock(rq1);
/* now not holding any rq lock */
/* sched core disabled. Now __rq_lockp(rq1) != __rq_lockp(rq2), so we
falsely unlock rq2 */
if (__rq_lockp(rq1) != __rq_lockp(rq2))
raw_spin_rq_unlock(rq2);
else
__release(rq2->lock);
Instead we can cache __rq_lockp(rq1) and __rq_lockp(rq2) before
releasing the lock, in order to prevent this. FWIW I think it is
likely that Don is seeing a different issue.