Re: [RESEND][v2][PATCH] Fix a race between try_to_wake_up() and a woken up task
From: Peter Zijlstra
Date: Thu Sep 08 2016 - 04:58:40 EST
On Mon, Sep 05, 2016 at 05:14:19PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2016-09-05 at 13:16 +1000, Balbir Singh wrote:
>
> .../...
> >
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Nicholas Piggin <npiggin@xxxxxxxxx>
>
> Acked-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
>
> > Signed-off-by: Balbir Singh <bsingharora@xxxxxxxxx>
> > ---
> >  kernel/sched/core.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2a906f2..582c684 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2016,6 +2016,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >  	success = 1; /* we're going to change ->state */
> >  	cpu = task_cpu(p);
> >
> > +	/*
> > +	 * Ensure we see on_rq and p_state consistently
> > +	 *
> > +	 * For example in __rwsem_down_write_failed(), we have
> > +	 *    [S] ->on_rq = 1                       [L] ->state
> > +	 *    MB                                    RMB
> > +	 *    [S] ->state = TASK_UNINTERRUPTIBLE    [L] ->on_rq
> > +	 * In the absence of the RMB p->on_rq can be observed to be 0
> > +	 * and we end up spinning indefinitely in while (p->on_cpu)
> > +	 */
So I did replace that comment with the one I proposed earlier. I checked
a fair number of architectures and many did not have an obvious barrier
in switch_to(). So that is not something we can rely on, nor do we need
to, I think.
> > +	smp_rmb();
> >  	if (p->on_rq && ttwu_remote(p, wake_flags))
> >  		goto stat;
> >
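
For anyone who wants to poke at the ordering in the quoted comment from
userspace, here is a rough sketch of the [S]/MB/[S] versus [L]/RMB/[L]
pattern. It is only an analogy: struct fake_task, sleeper(),
waker_observes_stale_on_rq() and count_violations() are made-up names, the
TASK_* values are placeholders, and C11 fences stand in for the kernel
barriers (the acquire fence used here is somewhat stronger than smp_rmb()).
None of it is scheduler code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Placeholder values, assumed for illustration only. */
#define TASK_RUNNING		0
#define TASK_UNINTERRUPTIBLE	2

/* Hypothetical stand-in for the two task_struct fields in the comment. */
struct fake_task {
	atomic_int on_rq;
	atomic_int state;
};

/* Left column: [S] ->on_rq = 1; MB; [S] ->state = TASK_UNINTERRUPTIBLE */
static void *sleeper(void *arg)
{
	struct fake_task *p = arg;

	atomic_store_explicit(&p->on_rq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for MB */
	atomic_store_explicit(&p->state, TASK_UNINTERRUPTIBLE,
			      memory_order_relaxed);
	return NULL;
}

/*
 * Right column: [L] ->state; RMB; [L] ->on_rq
 * Returns 1 if the new ->state was seen together with a stale ->on_rq == 0.
 */
static int waker_observes_stale_on_rq(struct fake_task *p)
{
	for (;;) {
		int state = atomic_load_explicit(&p->state,
						 memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire); /* stands in for RMB */
		int on_rq = atomic_load_explicit(&p->on_rq,
						 memory_order_relaxed);

		if (state == TASK_UNINTERRUPTIBLE)
			return on_rq == 0;
	}
}

/* Runs the pair repeatedly; returns the number of stale observations,
 * or -1 if a thread could not be created. */
static int count_violations(int iterations)
{
	int violations = 0;

	for (int i = 0; i < iterations; i++) {
		struct fake_task t;
		pthread_t tid;

		atomic_init(&t.on_rq, 0);
		atomic_init(&t.state, TASK_RUNNING);

		if (pthread_create(&tid, NULL, sleeper, &t) != 0)
			return -1;
		violations += waker_observes_stale_on_rq(&t);
		pthread_join(tid, NULL);
	}
	return violations;
}
```

With both fences in place the C11 model does not allow the waker to see the
new ->state and the old ->on_rq, so count_violations() should return 0.
Dropping the acquire fence permits the stale read on weakly ordered hardware,
but does not guarantee that a short run will actually hit it.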