Re: sched_setscheduler() vs idle_balance() race

From: Mike Galbraith
Date: Thu May 28 2015 - 08:04:34 EST


On Thu, 2015-05-28 at 13:51 +0200, Peter Zijlstra wrote:
> On Thu, May 28, 2015 at 09:43:52AM +0200, Mike Galbraith wrote:
> > Hi Peter,
> >
> > I'm not seeing what prevents pull_task() from yanking a task out from
> > under __sched_setscheduler(). A box sprinkling smoldering 3.0 kernel
> > wreckage all over my bugzilla mbox isn't seeing it either ;-)
> >
> > Scenario: an rt task forks, wakes the child on CPU foo, then immediately
> > tries to change the child to the fair class. That calls switched_from_rt(),
> > which leads to pull_rt_task() -> double_lock_balance(), which momentarily
> > drops the child's rq->lock, letting some prick doing idle balancing over
> > on CPU bar in to migrate the child. The rt parent then calls
> > switched_to_fair(), and the box explodes when we use the passed rq as if
> > the child still lived there.
> >
> > I sent a patchlet to verify that the diagnosis is really really correct
> > (can_migrate_task() says no if ->pi_lock is held), but I think it is: the
> > 8x10 color glossy with circles and arrows clearly shows both tasks with
> > their grubby mitts on that child at the same time, each thinking it has
> > that child locked down tight.
> >
> > I'm not seeing what would prevent that in mainline either, so I'll just
> > ask while I wait to (hopefully) hear "yup, all better".
>
> The last patch to come close is 67dfa1b756f2 ("sched/deadline: Implement
> cancel_dl_timer() to use in switched_from_dl()")
>
> Which places the comment /* Possible rq-lock hole */ between
> switched_from() and switched_to().
>
> Which is exactly the hole you mean, right?

Yeah, but that hole is way older than dl. The box falling into it is
running SLE11, which is... well, it still somewhat resembles 3.0.
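To make the hole concrete, here is a userspace model (not kernel code) of
the switched_from()/switched_to() sequence with a balancer getting in while
rq->lock is dropped. Every type and function below is a hypothetical
stand-in: locks are plain flags and the idle balancer on CPU bar is a
callback, so this shows the ordering problem, not real concurrency.

```c
/* Userspace model only: all types and functions are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rq { int cpu; bool locked; };
struct task { struct rq *rq; };    /* rq the task is currently queued on */

struct sched_class {
	void (*switched_from)(struct rq *rq, struct task *p);
	void (*switched_to)(struct rq *rq, struct task *p);
};

static struct rq cpu_foo = { 0, true };
static struct rq cpu_bar = { 1, false };

/* Whatever another CPU does while rq->lock is dropped (NULL: nothing). */
static void (*in_hole)(struct task *p);

/* Idle balancer on CPU bar: it can pull the child only if foo's lock is free. */
static void idle_balance_bar(struct task *p)
{
	if (!p->rq->locked)
		p->rq = &cpu_bar;
}

/* Stand-in for switched_from_rt() -> pull_rt_task() -> double_lock_balance(). */
static void rt_switched_from(struct rq *rq, struct task *p)
{
	rq->locked = false;        /* the rq->lock hole opens */
	if (in_hole)
		in_hole(p);
	rq->locked = true;         /* lock re-taken; caller assumes nothing moved */
}

static bool stale_rq_seen;

/* Stand-in for switched_to_fair(): flags use of an rq the task has left. */
static void fair_switched_to(struct rq *rq, struct task *p)
{
	stale_rq_seen = (p->rq != rq);
}

static const struct sched_class rt_class = { rt_switched_from, NULL };
static const struct sched_class fair_class = { NULL, fair_switched_to };

/* Same shape as check_class_changed(): from, possible hole, then to. */
static void class_changed_model(struct rq *rq, struct task *p,
				const struct sched_class *prev,
				const struct sched_class *next)
{
	if (prev != next) {
		if (prev->switched_from)
			prev->switched_from(rq, p);
		/* Possible rq->lock hole */
		next->switched_to(rq, p);
	}
}

/* Returns true if switched_to() was handed an rq the child no longer lives on. */
static bool run_scenario(bool balancer_runs)
{
	struct task child = { &cpu_foo };

	in_hole = balancer_runs ? idle_balance_bar : NULL;
	stale_rq_seen = false;
	class_changed_model(&cpu_foo, &child, &rt_class, &fair_class);
	return stale_rq_seen;
}
```

With the balancer active, run_scenario(true) reports that switched_to() got
a stale rq; run_scenario(false) does not.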

> And that commit talks about how all that is 'safe' because all scheduler
> operations take ->pi_lock, which is true, except for load-balancing,
> which only uses rq->lock.

Yes. The child's CPU scheduled, so the child was no longer ->curr, making
it eligible for migration given !hot or too many failed balance attempts.
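A simplified userspace model of the migration decision discussed above, as
I read can_migrate_task() in kernels of this vintage: refuse ->curr, and
pull a cache-hot task only once nr_balance_failed exceeds cache_nice_tries.
The optional ->pi_lock veto mimics the diagnostic patchlet. Struct and
function names are hypothetical, and the real function also checks
affinity and other conditions that are left out here.

```c
/* Userspace model only: names and fields are hypothetical simplifications. */
#include <assert.h>
#include <stdbool.h>

struct migrate_ctx {
	bool pi_lock_held;              /* stand-in for p->pi_lock being held */
	bool task_running;              /* p is ->curr on its CPU */
	bool cache_hot;                 /* result of a task_hot()-style check */
	unsigned int nr_balance_failed; /* balance failures in this domain */
	unsigned int cache_nice_tries;  /* failures tolerated before pulling hot tasks */
};

static bool can_migrate_model(const struct migrate_ctx *c, bool pi_lock_veto)
{
	/* Diagnostic veto: refuse while someone (e.g. setscheduler) holds ->pi_lock. */
	if (pi_lock_veto && c->pi_lock_held)
		return false;
	/* Never pull the task currently running on the source CPU. */
	if (c->task_running)
		return false;
	/* Cache-hot tasks are pulled only after repeated balance failures. */
	return !c->cache_hot || c->nr_balance_failed > c->cache_nice_tries;
}

/* Convenience wrapper for trying individual cases. */
static bool can_migrate_case(bool pi_held, bool running, bool hot,
			     unsigned int failed, unsigned int tries, bool veto)
{
	struct migrate_ctx c = { pi_held, running, hot, failed, tries };

	return can_migrate_model(&c, veto);
}
```

Note that without the veto, the child sitting in the rq->lock hole (pi_lock
held, not running, not hot) is fair game, which matches the diagnosis.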

Oh, btw, we pull tasks that are about to schedule off too. While first
trying to figure out wth was going on, I sprinkled in some checks, and
"NOPE, why bother" was the only one to fire, quite a lot.

> Furthermore, we call check_class_changed() _after_ we enqueue the task
> on the new class, so balancing can indeed occur.
>
> Lemme go stare at this; ideally we'd call check_class_changed() at
> __setscheduler() time where the task is off all rqs, but I suspect
> there are 'obvious' problems with that..
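A rough userspace model of why running the class-change hooks while the
task is off all rqs would close the window: a balancer can only pull a task
that is queued. This is a hypothetical sketch, not a patch; the names are
made up, and it deliberately ignores whatever 'obvious' problems the real
__setscheduler() placement may have.

```c
/* Hypothetical userspace model comparing two orderings; not a patch. */
#include <assert.h>
#include <stdbool.h>

struct rq { bool locked; };
struct task { struct rq *rq; bool queued; };

static struct rq foo = { true };
static struct rq bar = { false };

/* Idle balancer on bar: it only ever sees tasks that are queued on an rq. */
static void idle_balance_bar(struct task *p)
{
	if (p->queued && !p->rq->locked)
		p->rq = &bar;
}

/* switched_from()/switched_to() with the rq->lock hole in the middle. */
static void class_change_hooks(struct task *p)
{
	foo.locked = false;        /* double_lock_balance() drops the lock */
	idle_balance_bar(p);       /* another CPU gets in during the hole */
	foo.locked = true;
}

/* Current ordering: enqueue on the new class, then check_class_changed(). */
static bool migrated_hooks_after_enqueue(void)
{
	struct task child = { &foo, true };

	child.queued = false;      /* dequeue */
	/* __setscheduler(): class changes here */
	child.queued = true;       /* enqueue */
	class_change_hooks(&child);
	return child.rq != &foo;
}

/* Suggested ordering: run the hooks at __setscheduler() time, off all rqs. */
static bool migrated_hooks_while_dequeued(void)
{
	struct task child = { &foo, true };

	child.queued = false;      /* dequeue */
	class_change_hooks(&child);
	child.queued = true;       /* enqueue */
	return child.rq != &foo;
}
```

In this model only the current ordering lets the balancer move the child.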

-Mike
