Re: [PATCH 1/2] sched: Fix balance_callback()

From: peterz
Date: Fri Sep 11 2020 - 13:45:32 EST


On Fri, Sep 11, 2020 at 01:17:02PM +0100, Valentin Schneider wrote:
> On 11/09/20 09:17, Peter Zijlstra wrote:
> > The intent of balance_callback() has always been to delay executing
> > balancing operations until the end of the current rq->lock section.
> > This is because balance operations must often drop rq->lock, and that
> > isn't safe in general.
> >
> > However, as noted by Scott, there were a few holes in that scheme;
> > balance_callback() was called after rq->lock was dropped, which means
> > another CPU can interleave and touch the callback list.
> >
>
> So that can be say __schedule() tail racing with some setprio; what's the
> worst that can (currently) happen here? Something like say two consecutive
> enqueuing of push_rt_tasks() to the callback list?

Yeah, but that isn't in fact the case I worry most about.

What can happen (and what I've spotted once before) is that someone
attempts to enqueue a balance_callback from a rq->lock region that
doesn't handle the callbacks.

Currently that 'works', that is, it will get run _eventually_. But
ideally we'd want that to not work and issue a WARN. We want the
callbacks to be timely.
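
To make that concrete, here is a rough userspace toy model of the idea. It
is not the kernel code: every toy_* name is made up for illustration, the
'lock' is just a flag, and where exactly the check sits is an assumption of
the sketch. Lock sections that know about callbacks run them before
unlocking; sections that don't will complain if anything was queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only; these are NOT the kernel's real definitions. */
struct toy_callback {
	struct toy_callback *next;
	void (*func)(void);
};

struct toy_rq {
	bool locked;                    /* stands in for rq->lock */
	struct toy_callback *callbacks; /* pending balance callbacks */
	int warnings;                   /* counts would-be WARNs */
};

/* Example balance operation; just counts how often it ran. */
static int toy_pushes;
static void toy_push(void) { toy_pushes++; }

static void toy_lock(struct toy_rq *rq) { rq->locked = true; }

/* Queue work to run at the end of the current lock section. */
static void toy_queue_callback(struct toy_rq *rq, struct toy_callback *cb,
			       void (*func)(void))
{
	assert(rq->locked); /* queueing is only valid under the lock */
	cb->func = func;
	cb->next = rq->callbacks;
	rq->callbacks = cb;
}

/* Unlock path for sections that handle callbacks: run them, then unlock. */
static void toy_unlock_balance(struct toy_rq *rq)
{
	struct toy_callback *head = rq->callbacks;

	rq->callbacks = NULL;
	while (head) {
		struct toy_callback *next = head->next;

		head->next = NULL;
		head->func();
		head = next;
	}
	rq->locked = false;
}

/* Unlock path for sections that do not handle callbacks: warn if any. */
static void toy_unlock_plain(struct toy_rq *rq)
{
	if (rq->callbacks) {
		rq->warnings++;
		fprintf(stderr, "WARN: callback queued from a section that won't run it\n");
	}
	rq->locked = false;
}
```

In this model, queueing toy_push() and leaving through toy_unlock_balance()
runs it right away, while leaving through toy_unlock_plain() leaves it on
the list and bumps the warning count instead of letting it linger silently.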

So basically all of these machinations are in order to add the WARN :-)