Re: single aio thread is migrated crazily by scheduler

From: Phil Auld
Date: Thu Nov 21 2019 - 09:21:55 EST


On Thu, Nov 21, 2019 at 02:29:37PM +0100 Peter Zijlstra wrote:
> On Wed, Nov 20, 2019 at 05:03:13PM -0500, Phil Auld wrote:
> > On Wed, Nov 20, 2019 at 08:16:36PM +0100 Peter Zijlstra wrote:
> > > On Tue, Nov 19, 2019 at 07:40:54AM +1100, Dave Chinner wrote:
>
> > > > Yes, that's precisely the problem - work is queued, by default, on a
> > > > specific CPU and it will wait for a kworker that is pinned to that
> > >
> > > I'm thinking the problem is that it doesn't wait. If it went and waited
> > > for it, active balance wouldn't be needed; that only works on active
> > > tasks.
> >
> > Since this is AIO I wonder if it should queue_work on a nearby cpu by
> > default instead of unbound.
>
> The thing seems to be that 'unbound' is in fact 'bound'. Maybe we should
> fix that. If the load-balancer were allowed to move the kworker around
> when it didn't get time to run, that would probably be a better
> solution.
>

Yeah, I'm not convinced this is actually a scheduler issue.
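
For anyone skimming the thread, the bound/unbound distinction being argued
over looks like this at the API level. A minimal sketch against the mainline
workqueue interface; the work items and function names are illustrative:

#include <linux/workqueue.h>

static void my_work_fn(struct work_struct *work)
{
	/* runs in kworker context */
}

static DECLARE_WORK(my_local_work, my_work_fn);
static DECLARE_WORK(my_pinned_work, my_work_fn);
static DECLARE_WORK(my_unbound_work, my_work_fn);

static void submit_examples(void)
{
	/* Default: goes to the per-CPU pool of the submitting CPU,
	 * so it is served by a kworker pinned to that CPU. */
	queue_work(system_wq, &my_local_work);

	/* Explicitly pinned to CPU 1's per-CPU pool. */
	queue_work_on(1, system_wq, &my_pinned_work);

	/* "Unbound": served by pools that are not per-CPU; whether
	 * those workers are really free to roam is the question here. */
	queue_work(system_unbound_wq, &my_unbound_work);
}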


> Picking another 'bound' cpu at random might create the same sort of
> problems in more complicated scenarios.
>
> TJ, ISTR there used to be actually unbound kworkers, what happened to
> those? Or am I misremembering things?
>
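
FWIW, the flag is still there and callers can ask for an unbound queue
explicitly. A minimal sketch (the queue name and work item are
illustrative); my understanding is that even these 'unbound' pools are
confined to per-NUMA-node cpumasks by default, which may be the 'bound'
behaviour in question:

#include <linux/init.h>
#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;

static void my_work_fn(struct work_struct *work)
{
	/* runs in kworker context */
}

static DECLARE_WORK(my_work, my_work_fn);

static int __init my_module_init(void)
{
	/* WQ_UNBOUND: work is served by pools that are not pinned to
	 * one CPU (by default still limited to a NUMA node's cpumask). */
	my_wq = alloc_workqueue("my_unbound_wq", WQ_UNBOUND, 0);
	if (!my_wq)
		return -ENOMEM;

	queue_work(my_wq, &my_work);
	return 0;
}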
> > > Lastly,
> > > one other thing to try is -next. Vincent reworked the load-balancer
> > > quite a bit.
> > >
> >
> > I've tried it with the lb patch series. I get basically the same results.
> > With the high granularity settings I get 3700 migrations for the 30-second
> > run at 4k on stock 5.4-rc7; of those, about 3200 are active balance.
> > With the lb patches it's 3500 and 3000, a slight drop.
>
> Thanks for testing that. I didn't expect miracles, but it is good to
> verify.
>
> > Using the default granularity settings I get 50 total and 22 active for
> > stock, and 250 and 25 with the lb patches. So a few more total migrations
> > with the lb patches, but about the same number of active balances.
>
> Right, so the granularity thing interacts with the load-balance period.
> Pushing it up, as some people appear to do, makes it so that what might
> be a temporary imbalance is perceived as a persistent imbalance.
>
> Tying the load-balance period to the granularity is something we could
> consider, but then, I'm sure, we'll get other people complaining that it
> doesn't balance quickly enough anymore.
>

Thanks. These are old tuned settings that have been carried along. They may
not be right for newer kernels anyway.
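
For the record, the kind of coupling you describe might look something like
the sketch below. Purely illustrative, written against 5.4-ish internals:
sd->balance_interval and sysctl_sched_min_granularity are real, but the
helper and the factor of 4 are made up:

#include <linux/kernel.h>
#include <linux/time64.h>
#include <linux/sched/sysctl.h>
#include <linux/sched/topology.h>

/* Hypothetical: keep a domain's balance interval at or above a
 * multiple of the scheduler granularity, so that pushing the
 * granularity up also slows re-balancing down. */
static unsigned long min_balance_interval_ms(struct sched_domain *sd)
{
	unsigned long gran_ms = sysctl_sched_min_granularity / NSEC_PER_MSEC;

	return max_t(unsigned long, sd->balance_interval, 4 * gran_ms);
}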


--