Re: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN

From: Lucas Stach
Date: Wed Apr 27 2016 - 14:31:50 EST


On Tuesday, 26.04.2016 at 09:08 +1000, Dave Chinner wrote:
[...]
> >
> > >
> > > That said, I'm not sure whether there's a notable benefit of idling
> > > for 50ms over just scheduling out when we've hit the target lsn. It
> > > seems like that anybody who pushes the target forward again is going
> > > to wake up the thread anyways. On the other hand, if the fs is idle
> > > the thread will eventually schedule out indefinitely.
> > Is this a problem? The patch tries to do exactly that: schedule out
> > aild indefinitely when there is no more work to do as nobody is
> > pushing
> > the target LSN forward.
> If the filesystem is slowly being dirtied, then the aild shouldn't
> really idle at all.
>
> Keep in mind that the xfsaild has multiple functions, one of which
> is a watchdog that catches log space stalls that would otherwise
> hang the filesystem. Every time we've removed the watchdog function
> (i.e. aggressively idle the aild) we've had users report random,
> unreproducible hangs/stalls that have gone away when the watchdog
> function (i.e. don't idle until the log is covered and completely
> idle) was re-instated...
>
I can only see xfsaild_push() doing any work after it has hit the
target LSN if something moves the target LSN forward. You say that
aggressively idling aild might produce log stalls, which would imply
there are races in the code where a code path that moves the target LSN
forward doesn't properly wake up aild.

Wouldn't this problem also be present with non-aggressive idling of
aild, just with a significantly lower probability of hitting it? The
commit that re-enabled non-aggressive aild idling specifically mentions
some races that have been fixed, and I think those fixes should allow
for aggressive aild idling as well. If they are insufficient, it
wouldn't be safe to idle aild at all, right?
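
To make the race argument concrete, here is a minimal userspace sketch
of the wakeup discipline in question. It is not XFS code: the struct,
the function names and the use of pthreads are all assumptions made for
illustration. The point it shows is that if every path moving the target
forward updates it and issues the wakeup under the same lock the worker
holds while deciding to sleep, an indefinite idle cannot miss a push.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are XFS symbols. */
struct ail_model {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	uint64_t target_lsn;	/* how far callers want the AIL pushed */
	uint64_t pushed_lsn;	/* how far the worker has pushed so far */
	bool stop;
};

static void ail_model_init(struct ail_model *m)
{
	int rc;

	rc = pthread_mutex_init(&m->lock, NULL);
	assert(rc == 0);
	rc = pthread_cond_init(&m->wake, NULL);
	assert(rc == 0);
	(void)rc;
	m->target_lsn = 0;
	m->pushed_lsn = 0;
	m->stop = false;
}

/* Move the target forward; update and wakeup happen under one lock. */
static void ail_model_push(struct ail_model *m, uint64_t lsn)
{
	pthread_mutex_lock(&m->lock);
	if (lsn > m->target_lsn) {
		m->target_lsn = lsn;
		pthread_cond_signal(&m->wake);
	}
	pthread_mutex_unlock(&m->lock);
}

static void ail_model_stop(struct ail_model *m)
{
	pthread_mutex_lock(&m->lock);
	m->stop = true;
	pthread_cond_signal(&m->wake);
	pthread_mutex_unlock(&m->lock);
}

/* Worker: sleeps with no timeout once it has reached the target. */
static void *ail_model_worker(void *arg)
{
	struct ail_model *m = arg;

	pthread_mutex_lock(&m->lock);
	for (;;) {
		/* The sleep decision is made while holding the lock. */
		while (!m->stop && m->pushed_lsn >= m->target_lsn)
			pthread_cond_wait(&m->wake, &m->lock);
		if (m->pushed_lsn < m->target_lsn) {
			m->pushed_lsn = m->target_lsn;	/* "push" the AIL */
			continue;
		}
		if (m->stop)
			break;
	}
	pthread_mutex_unlock(&m->lock);
	return NULL;
}
```

In this model the worker always ends up pushed to the highest target it
was given, regardless of how the threads interleave. A path that bumped
target_lsn without taking the lock or without signalling would be the
kind of race that a 50ms timeout only papers over.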

Regards,
Lucas