Re: [PATCH] xfs: Wake CIL push waiters more reliably

From: Brian Foster
Date: Tue Feb 16 2021 - 06:20:28 EST


On Mon, Feb 15, 2021 at 02:36:38PM +0100, Donald Buczek wrote:
> On 13.01.21 22:53, Dave Chinner wrote:
> > [...]
> > I agree that a throttling fix is needed, but I'm trying to
> > understand the scope and breadth of the problem first instead of
> > jumping the gun and making the wrong fix for the wrong reasons that
> > just papers over the underlying problems that the throttling bug has
> > made us aware of...
>
> Are you still working on this?
>
> If it takes more time to understand the potential underlying problem, the fix for the problem at hand should be applied.
>
> This is a real-world problem, accidentally found in the wild. It appears very rarely, but it freezes a filesystem or the whole system. It exists in 5.7, 5.8, 5.9, 5.10 and 5.11 and is caused by c7f87f3984cf ("xfs: fix use-after-free on CIL context on shutdown"), which silently added a condition to the wakeup. The condition is based on a wrong assumption.
>
> Why is this "papering over"? If a reminder was needed, there were better ways than randomly hanging the system.
>
> Why is
>
>     if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
>         wake_up_all(&cil->xc_push_wait);
>
> , which doesn't work reliably, preferable to
>
>     if (waitqueue_active(&cil->xc_push_wait))
>         wake_up_all(&cil->xc_push_wait);
>
> which does?
>

JFYI, Dave followed up with a patch a couple of weeks or so ago:

https://lore.kernel.org/linux-xfs/20210128044154.806715-5-david@xxxxxxxxxxxxx/
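
For readers following the thread, the difference between the two quoted
checks can be sketched as a single-threaded user-space model in C. This is
only an illustration, not kernel code: struct model_cil, BLOCKING_LIMIT,
commit() and the push_* helpers are all hypothetical names, and the step
where the accounted space shrinks after a task has gone to sleep is an
assumption made to show how a space-based check could miss a waiter.

```c
#include <stddef.h>

/* Hypothetical stand-in for XLOG_CIL_BLOCKING_SPACE_LIMIT(log). */
#define BLOCKING_LIMIT 100

/* Toy model only; these fields are not the kernel's structures. */
struct model_cil {
    int space_used; /* models ctx->space_used */
    int sleepers;   /* models tasks parked on cil->xc_push_wait */
};

/* A committer goes to sleep if the context is at or over the limit. */
static void commit(struct model_cil *cil, int delta)
{
    cil->space_used += delta;
    if (cil->space_used >= BLOCKING_LIMIT)
        cil->sleepers++;
}

/* Wakeup gated on accounted space, like the first quoted snippet. */
static void push_wake_if_over_limit(struct model_cil *cil)
{
    if (cil->space_used >= BLOCKING_LIMIT)
        cil->sleepers = 0; /* models wake_up_all() */
}

/* Wakeup gated on "is anyone waiting", like the second quoted snippet. */
static void push_wake_if_waiters(struct model_cil *cil)
{
    if (cil->sleepers > 0) /* models waitqueue_active() */
        cil->sleepers = 0; /* models wake_up_all() */
}

/* Run one scenario and report how many sleepers the push left behind. */
static int stranded_after(void (*push)(struct model_cil *))
{
    struct model_cil cil = { 0, 0 };

    commit(&cil, 120); /* over the limit: one task goes to sleep */
    commit(&cil, -40); /* ASSUMPTION: accounted space later drops to 80 */
    push(&cil);
    return cil.sleepers;
}
```

In this model, stranded_after(push_wake_if_over_limit) returns 1 and
stranded_after(push_wake_if_waiters) returns 0. The real
waitqueue_active() has memory-ordering requirements that this toy model
does not capture.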

Brian

> Best
> Donald
>
> > Cheers,
> >
> > Dave
>