Re: [PATCH] bdi: Fix another oops in wb_workfn()

From: Jan Kara
Date: Mon Jun 11 2018 - 05:12:56 EST


On Sat 09-06-18 23:00:05, Tetsuo Handa wrote:
> From 014c4149f2e24cd26b278b32d5dfda056eecf093 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Date: Sat, 9 Jun 2018 22:47:52 +0900
> Subject: [PATCH] bdi: Fix another oops in wb_workfn()
>
> syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to
> wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was
> WB_shutting_down after wb->bdi->dev became NULL. This indicates that
> unregister_bdi() failed to call wb_shutdown() on one of wb objects.
>
> Since cgwb_bdi_unregister() from bdi_unregister() cannot call wb_shutdown()
> on wb objects which have already passed list_del_rcu() in wb_shutdown(),
> cgwb_bdi_unregister() from bdi_unregister() can return and set wb->bdi->dev
> to NULL before such wb objects enter final round of wb_workfn() via
> mod_delayed_work()/flush_delayed_work().

Thanks a lot for debugging the issue and also thanks a lot to Dmitry for
taking time to reproduce the race by hand with the debug patch! I really
appreciate it!

> Since WB_registered is already cleared by wb_shutdown(), only wb_shutdown()
> can schedule for final round of wb_workfn(). Since concurrent calls to
> wb_shutdown() on the same wb object is safe because of WB_shutting_down
> state, I think that wb_shutdown() can safely keep a wb object in the
> bdi->wb_list until that wb object leaves final round of wb_workfn().
> Thus, make wb_shutdown() call list_del_rcu() after flush_delayed_work().

However this is wrong and so is the patch. The problem is in
cgwb_bdi_unregister() which does cgwb_kill() and thus drops bdi's
reference to wb structures before going through the list of wbs again and
calling wb_shutdown() on each of them. The writeback structures we are
accessing at this point can be already freed in principle like:

CPU1 CPU2
cgwb_bdi_unregister()
cgwb_kill(*slot);

cgwb_release()
queue_work(cgwb_release_wq, &wb->release_work);
cgwb_release_workfn()
wb = list_first_entry(&bdi->wb_list, ...)
spin_unlock_irq(&cgwb_lock);
wb_shutdown(wb);
...
kfree_rcu(wb, rcu);
wb_shutdown(wb); -> oops use-after-free

I'm not 100% sure how to fix this. wb structures can be at various phases of
shutdown (or there may be other external references still existing) when we
enter cgwb_bdi_unregister() so I think adding a way for cgwb_bdi_unregister()
to wait for standard wb shutdown path to finish is the most robust way.
What do you think about attached patch Tejun? So far only compile tested...

Possible problem with it is that now cgwb_bdi_unregister() will wait for
all wb references to be dropped so it adds some implicit dependencies to
bdi shutdown path.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR