Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()

From: Paul E. McKenney
Date: Tue Oct 27 2015 - 01:33:26 EST


On Tue, Oct 27, 2015 at 02:19:39PM +0900, Tejun Heo wrote:
> Hello,
>
> On Tue, Oct 27, 2015 at 12:37:16PM +0900, Linus Torvalds wrote:
> > > I believe that the above should instead be:
> > >
> > > struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,
>
> I should have just used list_entry() here. It's just offsetting the
> pointer to set up the initial iteration point.

OK, that sounds much better!

> ...
> > That said, I'm not sure why it doesn't just do the normal
> >
> > rcu_read_lock();
> > list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
> >         ....
> > }
> > rcu_read_unlock();
> >
> > like the other places do. It looks like it wants that
> > "list_for_each_entry_continue_rcu()" because it does that odd "pin
> > entry and drop rcu lock and retake it and continue where you left
> > off", but I'm not sure why the continue version would be so
> > different.. It's going to do that "follow next entry" regardless, and
> > the "goto restart" doesn't look like it actually adds anything. If
> > following the next pointer is ok even after having released the RCU
> > read lock, then I'm not seeing why the end of the loop couldn't just
> > do
> >
> > rcu_read_unlock();
> > wb_wait_for_completion(bdi, &fallback_work_done);
> > rcu_read_lock();
> >
> > and just continue the loop (and the pinning of "wb" and releasing the
> > "last_wb" thing in the *next* iteration should make it all work the
> > same).
> >
> > Adding Tejun to the cc, because this is his code and there's probably
> > something subtle I'm missing. Tejun, can you take a look? It's
> > bdi_split_work_to_wbs() in fs/fs-writeback.c.
>
> Yeah, just releasing and regrabbing should work too as the iterator
> doesn't depend on anything other than the current entry (e.g. as
> opposed to an imaginary list_for_each_entry_safe_rcu()). It's slightly
> icky to meddle with locking behind the iterator's back, though. Either
> way should be fine, but how about something like the following?
>
> Subject: writeback: don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()
>
> bdi_split_work_to_wbs() uses list_for_each_entry_continue_rcu() to
> walk @bdi->wb_list. To set up the initial iteration condition, it
> uses list_entry_rcu() to calculate the entry pointer corresponding to
> the list head; however, this isn't an actual RCU dereference, and using
> list_entry_rcu() for it ended up breaking a proposed list_entry_rcu()
> change because it was feeding a non-lvalue pointer into the macro.
>
> Don't use the RCU variant for simple pointer offsetting. Use
> list_entry() instead.
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>

Acked-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

> ---
> fs/fs-writeback.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 29e4599..7378169 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
>  				  bool skip_if_busy)
>  {
>  	struct bdi_writeback *last_wb = NULL;
> -	struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
> -						struct bdi_writeback, bdi_node);
> +	struct bdi_writeback *wb = list_entry(&bdi->wb_list,
> +					      struct bdi_writeback, bdi_node);
>
>  	might_sleep();
> restart:
>
