Re: [PATCH] percpu_ref: call wake_up_all() after percpu_ref_put() completes
From: Tejun Heo
Date: Fri Apr 08 2022 - 13:41:14 EST
Hello,

On Thu, Apr 07, 2022 at 06:33:35PM +0800, Qi Zheng wrote:
> In the percpu_ref_call_confirm_rcu(), we call the wake_up_all()
> before calling percpu_ref_put(), which will cause the value of
> percpu_ref to be unstable when percpu_ref_switch_to_atomic_sync()
> returns.
>
> CPU0                                    CPU1
>
> percpu_ref_switch_to_atomic_sync(&ref)
> --> percpu_ref_switch_to_atomic(&ref)
>     --> percpu_ref_get(ref); /* put after confirmation */
>         call_rcu(&ref->data->rcu, percpu_ref_switch_to_atomic_rcu);
>
>                                         percpu_ref_switch_to_atomic_rcu
>                                         --> percpu_ref_call_confirm_rcu
>                                             --> data->confirm_switch = NULL;
>                                                 wake_up_all(&percpu_ref_switch_waitq);
>
>     /* here waiting to wake up */
>     wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch);
>                                                 (A)percpu_ref_put(ref);
> /* The value of &ref is unstable! */
> percpu_ref_is_zero(&ref)
>                                                 (B)percpu_ref_put(ref);
>
> As shown above, assuming that the counts on each cpu add up to 0 before
> calling percpu_ref_switch_to_atomic_sync(), we expect that after switching
> to atomic mode, percpu_ref_is_zero() can return true. But actually it will
> return different values in the two cases of A and B, which is not what
> we expected.
>
> Maybe the original purpose of percpu_ref_switch_to_atomic_sync() is
> just to ensure that the conversion to atomic mode is completed, but it
> should not return with an extra reference count.
>
> Calling wake_up_all() after percpu_ref_put() ensures that the value of
> percpu_ref is stable after percpu_ref_switch_to_atomic_sync() returns.
> So just do it.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
> ---
> lib/percpu-refcount.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index af9302141bcf..b11b4152c8cd 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -154,13 +154,14 @@ static void percpu_ref_call_confirm_rcu(struct rcu_head *rcu)
>
>  	data->confirm_switch(ref);
>  	data->confirm_switch = NULL;
> -	wake_up_all(&percpu_ref_switch_waitq);
>
>  	if (!data->allow_reinit)
>  		__percpu_ref_exit(ref);
>
>  	/* drop ref from percpu_ref_switch_to_atomic() */
>  	percpu_ref_put(ref);
> +
> +	wake_up_all(&percpu_ref_switch_waitq);

The interface, at least originally, doesn't give any guarantee over whether
there's gonna be a residual reference on it or not. There's nothing
necessarily wrong with guaranteeing that but it's rather unusual and given
that putting the base ref in a percpu_ref is a special "kill" operation and
a ref in percpu mode always returns %false on is_zero(), I'm not quite sure
how such semantics would be useful. Do you care to explain the use case with
concrete examples?
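
To make the is_zero() point concrete, here is a rough userspace sketch of
the semantics described above. It is a toy model, not the kernel
implementation: struct mode_ref and its fields are made-up names, and the
only behavior it tries to capture is that a ref in percpu mode reports
"not zero" regardless of its counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; names and layout are hypothetical, not the kernel's. */
struct mode_ref {
	bool percpu_mode;	/* true while counting on per-CPU counters */
	long atomic_count;	/* only meaningful once in atomic mode */
};

/* Models the documented behavior: percpu mode never reports zero. */
static bool mode_ref_is_zero(const struct mode_ref *ref)
{
	assert(ref != NULL);
	if (ref->percpu_mode)
		return false;	/* per-CPU counters are not summed here */
	return ref->atomic_count == 0;
}
```

In this model the same counter value gives different answers depending
only on the mode, which is why a zero check is only meaningful after the
switch to atomic mode has completed.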

Also, the proposed patch is racy. There's nothing preventing
percpu_ref_switch_to_atomic_sync() from waking up early between
confirm_switch clearing and the wake_up_all, so the above change doesn't
guarantee what it tries to guarantee. For that, you'd have to move
confirm_switch clearing *after* percpu_ref_put() but then, you'd be
accessing the ref after its final ref is put which can lead to
use-after-free.
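
The early wakeup can be replayed with a small single-threaded sketch. This
is a toy userspace model under stated assumptions, not kernel code: struct
race_ref and the race_* helpers are invented for illustration, and the
count starts at 1 to stand for "only the switcher's temporary reference is
left". The one kernel fact it relies on is that wait_event() tests its
condition before sleeping, so the waiter can proceed as soon as
confirm_switch is cleared, whether or not wake_up_all() has run yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded model; not the kernel's struct percpu_ref. */
struct race_ref {
	long count;		/* 1 == only the switcher's temporary ref */
	bool confirm_pending;	/* stands in for data->confirm_switch != NULL */
};

/* The wait_event() condition: true means the waiter may proceed. */
static bool race_waiter_may_return(const struct race_ref *ref)
{
	return !ref->confirm_pending;
}

/*
 * Replays the patched callback order with the waiter scheduled between
 * the two steps. Returns what the waiter's zero check observes and
 * stores what it would have observed after the put.
 */
static bool race_patched_order_sees_zero(bool *zero_after_put)
{
	struct race_ref ref = { .count = 1, .confirm_pending = true };
	bool seen;

	ref.confirm_pending = false;		/* step 1: clear confirm_switch */
	assert(race_waiter_may_return(&ref));	/* waiter is already free to run */
	seen = (ref.count == 0);		/* waiter checks; temp ref still held */
	ref.count--;				/* step 2: percpu_ref_put() */
	/* step 3: wake_up_all() would go here, too late to matter */
	*zero_after_put = (ref.count == 0);
	return seen;
}
```

In this replay the waiter observes a non-zero count even though
wake_up_all() was moved after the put, which is the window described
above.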

In fact, the whole premise seems wrong. The switching needs a reference to
the percpu_ref because it is accessing it asynchronously. The switching side
doesn't know when the ref is gonna go away once it puts its reference and
thus can't signal that it's done after putting that reference.

We *can* make that work by putting the whole thing in its own critical
section so that we can make confirm_switch clearing atomic with the possibly
final put, but that's gonna add some complexity and begs the question why
we'd need such a thing.
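
For what it's worth, the critical-section idea could look roughly like the
userspace sketch below. Everything in it is an assumption made for
illustration: struct locked_ref, toy_switch_lock and both helpers are
hypothetical, a pthread mutex stands in for whatever lock the kernel would
use, and the sleeping/wakeup part is left out entirely. The lock is global
on purpose so that it outlives the object after a final put.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Global on purpose: it must outlive any ref whose final put it covers. */
static pthread_mutex_t toy_switch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy model; not the kernel's struct percpu_ref. */
struct locked_ref {
	long count;
	bool confirm_pending;	/* stands in for data->confirm_switch != NULL */
};

/*
 * Switcher side: clear the flag and drop the temporary ref as one unit.
 * Returns true if this was the final put.
 */
static bool locked_confirm_and_put(struct locked_ref *ref)
{
	bool last;

	pthread_mutex_lock(&toy_switch_lock);
	assert(ref->count > 0);
	ref->confirm_pending = false;
	last = (--ref->count == 0);
	pthread_mutex_unlock(&toy_switch_lock);
	/* ref must not be touched past this point; its owner may free it */
	return last;
}

/* Waiter side: sample the flag and the count under the same lock. */
static bool locked_switch_done(struct locked_ref *ref, bool *is_zero)
{
	bool done;

	pthread_mutex_lock(&toy_switch_lock);
	done = !ref->confirm_pending;
	*is_zero = (ref->count == 0);
	pthread_mutex_unlock(&toy_switch_lock);
	return done;
}
```

With both sides serialized on the same lock, the waiter in this model
never sees "switch done" together with the temporary ref still counted.
The price is an extra lock on the confirmation path.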

Andrew, I don't think the patch as proposed makes much sense. Maybe it'd be
better to keep it out of the tree for the time being?

Thanks.

--
tejun