Re: [PATCH v2 06/11] perf/uprobe: SRCU-ify uprobe->consumer list

From: Peter Zijlstra
Date: Mon Jul 15 2024 - 07:25:18 EST


On Fri, Jul 12, 2024 at 02:06:08PM -0700, Andrii Nakryiko wrote:
> + bpf@vger, please cc bpf ML for the next revision, these changes are
> very relevant there as well, thanks
>
> On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > With handle_swbp() hitting concurrently on (all) CPUs the
> > uprobe->register_rwsem can get very contended. Add an SRCU instance to
> > cover the consumer list and consumer lifetime.
> >
> > Since the consumer are externally embedded structures, unregister will
> > have to suffer a synchronize_srcu().
> >
> > A notably complication is the UPROBE_HANDLER_REMOVE logic which can
> > race against uprobe_register() such that it might want to remove a
> > freshly installer handler that didn't get called. In order to close
> > this hole, a seqcount is added. With that, the removal path can tell
> > if anything changed and bail out of the removal.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > ---
> > kernel/events/uprobes.c | 60 ++++++++++++++++++++++++++++++++++++++++--------
> > 1 file changed, 50 insertions(+), 10 deletions(-)
> >
>
> [...]
>
> > @@ -800,7 +808,7 @@ static bool consumer_del(struct uprobe *
> > down_write(&uprobe->consumer_rwsem);
> > for (con = &uprobe->consumers; *con; con = &(*con)->next) {
> > if (*con == uc) {
> > - *con = uc->next;
> > + WRITE_ONCE(*con, uc->next);
>
> I have a dumb and mechanical question.
>
> Above in consumer_add() you are doing WRITE_ONCE() for uc->next
> assignment, but rcu_assign_pointer() for uprobe->consumers. Here, you
> are doing WRITE_ONCE() for the same operation, if it so happens that
> uc == *con == uprobe->consumers. So is rcu_assign_pointer() necessary
> in consumer_addr()? If yes, we should have it here as well, no? And if
> not, why bother with it in consumer_add()?

add is a publish and needs to ensure all stores to the object are
ordered before the object is linked in. Note that rcu_assign_pointer()
is basically a fancy way of writing smp_store_release().

del otoh does not publish, it removes and doesn't need ordering.

It does however need to ensure the pointer write itself isn't torn. That
is, without the WRITE_ONCE() the compiler is free to do byte stores in
order to update the 8 byte pointer value (assuming 64bit). This is
pretty dumb, but very much permitted by C and also utterly fatal in the
case where an RCU iteration comes by and reads a half-half pointer
value.

> > ret = true;
> > break;
> > }
> > @@ -1139,9 +1147,13 @@ void uprobe_unregister(struct inode *ino
> > return;
> >
> > down_write(&uprobe->register_rwsem);
> > + raw_write_seqcount_begin(&uprobe->register_seq);
> > __uprobe_unregister(uprobe, uc);
> > + raw_write_seqcount_end(&uprobe->register_seq);
> > up_write(&uprobe->register_rwsem);
> > put_uprobe(uprobe);
> > +
> > + synchronize_srcu(&uprobes_srcu);
> > }
> > EXPORT_SYMBOL_GPL(uprobe_unregister);
>
> [...]