Re: [PATCH v2 2/6] uprobes: protected uprobe lifetime with SRCU

From: Andrii Nakryiko
Date: Thu Aug 08 2024 - 13:52:20 EST


On Thu, Aug 8, 2024 at 9:58 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Aug 8, 2024 at 3:20 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >
> > On 08/07, Andrii Nakryiko wrote:
> > >
> > > struct uprobe {
> > > - struct rb_node rb_node; /* node in the rb tree */
> > > + union {
> > > + struct rb_node rb_node; /* node in the rb tree */
> > > + struct rcu_head rcu; /* mutually exclusive with rb_node */
> >
> > Andrii, I am sorry.
> >
> > I suggested this in reply to 3/8 before I read
> > [PATCH 7/8] uprobes: perform lockless SRCU-protected uprobes_tree lookup
> >
> > I have no idea if rb_erase() is rcu-safe or not, but this union certainly
> > doesn't look right if we use rb_find_rcu/etc.
> >
>
> Ah, because put_uprobe() might be fast enough to remove the uprobe from
> the tree, process delayed_uprobe_remove() and then enqueue the
> uprobe_free_rcu() callback (which writes the rcu field here,
> overwriting rb_node) while we are still doing a lockless lookup that
> reaches this overwritten rb_node. Good catch. If that's the case (and
> I'm testing all this right now), then it's an easy fix.
>
> It would also explain why I initially didn't get any crashes for
> lockless RB-tree lookup with uprobe-stress (I was really surprised
> that I "missed" the crash initially).
>
> Thanks!

I can confirm that the crash went away. Previously it was crashing
after a few minutes, but now it has been running for almost an hour with
no problems. Phew, I was worried there for a bit, but it seems we are
back to the "everything is fine" state.

Okay, I'll incorporate this fix and the synchronize_srcu() change
locally and give it a few more days, in case Peter wants to take another
look. I'll send a new revision early next week.
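For anyone following along, here is a minimal userspace sketch of why the
union is unsafe. It assumes only the field layout of the kernel's struct
rb_node and struct callback_head (aka rcu_head); all demo_* names are
hypothetical stand-ins, and it models only the memory overlap, not the
concurrency or rb_erase(). The point is that enqueueing an RCU/SRCU
callback writes ->next and ->func immediately, before the grace period,
so with the union a reader still walking the tree sees clobbered child
pointers, while separate fields leave rb_node intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the same field order as struct rb_node. */
struct demo_rb_node {
	unsigned long parent_color;
	struct demo_rb_node *right;
	struct demo_rb_node *left;
};

/* Hypothetical stand-in with the same field order as struct callback_head. */
struct demo_rcu_head {
	struct demo_rcu_head *next;
	void (*func)(struct demo_rcu_head *head);
};

/* v2 layout: rcu shares storage with rb_node. */
struct demo_uprobe_union {
	union {
		struct demo_rb_node rb_node;
		struct demo_rcu_head rcu;
	};
};

/* Fixed layout: rcu has its own storage. */
struct demo_uprobe_split {
	struct demo_rb_node rb_node;
	struct demo_rcu_head rcu;
};

static struct demo_rb_node demo_child;

static void demo_free_cb(struct demo_rcu_head *head) { (void)head; }

/* Models the synchronous part of call_rcu()/call_srcu(): the callback
 * fields are written at enqueue time, before any grace period. */
static void demo_enqueue(struct demo_rcu_head *head)
{
	head->next = NULL;
	head->func = demo_free_cb;
}

/* Returns true if the rb_node bytes a lockless reader would follow
 * changed as a result of enqueueing the callback. */
static bool rb_node_changed(struct demo_rb_node *node, struct demo_rcu_head *rcu)
{
	struct demo_rb_node before, after;

	node->right = &demo_child;
	node->left = &demo_child;
	memcpy(&before, node, sizeof(before));
	demo_enqueue(rcu);
	memcpy(&after, node, sizeof(after));
	return memcmp(&before, &after, sizeof(before)) != 0;
}

bool union_layout_clobbers_rb_node(void)
{
	struct demo_uprobe_union u;

	memset(&u, 0, sizeof(u));
	return rb_node_changed(&u.rb_node, &u.rcu);	/* func lands on ->right */
}

bool split_layout_clobbers_rb_node(void)
{
	struct demo_uprobe_split s;

	memset(&s, 0, sizeof(s));
	return rb_node_changed(&s.rb_node, &s.rcu);	/* rb_node untouched */
}
```

Calling union_layout_clobbers_rb_node() should return true and
split_layout_clobbers_rb_node() false, which is the whole fix: give rcu
its own storage instead of overlaying it on rb_node.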

>
>
> > Yes, this version doesn't include the SRCU-protected uprobes_tree changes,
> > but still...
> >
> > Oleg.
> >