Re: [PATCH 0/8] uprobes: RCU-protected hot path optimizations
From: Andrii Nakryiko
Date: Wed Aug 07 2024 - 11:24:27 EST
On Wed, Aug 7, 2024 at 6:30 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> On 07/31, Andrii Nakryiko wrote:
> >
> > Andrii Nakryiko (6):
> > uprobes: revamp uprobe refcounting and lifetime management
> > uprobes: protected uprobe lifetime with SRCU
> > uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks
> > uprobes: travers uprobe's consumer list locklessly under SRCU
> > protection
> > uprobes: perform lockless SRCU-protected uprobes_tree lookup
> > uprobes: switch to RCU Tasks Trace flavor for better performance
> >
> > Peter Zijlstra (2):
> > rbtree: provide rb_find_rcu() / rb_find_add_rcu()
> > perf/uprobe: split uprobe_unregister()
>
> I see nothing wrong in 1-7. LGTM.
>
So, I don't know how it slipped the first time, because I tested
overnight with uprobe-stress, perhaps I adjusted some parameters (more
threads or different ratio of threads, not sure by now), but it turns
out that lockless RB-tree traversal actually crashes after a few
minutes of running uprobe-stress. I'll post details separately later
today, but I suspect that rb_find_rcu() and rb_find_add_rcu() is not
enough to make it safe.
Hopefully someone can help me figure this out.
> But since you are going to send the new version, I'd like to apply V2
> and then try to re-check the resulting code.
Yes, I was waiting for more of Peter's comments, but I guess I'll just
send a v2 today. I'll probably include the SRCU+timeout logic for
return_instances, and maybe lockless VMA parts as well. Those won't be
yet for landing, but it's probably useful to see everything
end-to-end.
Given the hiccup with lockless uprobes_tree lookup, we should land
that change, but the first 4 patches hopefully are in good enough
shape to apply and reduce the amount of patches that need to be
juggled around.
>
> As for 8/8 - I leave it to you and Peter. I'd prefer SRCU though ;)
Honestly curious, why the preference?
>
> Oleg.
>
BTW, while you are here :) What can you say about
current->sighand->siglock use in handle_singlestep()? Is there any way
to avoid that (or at least not have to take it every single time a
single-stepped uprobe is triggered?). This turned out to be a huge
issue for single-stepped uprobe when scaling to multiple CPUs even
with all the other things (all the SRCU and lockless VMA lookup) taken
care of.