Re: [PATCH 1/3] uprobes: allow put_uprobe() from non-sleepable softirq context
From: Andrii Nakryiko
Date: Fri Oct 04 2024 - 16:28:04 EST
On Tue, Sep 17, 2024 at 1:19 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Sun, Sep 15, 2024 at 4:49 PM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >
> > On 09/09, Andrii Nakryiko wrote:
> > >
> > > Currently put_uprobe() might trigger mutex_lock()/mutex_unlock(), which
> > > makes it unsuitable to be called from more restricted context like softirq.
> > >
> > > Let's make put_uprobe() agnostic to the context in which it is called,
> > > and use work queue to defer the mutex-protected clean up steps.
> >
> > ...
> >
> > > +static void uprobe_free_deferred(struct work_struct *work)
> > > +{
> > > + struct uprobe *uprobe = container_of(work, struct uprobe, work);
> > > +
> > > + /*
> > > + * If application munmap(exec_vma) before uprobe_unregister()
> > > + * gets called, we don't get a chance to remove uprobe from
> > > + * delayed_uprobe_list from remove_breakpoint(). Do it here.
> > > + */
> > > + mutex_lock(&delayed_uprobe_lock);
> > > + delayed_uprobe_remove(uprobe, NULL);
> > > + mutex_unlock(&delayed_uprobe_lock);
> > > +
> > > + kfree(uprobe);
> > > +}
> > > +
> > > static void uprobe_free_rcu(struct rcu_head *rcu)
> > > {
> > > struct uprobe *uprobe = container_of(rcu, struct uprobe, rcu);
> > >
> > > - kfree(uprobe);
> > > + INIT_WORK(&uprobe->work, uprobe_free_deferred);
> > > + schedule_work(&uprobe->work);
> > > }
> >
> > This is still wrong afaics...
> >
> > If put_uprobe() can be called from softirq (after the next patch), then
> > put_uprobe() and all other users of uprobes_treelock should use
> > write_lock_bh/read_lock_bh to avoid the deadlock.
>
> Ok, I see the problem, that's unfortunate.
>
> I see three ways to handle that:
>
> 1) keep put_uprobe() as is, and instead do schedule_work() from the
> timer thread to postpone put_uprobe(). (but I'm not a big fan of this)
> 2) move uprobes_treelock part of put_uprobe() into rcu callback, I
> think it has no bearing on correctness, uprobe_is_active() is there
> already to handle races between putting uprobe and removing it from
> uprobes_tree (I prefer this one over #1 )
> 3) you might like this the most ;) I think I can simplify
> hprobes_expire() from patch #2 to not need put_uprobe() at all, if I
> protect uprobe lifetime with non-sleepable
> rcu_read_lock()/rcu_read_unlock() and perform try_get_uprobe() as the
> very last step after cmpxchg() succeeded.
>
> I'm leaning towards #3, but #2 seems fine to me as well.

Ok, so just a short update. I don't think #3 works: I need to call
try_get_uprobe() before I know for sure that cmpxchg() succeeds, which
means I'd need a compensating put_uprobe() if cmpxchg() fails. So for
put_uprobe(), I just made it do all the locking in the deferred work
callback (which is #2 above). I think that resolves the potential
deadlock you pointed out and removes any limitations on where
put_uprobe() can be called from.
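
For illustration only, here is a rough user-space sketch of the ordering
constraint described above: the reference is taken speculatively, the
cmpxchg is attempted, and the reference is dropped again if the cmpxchg
loses. This is not the patch. struct obj, try_get(), put() and
publish_with_ref() are hypothetical stand-ins written with C11 atomics,
and the real uprobe code may look quite different.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct uprobe; only the refcount matters here. */
struct obj {
	atomic_int refs;
};

/* Take a reference unless the count has already dropped to zero. */
static bool try_get(struct obj *o)
{
	int old = atomic_load(&o->refs);

	while (old > 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
			return true;
	}
	return false;
}

static void put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refs, 1);

	assert(old > 0);	/* dropping a reference we never held is a bug */
}

/*
 * Publish 'o' into *slot only if *slot still holds 'expected'.
 * The reference is taken before the cmpxchg, because once the cmpxchg
 * succeeds other threads may already rely on the reference being held.
 */
static bool publish_with_ref(_Atomic(struct obj *) *slot,
			     struct obj *expected, struct obj *o)
{
	if (!try_get(o))
		return false;
	if (atomic_compare_exchange_strong(slot, &expected, o))
		return true;
	put(o);		/* lost the race: undo the speculative get */
	return false;
}
```

The put() on the failure path is the "compensating" call: it is why the
expiry path cannot avoid put_uprobe() altogether, and so why put_uprobe()
itself has to be safe in that context.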

Also, I rewrote hprobe_consume() and hprobe_expire() in terms of an
explicit state machine with 4 possible states (LEASED, STABLE, GONE,
CONSUMED), which I think makes the logic a bit more straightforward to
follow. Hopefully that will make the change more palatable for you.
I'll probably post the patches next week, though.
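
To make the state-machine idea concrete, here is a minimal user-space
sketch. Only the four state names come from this mail. The transitions,
struct hslot, slot_consume() and slot_expire() are assumptions made up
for illustration, and the patches may define them differently.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* State names are from the mail; the transitions below are assumed. */
enum hstate { H_LEASED, H_STABLE, H_GONE, H_CONSUMED };

/* Hypothetical container for the state word. */
struct hslot {
	_Atomic int state;
};

/*
 * Consumer side: unconditionally claim the slot. The previous state is
 * stored in *prev; the return value says whether that state still
 * referred to a usable object.
 */
static bool slot_consume(struct hslot *s, enum hstate *prev)
{
	*prev = (enum hstate)atomic_exchange(&s->state, H_CONSUMED);
	return *prev == H_LEASED || *prev == H_STABLE;
}

/*
 * Expiry side: move LEASED to STABLE (a reference was obtained) or to
 * GONE (it was not). Returns false if the slot was no longer LEASED,
 * in which case a caller that took a reference has to drop it again.
 */
static bool slot_expire(struct hslot *s, bool got_ref)
{
	int expected = H_LEASED;
	int next = got_ref ? H_STABLE : H_GONE;

	return atomic_compare_exchange_strong(&s->state, &expected, next);
}
```

In this sketch, consume uses an unconditional exchange and so always wins,
while expire uses a compare-and-exchange and backs off if the slot has
already left LEASED.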
>
> >
> > To be honest... I simply can't force myself to even try to read 2/3 ;) I'll
> > try to do this later, but I am sure I will never like it, sorry.
>
> This might sound rude, but the goal here is not to make you like it :)
> The goal is to improve performance with minimal complexity. And I'm
> very open to any alternative proposals as to how to make uretprobes
> RCU-protected to avoid refcounting in the hot path.
>
> I think #3 proposal above will make it a bit more palatable (but there
> is still locklessness, cmpxchg, etc, I see no way around that,
> unfortunately).
>
> >
> > Oleg.
> >