Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
From: Masami Hiramatsu
Date: Thu Apr 09 2020 - 10:26:57 EST
Hi,
On Thu, 09 Apr 2020 18:46:47 +0530
"Naveen N. Rao" <naveen.n.rao@xxxxxxxxxxxxx> wrote:
> Hi Masami,
>
> Masami Hiramatsu wrote:
> > Hi Jiri,
> >
> > On Wed, 8 Apr 2020 18:46:41 +0200
> > Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> >
> >> hi,
> >> Ziqian reported a lockup when adding a retprobe on _raw_spin_lock_irqsave.
> >
> > Hmm, kprobe is lockless, but kretprobe involves a spinlock.
> > Thus, eventually, I will blacklist _raw_spin_lock_irqsave()
> > for kretprobes.
>
> As far as I can see, this is the only place where probing
> _raw_spin_lock_irqsave() is an issue. Should we blacklist all users for
> this case alone?
Hrm, right. The kretprobe case is different from the kprobe case.
>
> > If you need to trace the spinlock return, please consider putting a
> > kprobe at the "ret" instruction.
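
(For illustration only -- a minimal sketch of that suggestion. The offset,
handler name and module boilerplate below are made up; the real offset of
the "ret" instruction has to be read from a disassembly of the running
kernel, e.g. with objdump.)

#include <linux/kprobes.h>
#include <linux/module.h>

static unsigned long hits;

/* Fires just before _raw_spin_lock_irqsave() returns. */
static int ret_insn_handler(struct kprobe *p, struct pt_regs *regs)
{
	hits++;		/* racy, but enough for illustration */
	return 0;
}

static struct kprobe kp = {
	.symbol_name	= "_raw_spin_lock_irqsave",
	.offset		= 0x40,	/* assumed offset of the "ret" instruction */
	.pre_handler	= ret_insn_handler,
};

static int __init ret_probe_init(void)
{
	return register_kprobe(&kp);
}

static void __exit ret_probe_exit(void)
{
	unregister_kprobe(&kp);
	pr_info("ret probe hits: %lu\n", hits);
}

module_init(ret_probe_init);
module_exit(ret_probe_exit);
MODULE_LICENSE("GPL");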
> >
> >> My test was also able to trigger lockdep output:
> >>
> >> ============================================
> >> WARNING: possible recursive locking detected
> >> 5.6.0-rc6+ #6 Not tainted
> >> --------------------------------------------
> >> sched-messaging/2767 is trying to acquire lock:
> >> ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
> >>
> >> but task is already holding lock:
> >> ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> >>
> >> other info that might help us debug this:
> >> Possible unsafe locking scenario:
> >>
> >> CPU0
> >> ----
> >> lock(&(kretprobe_table_locks[i].lock));
> >> lock(&(kretprobe_table_locks[i].lock));
> >>
> >> *** DEADLOCK ***
> >>
> >> May be due to missing lock nesting notation
> >>
> >> 1 lock held by sched-messaging/2767:
> >> #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> >>
> >> stack backtrace:
> >> CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
> >> Call Trace:
> >> dump_stack+0x96/0xe0
> >> __lock_acquire.cold.57+0x173/0x2b7
> >> ? native_queued_spin_lock_slowpath+0x42b/0x9e0
> >> ? lockdep_hardirqs_on+0x590/0x590
> >> ? __lock_acquire+0xf63/0x4030
> >> lock_acquire+0x15a/0x3d0
> >> ? kretprobe_hash_lock+0x52/0xa0
> >> _raw_spin_lock_irqsave+0x36/0x70
> >> ? kretprobe_hash_lock+0x52/0xa0
> >> kretprobe_hash_lock+0x52/0xa0
> >> trampoline_handler+0xf8/0x940
> >> ? kprobe_fault_handler+0x380/0x380
> >> ? find_held_lock+0x3a/0x1c0
> >> kretprobe_trampoline+0x25/0x50
> >> ? lock_acquired+0x392/0xbc0
> >> ? _raw_spin_lock_irqsave+0x50/0x70
> >> ? __get_valid_kprobe+0x1f0/0x1f0
> >> ? _raw_spin_unlock_irqrestore+0x3b/0x40
> >> ? finish_task_switch+0x4b9/0x6d0
> >> ? __switch_to_asm+0x34/0x70
> >> ? __switch_to_asm+0x40/0x70
> >>
> >> The code within the kretprobe handler checks for probe reentrancy,
> >> so we won't trigger any _raw_spin_lock_irqsave probe in there.
> >>
> >> The problem is outside of that, in kprobe_flush_task, where we call:
> >>
> >> kprobe_flush_task
> >> kretprobe_table_lock
> >> raw_spin_lock_irqsave
> >> _raw_spin_lock_irqsave
> >>
> >> where _raw_spin_lock_irqsave triggers the kretprobe and installs the
> >> kretprobe_trampoline handler on _raw_spin_lock_irqsave's return.
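
(For context: kretprobe_table_lock() is only a thin wrapper around the
per-hash raw spinlock -- roughly the following, sketched from memory with
annotations omitted, so details may differ:)

static void kretprobe_table_lock(unsigned long hash, unsigned long *flags)
{
	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);

	/*
	 * raw_spin_lock_irqsave() expands to a call to the probed
	 * _raw_spin_lock_irqsave(), so the return probe fires on this call
	 * and _raw_spin_lock_irqsave() returns into kretprobe_trampoline
	 * while the table lock is already held.
	 */
	raw_spin_lock_irqsave(hlist_lock, *flags);
}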
> >
> > Hmm, OK. In this case, I think we should mark that this process is
> > going to die and never try kretprobe on it.
> >
> >>
> >> The kretprobe_trampoline handler is then executed with
> >> kretprobe_table_locks already locked, and the first thing it does is
> >> lock kretprobe_table_locks again ;-) the whole lockup path looks like:
> >>
> >> kprobe_flush_task
> >> kretprobe_table_lock
> >> raw_spin_lock_irqsave
> >> _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> >>
> >> ---> kretprobe_table_locks locked
> >>
> >> kretprobe_trampoline
> >> trampoline_handler
> >> kretprobe_hash_lock(current, &head, &flags); <--- deadlock
> >>
> >> The change below sets current_kprobe in kprobe_flush_task, so the probe
> >> recursion protection check is hit and the probe is never set. It seems
> >> to fix the deadlock.
> >>
> >> I'm not sure this is the best fix, any ideas are welcome ;-)
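
(The patch body is not quoted above; a minimal sketch of the described
approach -- fill the per-CPU current_kprobe with a dummy probe around the
flush so the recursion check treats any hit as reentrant -- could look
roughly like the following. The dummy_kp name is illustrative and the body
is reconstructed from the current kprobe_flush_task(), so it may differ in
detail from the actual RFC:)

void kprobe_flush_task(struct task_struct *tk)
{
	struct kretprobe_instance *ri;
	struct hlist_head *head, empty_rp;
	struct hlist_node *tmp;
	unsigned long hash, flags = 0;
	/* Illustrative dummy probe; it only has to make kprobe_running() true. */
	struct kprobe dummy_kp = { .addr = (void *)kprobe_flush_task };

	if (unlikely(!kprobes_initialized))
		/* Early boot, kretprobe_table_locks not yet initialized. */
		return;

	/*
	 * Any kprobe hit below (e.g. the kretprobe on _raw_spin_lock_irqsave)
	 * now looks like a reentrant hit and is skipped, so
	 * kretprobe_trampoline never gets armed from this context.
	 */
	__this_cpu_write(current_kprobe, &dummy_kp);

	INIT_HLIST_HEAD(&empty_rp);
	hash = hash_ptr(tk, KPROBE_HASH_BITS);
	head = &kretprobe_inst_table[hash];
	kretprobe_table_lock(hash, &flags);
	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
		if (ri->task == tk)
			recycle_rp_inst(ri, &empty_rp);
	}
	kretprobe_table_unlock(hash, &flags);

	__this_cpu_write(current_kprobe, NULL);

	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
		hlist_del(&ri->hlist);
		kfree(ri);
	}
}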
> >
> > Hmm, this issue is a bit tricky to fix. Of course, temporarily disabling
> > kprobes (and kretprobes) in an area by filling current_kprobe might
> > be a good idea, but it also affects other kprobes.
>
> I'm not sure what you mean by that. Jiri's RFC patch would be disabling
> k[ret]probes within kprobe_flush_task(), which is only ever invoked from
> finish_task_switch(). I only see calls to spinlocks and kfree() from
> here. Besides, kprobe_flush_task() itself is NOKPROBE, so we would
> ideally not want to trace/probe other functions it calls.
Ah, good point. I forgot that it has been blacklisted. OK, then I accept
Jiri's RFC.
Thank you,
>
> >
> > How about letting kretprobe skip a task whose state == TASK_DEAD?
> >
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 627fc1b7011a..3f207d2e0afb 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1874,9 +1874,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> > * To avoid deadlocks, prohibit return probing in NMI contexts,
> > * just skip the probe and increase the (inexact) 'nmissed'
> > * statistical counter, so that the user is informed that
> > - * something happened:
> > + * something happened.
> > + * Also, if the current task is dead, we are already in the process
> > + * of reclaiming kretprobe instances from the hash list. To avoid a
> > + * memory leak, skip running the kretprobe on such a task.
> > */
> > - if (unlikely(in_nmi())) {
> > + if (unlikely(in_nmi()) || current->state == TASK_DEAD) {
>
> I'm wondering if this actually works. kprobe_flush_task() seems to be
> called from finish_task_switch(), after the task switch is complete. So,
> the current task won't actually be dead here.
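
(Roughly the relevant part of finish_task_switch() in kernel/sched/core.c,
heavily trimmed, so the exact code may differ:)

	/* In finish_task_switch(); prev_state was sampled before the switch. */
	if (unlikely(prev_state == TASK_DEAD)) {
		if (prev->sched_class->task_dead)
			prev->sched_class->task_dead(prev);

		/*
		 * Flush the return-probe instances of the *previous*, dead
		 * task. "current" on this CPU is already the next task, so a
		 * current->state == TASK_DEAD test in pre_handler_kretprobe()
		 * would not trigger in this path.
		 */
		kprobe_flush_task(prev);

		put_task_stack(prev);
		put_task_struct_rcu_user(prev);
	}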
>
>
> - Naveen
>
--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>