Re: x86/kprobes: kretprobe fails to trigger if kprobe at function entry is not optimized (triggered by int3 breakpoint)
From: Masami Hiramatsu
Date: Mon Aug 24 2020 - 14:15:10 EST
Hi Peter,
On Mon, 24 Aug 2020 16:14:29 +0200
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, Aug 24, 2020 at 12:02:58PM +0000, Eddy_Wu@xxxxxxxxxxxxxx wrote:
> > After bisecting, I found this behavior seems to be introduced by this
> > commit: (5.8-rc1) 0d00449c7a28a1514595630735df383dec606812 x86:
> > Replace ist_enter() with nmi_enter(). This makes kprobe_int3_handler()
> > effectively run in NMI context, which pre_handler_kretprobe()
> > explicitly checks for to prevent recursion.
> >
> > (the in_nmi() check dates from v3.17:
> > f96f56780ca584930bb3a2769d73fd9a101bcbbe kprobes: Skip kretprobe hit
> > in NMI context to avoid deadlock)
> >
> > To make kretprobe work again with the int3 breakpoint, I think we can
> > replace the in_nmi() check with in_nmi() == (1 << NMI_SHIFT) at
> > kprobe_int3_handler() and skip the kretprobe only on a nested NMI. I
> > did a quick test on 5.9-rc2 and it seems to be working. I'm not sure
> > this is the best approach, since it may also require changes to other
> > architectures as well; any thoughts?
>
> Masami, would it be possible to have a kretprobe specific recursion
> count here?
Hmm, good point. As I commented in f96f56780ca5 ("kprobes: Skip kretprobe
hit in NMI context to avoid deadlock"), this check is for avoiding a
deadlock on kretprobe_table_lock, which is taken in both the pre_handler
*and* the kretprobe trampoline handler.
>
> I did the below, but I'm not at all sure it isn't horribly broken. I
> can't really find many rp->lock sites and this might break things by
> limiting contention.
This is not enough. For checking the recursion of kretprobes, we might
need a kretprobe_table_trylock() or kretprobe_table_busy() (though both
can give false positives).
Note that rp->lock shouldn't matter unless we support recursive
kprobes themselves. (Even then, we could use raw_spin_trylock_irqsave().)
Thank you,
>
> ---
>
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9be1bff4f586..0bff314cc800 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -153,6 +153,7 @@ struct kretprobe {
> size_t data_size;
> struct hlist_head free_instances;
> raw_spinlock_t lock;
> + atomic_t recursion;
> };
>
> struct kretprobe_instance {
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 287b263c9cb9..27fd096bcb9a 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1934,22 +1934,17 @@ unsigned long __weak arch_deref_entry_point(void *entry)
> static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> {
> struct kretprobe *rp = container_of(p, struct kretprobe, kp);
> - unsigned long hash, flags = 0;
> struct kretprobe_instance *ri;
> -
> - /*
> - * To avoid deadlocks, prohibit return probing in NMI contexts,
> - * just skip the probe and increase the (inexact) 'nmissed'
> - * statistical counter, so that the user is informed that
> - * something happened:
> - */
> - if (unlikely(in_nmi())) {
> - rp->nmissed++;
> - return 0;
> - }
> + unsigned long hash, flags;
> + int rec;
>
> /* TODO: consider to only swap the RA after the last pre_handler fired */
> hash = hash_ptr(current, KPROBE_HASH_BITS);
> + rec = atomic_fetch_inc_acquire(&rp->recursion);
> + if (rec) {
> + rp->nmissed++;
> + goto out;
> + }
> raw_spin_lock_irqsave(&rp->lock, flags);
> if (!hlist_empty(&rp->free_instances)) {
> ri = hlist_entry(rp->free_instances.first,
> @@ -1964,7 +1959,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> raw_spin_lock_irqsave(&rp->lock, flags);
> hlist_add_head(&ri->hlist, &rp->free_instances);
> raw_spin_unlock_irqrestore(&rp->lock, flags);
> - return 0;
> + goto out;
> }
>
> arch_prepare_kretprobe(ri, regs);
> @@ -1978,6 +1973,8 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> rp->nmissed++;
> raw_spin_unlock_irqrestore(&rp->lock, flags);
> }
> +out:
> + atomic_dec(&rp->recursion);
> return 0;
> }
> NOKPROBE_SYMBOL(pre_handler_kretprobe);
>
--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>