Re: [PATCH -tip v3 04/10] x86/kprobes: Prohibit probing on IRQ handlers directly

From: Masami Hiramatsu
Date: Tue Mar 26 2019 - 10:51:04 EST


On Mon, 25 Mar 2019 17:23:34 -0400
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Wed, 13 Feb 2019 01:12:44 +0900
> Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> > Prohibit probing on IRQ handlers in irqentry_text because
> > if it interrupts user mode, at that point we haven't changed
> > to kernel space yet and which eventually leads a double fault.
> > E.g.
> >
> > # echo p apic_timer_interrupt > kprobe_events
>
> Hmm, this breaks one of my tests (which I probe on do_IRQ).

OK, it seems this patch is a bit redundant, because
I found that these interrupt handler issue has been fixed
by Andrea's commit before merge this patch.

commit a50480cb6d61d5c5fc13308479407b628b6bc1c5
Author: Andrea Righi <righi.andrea@xxxxxxxxx>
Date: Thu Dec 6 10:56:48 2018 +0100

kprobes/x86: Blacklist non-attachable interrupt functions

These interrupt functions are already non-attachable by kprobes.
Blacklist them explicitly so that they can show up in
/sys/kernel/debug/kprobes/blacklist and tools like BCC can use this
additional information.

This description is a bit odd (maybe his patch is after mine?) I think
while updating this series, the patches were merged out of order.
Anyway, with above patch, the core problematic probe points are blacklisted.

>
> It's been working for years.
>
>
> > # echo 1 > events/kprobes/enable
> > PANIC: double fault, error_code: 0x0
> > CPU: 1 PID: 814 Comm: less Not tainted 4.20.0-rc3+ #30
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
> > RIP: 0010:error_entry+0x12/0xf0
> > [snip]
> > Call Trace:
> > <ENTRY_TRAMPOLINE>
> > ? native_iret+0x7/0x7
> > ? async_page_fault+0x8/0x30
> > ? trace_hardirqs_on_thunk+0x1c/0x1c
> > ? error_entry+0x7c/0xf0
> > ? async_page_fault+0x8/0x30
> > ? native_iret+0x7/0x7
> > ? int3+0xa/0x20
> > ? trace_hardirqs_on_thunk+0x1c/0x1c
> > ? error_entry+0x7c/0xf0
> > ? int3+0xa/0x20
> > ? apic_timer_interrupt+0x1/0x20
> > </ENTRY_TRAMPOLINE>
> > Kernel panic - not syncing: Machine halted.
> > Kernel Offset: disabled
>
> I'm not able to reproduce this (by removing this commit).

I ensured that if I revert both of this patch and Andrea's patch,
I can reproduce this with probing on apic_timer_interrupt().

> I'm thinking something else may have changed, as I've been tracing
> interrupt entries for years, and interrupting userspace while doing
> this.
>
> I've even added probes where ftrace isn't (where it uses an int3) and
> still haven't hit a problem.
>
> I think this patch is swatting a symptom of a bug and not addressing
> the bug itself. Can you send me the config that triggers this?

Yes, it seems you're right. Andrea's commit specifically fixed the
issue and mine is redundant. (I'm not sure why do_IRQ is in
__irqentry_text...)

So, Ingo, please revert this, since this bug already has been fixed by
commit a50480cb6d61 ("kprobes: x86_64: blacklist non-attachable interrupt
functions")

BTW, for further error investigation, I attached my kconfig which is
usually I'm testing (some options can be changed) on Qemu.
I'm using my mini-container shellscript ( https://github.com/mhiramat/mincs
) which supports qemu-container.


Thank you,

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>

Attachment: .config
Description: Binary data