Re: [PATCH v6] Fix INT1 Recursion with unregistered breakpoints

From: Jeff Merkey
Date: Mon Jan 11 2016 - 12:29:58 EST

Next message: Peter Zijlstra: "Re: [RFC] perf record: missing buildid for callstack modules"
Previous message: Ewan D. Milne: "[PATCH 2/2] scsi_proc: Change /proc/scsi/scsi to use bus device iterator"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 12/19/15, Jeff Merkey <linux.mdb@xxxxxxxxx> wrote:
> Please consider the attached patch.
>
> SUMMARY
>
> This patch corrects a hard lockup failure of the system kernel if the
> operating system receives a breakpoint exception at a code execution
> address which was not registered with the operating system. The patch
> allows kernel debuggers, application profiling and performance modules,
> and external debugging tools to work better together at sharing the
> breakpoint registers on the platform in a way that they do not cause
> errors and system faults, and enables the full feature set in the
> breakpoint API. If a kernel application triggers a breakpoint
> or programs one in error, this patch will catch the condition and report
> it to the system log without the operating system experiencing a system
> fault. There are several consumers of the Linux Breakpoint API and all
> of them can and sometimes do cause the condition this patch corrects.
>
> CONDITIONS WHICH RESULT IN THIS SYSTEM FAULT
>
> This system fault can be caused from several sources. Any kernel code
> can access the debug registers and trigger a breakpoint directly by
> writing to these registers and trigger a hard system hang if no
> breakpoint was registered via arch_install_hw_breakpoint().
>
> kgdb/kdb and the perf event system both present garbage status in dr6
> then subsequently write this status into the thread.debugreg6 variable,
> then in some cases call hw_breakpoint_restore() which writes this
> status back into the dr6 hardware register.
>
> arch/x86/kernel/kgdb.c
> static void kgdb_hw_overflow_handler(struct perf_event *event,
> struct perf_sample_data *data, struct pt_regs *regs)
> {
> struct task_struct *tsk = current;
> int i;
>
> for (i = 0; i < 4; i++)
> if (breakinfo[i].enabled)
> tsk->thread.debugreg6 |= (DR_TRAP0 << i);
> }
>
> arch/x86/kernel/kgdb.c
> static void kgdb_correct_hw_break(void)
> {
> ... snip ...
>
> if (!dbg_is_early)
> hw_breakpoint_restore();
>
> ... snip ...
> }
>
> arch/x86/kernel/hw_breakpoint.c
> void hw_breakpoint_restore(void)
> {
> set_debugreg(__this_cpu_read(cpu_debugreg[0]), 0);
> set_debugreg(__this_cpu_read(cpu_debugreg[1]), 1);
> set_debugreg(__this_cpu_read(cpu_debugreg[2]), 2);
> set_debugreg(__this_cpu_read(cpu_debugreg[3]), 3);
> set_debugreg(current->thread.debugreg6, 6);
> set_debugreg(__this_cpu_read(cpu_dr7), 7);
> }
>
> The hardware only altars those bits that change, the rest of the altered
> dr6 value remains in the register.
>
> Upon the next int1 exception, dr6 presents this manufactured status to
> the int1 handler in hw_breakpoint.c which calls the non-existent
> breakpoint exceptions and any handlers which may have validly
> registered, creating phantom events. If other subsystems which call
> the perf handlers also have breakpoints registered, this
> manufactured status causes erroneous events to be signaled to the layers
> above.
>
> arch/x86/kernel/hw_breakpoint.c
> static int hw_breakpoint_handler(struct die_args *args)
> {
> ... snip ...
>
> /* Handle all the breakpoints that were triggered */
> for (i = 0; i < HBP_NUM; ++i) {
> if (likely(!(dr6 & (DR_TRAP0 << i))))
> continue;
>
> ... snip ...
>
> perf_bp_event(bp, args->regs);
>
> ... snip ...
> }
>
> After a few iterations of this cycling through the system, the
> thread.debugreg6 variable starts to resemble a random number generator
> as far as to which breakpoint just occurred.
>
> The perf handlers cause a different incarnation of this problem and
> create the situation by triggering a stale breakpoint set in dr7 for
> which the perf bp is NULL (not registered) or late and for which there
> is a single code path that fails to set the resume flag and clear the
> int1 exception status.
>
> TESTING AND REVIEW PERFORMED
>
> I have reviewed all the code that touches this patch and have
> determined it will function and support all of the software that
> depends on this handler properly. I have compiled and tested this
> patch with a test harness that tests the robustness of the linux
> breakpoint API and handlers in the following ways:
>
> 1. Setting multiple conditional breakpoints through
> arch_install_hw_breakpoint API across four processors to test the rate
> at which the interface can handle breakpoint exceptions
>
> 2. Setting unregistered breakpoints to test the handlers robustness
> in dealing with error handling conditions and errant or spurious
> hardware conditions and to simulate actual "lazy debug register
> switching" with null bp handlers to test the
> robustness of the handlers.
>
> 3. Clearing and setting breakpoints across multiple processors then
> triggering concurrent exceptions in both interrupt and process
> contexts.
>
> This patch improves robustness in several ways in the linux kernel:
>
> 1. Corrects bug in handling unregistered breakpoints.
>
> 2. Provides hardware check of dr7 to determine source of breakpoint
> if OS cannot ascertain the int1 source from its own state and
> variables.
>
> 3. Enable "lazy debug register switching" to function
> correctly.
>
> Signed-off-by: Jeff Merkey <linux.mdb@xxxxxxxxx>
> ---
> arch/x86/include/uapi/asm/debugreg.h | 1 +
> arch/x86/kernel/hw_breakpoint.c | 25 +++++++++++++++++++++++--
> 2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/debugreg.h
> b/arch/x86/include/uapi/asm/debugreg.h
> index 3c0874d..78fc83c 100644
> --- a/arch/x86/include/uapi/asm/debugreg.h
> +++ b/arch/x86/include/uapi/asm/debugreg.h
> @@ -38,6 +38,7 @@
> #define DR_RW_EXECUTE (0x0) /* Settings for the access types to trap on
> */
> #define DR_RW_WRITE (0x1)
> #define DR_RW_READ (0x3)
> +#define DR_RW_MASK (0x3) /* mask for breakpoint type field */
>
> #define DR_LEN_1 (0x0) /* Settings for data length to trap on */
> #define DR_LEN_2 (0x4)
> diff --git a/arch/x86/kernel/hw_breakpoint.c
> b/arch/x86/kernel/hw_breakpoint.c
> index 50a3fad..d199834 100644
> --- a/arch/x86/kernel/hw_breakpoint.c
> +++ b/arch/x86/kernel/hw_breakpoint.c
> @@ -444,7 +444,7 @@ EXPORT_SYMBOL_GPL(hw_breakpoint_restore);
> static int hw_breakpoint_handler(struct die_args *args)
> {
> int i, cpu, rc = NOTIFY_STOP;
> - struct perf_event *bp;
> + struct perf_event *bp = NULL;
> unsigned long dr7, dr6;
> unsigned long *dr6_p;
>
> @@ -477,6 +477,14 @@ static int hw_breakpoint_handler(struct die_args
> *args)
> continue;
>
> /*
> + * Check if we got an execute breakpoint, if so
> + * set the resume flag to avoid int1 recursion.
> + */
> + if (((dr7 >> ((i * DR_CONTROL_SIZE) + DR_CONTROL_SHIFT))
> + & DR_RW_MASK) == DR_RW_EXECUTE)
> + args->regs->flags |= X86_EFLAGS_RF;
> +
> + /*
> * The counter may be concurrently released but that can only
> * occur from a call_rcu() path. We can then safely fetch
> * the breakpoint, use its callback, touch its counter
> @@ -503,7 +511,8 @@ static int hw_breakpoint_handler(struct die_args *args)
>
> /*
> * Set up resume flag to avoid breakpoint recursion when
> - * returning back to origin.
> + * returning back to origin. perf_bp_event may
> + * change the flags so check twice.
> */
> if (bp->hw.info.type == X86_BREAKPOINT_EXECUTE)
> args->regs->flags |= X86_EFLAGS_RF;
> @@ -519,6 +528,18 @@ static int hw_breakpoint_handler(struct die_args
> *args)
> (dr6 & (~DR_TRAP_BITS)))
> rc = NOTIFY_DONE;
>
> + /*
> + * if we are about to signal to
> + * do_debug() to stop further processing
> + * and we have not ascertained the source
> + * of the breakpoint, log it as spurious.
> + */
> + if (rc == NOTIFY_STOP && !bp) {
> + printk_ratelimited(KERN_INFO
> + "INFO: spurious INT1 exception dr6: 0x%lX dr7: 0x%lX\n",
> + dr6, dr7);
> + }
> +
> set_debugreg(dr7, 7);
> put_cpu();
>
> --
> 1.8.3.1
>
>

I have not seen this patch turn up in linux-next or the git pull
fixes. So are you going to put it in or not? If not show me the
courtesy of letting me know if I have to maintain this fix locally to
get around this bug.

Thanks

Jeff

Next message: Peter Zijlstra: "Re: [RFC] perf record: missing buildid for callstack modules"
Previous message: Ewan D. Milne: "[PATCH 2/2] scsi_proc: Change /proc/scsi/scsi to use bus device iterator"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]