Re: [BUG] arm: kgdb: patch_text() in kgdb_arch_set_breakpoint() may sleep

From: Doug Anderson
Date: Mon Aug 24 2015 - 19:56:48 EST


Kees,

On Mon, Aug 24, 2015 at 10:46 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Sun, Aug 23, 2015 at 7:45 PM, Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>> On Wed, Aug 5, 2015 at 8:50 AM, Aapo Vienamo <avienamo@xxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> The breakpoint setting code in arch/arm/kernel/kgdb.c calls
>>> patch_text(), which ends up trying to sleep while in interrupt context.
>>> The bug was introduced by commit: 23a4e40 arm: kgdb: Handle read-only
>>> text / modules. The resulting behavior is "BUG: scheduling while
>>> atomic..." when setting a breakpoint in kgdb. This was tested on an
>>> Nvidia Jetson TK1 board with 4.2.0-rc5-next-20150805 kernel.
>>>
>>> Regards,
>>> Aapo Vienamo
>>
>> Aapo,
>>
>> Including the stack trace with this would have been helpful, though
>> it's not too hard to reproduce. Here it is:
>>
>> [ 416.510559] BUG: scheduling while atomic: swapper/0/0/0x00010007
>> [ 416.516554] Modules linked in:
>> [ 416.519614] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
>> 4.2.0-rc7-00133-geb63b34 #1073
>> [ 416.527341] Hardware name: Rockchip (Device Tree)
>> [ 416.532042] [<c0017a4c>] (unwind_backtrace) from [<c00133d4>]
>> (show_stack+0x20/0x24)
>> [ 416.539772] [<c00133d4>] (show_stack) from [<c05400e8>]
>> (dump_stack+0x84/0xb8)
>> [ 416.546983] [<c05400e8>] (dump_stack) from [<c004913c>]
>> (__schedule_bug+0x54/0x6c)
>> [ 416.554540] [<c004913c>] (__schedule_bug) from [<c054065c>]
>> (__schedule+0x80/0x668)
>> [ 416.562183] [<c054065c>] (__schedule) from [<c0540cfc>] (schedule+0xb8/0xd4)
>> [ 416.569219] [<c0540cfc>] (schedule) from [<c0543a3c>]
>> (schedule_timeout+0x2c/0x234)
>> [ 416.576861] [<c0543a3c>] (schedule_timeout) from [<c05417c0>]
>> (wait_for_common+0xf4/0x188)
>> [ 416.585109] [<c05417c0>] (wait_for_common) from [<c0541874>]
>> (wait_for_completion+0x20/0x24)
>> [ 416.593531] [<c0541874>] (wait_for_completion) from [<c00a0104>]
>> (__stop_cpus+0x58/0x70)
>> [ 416.601608] [<c00a0104>] (__stop_cpus) from [<c00a0580>]
>> (stop_cpus+0x3c/0x54)
>> [ 416.608817] [<c00a0580>] (stop_cpus) from [<c00a06c4>]
>> (__stop_machine+0xcc/0xe8)
>> [ 416.616286] [<c00a06c4>] (__stop_machine) from [<c00a0714>]
>> (stop_machine+0x34/0x44)
>> [ 416.624016] [<c00a0714>] (stop_machine) from [<c00173e8>]
>> (patch_text+0x28/0x34)
>> [ 416.631399] [<c00173e8>] (patch_text) from [<c001733c>]
>> (kgdb_arch_set_breakpoint+0x40/0x4c)
>> [ 416.639823] [<c001733c>] (kgdb_arch_set_breakpoint) from
>> [<c00a0d68>] (kgdb_validate_break_address+0x2c/0x60)
>> [ 416.649719] [<c00a0d68>] (kgdb_validate_break_address) from
>> [<c00a0e90>] (dbg_set_sw_break+0x1c/0xdc)
>> [ 416.658922] [<c00a0e90>] (dbg_set_sw_break) from [<c00a2e88>]
>> (gdb_serial_stub+0x9c4/0xba4)
>> [ 416.667259] [<c00a2e88>] (gdb_serial_stub) from [<c00a11cc>]
>> (kgdb_cpu_enter+0x1f8/0x60c)
>> [ 416.675423] [<c00a11cc>] (kgdb_cpu_enter) from [<c00a18cc>]
>> (kgdb_handle_exception+0x19c/0x1d0)
>> [ 416.684106] [<c00a18cc>] (kgdb_handle_exception) from [<c0016f7c>]
>> (kgdb_compiled_brk_fn+0x30/0x3c)
>> [ 416.693135] [<c0016f7c>] (kgdb_compiled_brk_fn) from [<c00091a4>]
>> (do_undefinstr+0x1a4/0x20c)
>> [ 416.701643] [<c00091a4>] (do_undefinstr) from [<c001400c>]
>> (__und_svc_finish+0x0/0x34)
>> [ 416.709543] Exception stack(0xc07c1ce8 to 0xc07c1d30)
>> [ 416.714584] 1ce0: 00000000 c07c6504 c086e290
>> c086e294 c086e294 c086e290
>> [ 416.722745] 1d00: c07c6504 00000067 00000001 c07c2100 00000027
>> c07c1d4c c07c1d50 c07c1d30
>> [ 416.730905] 1d20: c00a0990 c00a08d0 60000193 ffffffff
>> [ 416.735947] [<c001400c>] (__und_svc_finish) from [<c00a08d0>]
>> (kgdb_breakpoint+0x58/0x94)
>> [ 416.744110] [<c00a08d0>] (kgdb_breakpoint) from [<c00a0990>]
>> (sysrq_handle_dbg+0x58/0x6c)
>> [ 416.752273] [<c00a0990>] (sysrq_handle_dbg) from [<c02c230c>]
>> (__handle_sysrq+0xac/0x15c)
>> [ 416.760437] [<c02c230c>] (__handle_sysrq) from [<c02c23ec>]
>> (handle_sysrq+0x30/0x34)
>>
>>
>> Kees: I think you've dealt with a lot more of these types of issues
>> than I have. Any quick thoughts? If not I can put it on my long-term
>> list of things to do, but until then we could always just post a
>> Revert...
>
> I don't think a revert is in order here. CONFIG_DEBUG_RODATA could be
> turned off for builds where you need kgdb while this bug gets found.

I don't think that's right. In my testing I don't have
CONFIG_DEBUG_RODATA. I think I did the right grep:

$ grep ARM_KERNMEM_PERMS .config
# CONFIG_ARM_KERNMEM_PERMS is not set

Basically no matter what we'll call:
- kgdb_arch_set_breakpoint
-- patch_text
--- stop_machine

...no dependencies on RODATA.

> I
> don't actually see where we've gone wrong, though. Looks like
> scheduling happened while waiting for CPUs to stop? Where did we enter
> atomic?

We're in kdb. That's a stop mode debugger. No sleeping allowed while
in the debugger.


> Perhaps we need to test if we're already atomic in patch_text, and
> only call stop_machine if we need to?
>
> Untested (and likely mangled by gmail):
>
> diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
> index 69bda1a5707e..855696bfe072 100644
> --- a/arch/arm/kernel/patch.c
> +++ b/arch/arm/kernel/patch.c
> @@ -124,5 +124,8 @@ void __kprobes patch_text(void *addr, unsigned int insn)
> .insn = insn,
> };
>
> - stop_machine(patch_text_stop_machine, &patch, NULL);
> + if (unlikely(in_atomic_preempt_off()))
> + patch_text_stop_machine(&patch);
> + else
> + stop_machine(patch_text_stop_machine, &patch, NULL);

Ah, right. We're already stopped, so just not stopping again seems
reasonable. I think I'd rather just use in_dbg_master() as the test
since that's a case where I _know_ all the CPUs are stopped. Doesn't
in_atomic_preempt_off() just check if preemption is off for this
single CPU?

Anyway, I'll throw a patch up now. It fixes it for me. :)

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/