Re: [PATCH] x86/nmi/64: avoid passing user space rsp of pt_regs to nmi handler
From: Josh Poimboeuf
Date: Wed Aug 23 2017 - 11:21:06 EST
On Wed, Aug 23, 2017 at 10:19:02PM +0800, oliver yang wrote:
> 2017-08-23 1:51 GMT+08:00 Josh Poimboeuf <jpoimboe@xxxxxxxxxx>:
> > On Tue, Aug 22, 2017 at 05:09:19PM +0000, yang oliver wrote:
> >> From: Yong Yang <yangoliver@xxxxxxxxx>
> >>
> >> While NMI interrupts the very beginning of SYSCALL, the rsp could
> >> be still user space address without loading kernel address. Then
> >> the pt_regs constructed by the NMI entry point could have a user
> >> space rsp. If a NMI handler tried to dump stack by using this rsp,
> >> it can cause the kernel panic.
> >
> > To me this sounds like an unwinder bug. The unwinder is supposed to
> > have checks to prevent it from accessing user space.
>
> What kind of checking should we do here?
>
> While the NMI interrupts the first instruction of entry_SYSCALL_64,
> the rip is kernel address, but rsp is still a user space address.
>
> Usually, kernel uses the "user_mode" check to determine user space
> pt_regs, but this check doesn't work for this situation.
The unwinder tries to make sure that all its memory accesses are within
the bounds of the kernel stacks (task, irq, and exception stacks). See
update_stack_state() in a recent kernel:
/*
* If the next bp isn't on the current stack, switch to the next one.
*
* We may have to traverse multiple stacks to deal with the possibility
* that info->next_sp could point to an empty stack and the next bp
* could be on a subsequent stack.
*/
while (!on_stack(info, frame, len))
if (get_stack_info(info->next_sp, state->task, info,
&state->stack_mask))
return false;
> > I know you had
> > previously reported this for an older (pre-4.9) kernel. Have you
> > verified the bug still exists on a recent kernel?
>
> Yes, I ran into kernel panic on 3.10 kernel, and the root case is
> show_stack_log_lvl in nmi handler dump the user space stack,
> but that memory got swapped out or freed.
>
>
> On latest kernel, show_stack_log_lvl never dump stack contents.
> But I'm not sure other NMI handler, or kernel trace code could dump
> a user space stack dump while the kernel mode pt_regs which has
> a user space sp.
>
> I wrote a module which register a NMI handler, it did the check like
> below,
>
> if (!user_mode(regs)) { /* kernel mode, but could contain user address */
> rsp = regs->sp;
>
> if (rsp < __PAGE_OFFSET) {
> pr_err("User Stack Pointer: %016lx", rsp);
> pr_err("User Stack Dump: %016lx", *((unsigned long *)rsp));
> }
> }
>
> I manage to make NMI interrupted the entry_SYSCALL_64 boundary,
> then I could trigger the issue.
>
> Not sure whether it is good enough.
>
> Let me know if you have other suggestions.
The frame pointer unwinder was completely rewritten in 4.9, and
show_stack_log_lvl() was removed. AFAIK, it shouldn't panic when
unwinding from an NMI in early syscall code. If you can recreate the
problem on a recent kernel, please let me know.
--
Josh