Re: rewind_stack_do_exit + KASAN incompatibility

From: Andy Lutomirski
Date: Thu Aug 23 2018 - 17:23:07 EST


On Thu, Aug 23, 2018 at 1:59 PM, Jann Horn <jannh@xxxxxxxxxx> wrote:
> Some while back (commit 2deb4be28077 ("x86/dumpstack: When OOPSing,
> rewind the stack before do_exit()")), Andy added
> rewind_stack_do_exit(), which is used in kernel oops handling to
> discard the current stack contents and reset the stack pointer,
> ensuring that the whole kernel stack is available for do_exit().
> However, this code isn't integrated with KASAN.
>
> Sometimes, when ASAN enters a function, it poisons parts of the
> newly-allocated stack frame; on function exit, it un-poisons that
> memory. ASAN does not, in general, unpoison stack memory on function
> entry; instead, it is assumed that unallocated stack memory is not
> poisoned.
>
> This means that after rewind_stack_do_exit() has rewound the stack,
> random parts of the stack are left poisoned, and when you try to
> access those, KASAN spews random false-positives. I'm currently
> working on adding some new kernel code, including an LKDTM testcase,
> and running that testcase generated the following spew - the first
> oops is intended, but the KASAN report after it is, from what I can
> tell, garbage.
>
> I'm not very familiar with KASAN internals, but I think a call to
> kasan_unpoison_task_stack(current) in the right place should solve the
> issue. I'm not entirely sure about where the call should be coming
> from - probably from inside rewind_stack_do_exit()? But I'm not sure
> whether it's possible to do this after rewinding the stack pointer
> (which would require kasan_unpoison_task_stack(), including all
> callees, to be uninstrumented), or whether it would have to happen
> before rewinding the stack pointer.

From a look at the KASAN code, the whole file containing those helpers
appears to be uninstrumented, so you should be able to do it either
before or after rewinding the stack pointer. The easiest way may be to
copy this into rewind_stack_do_exit() from wakeup_64.S:

#ifdef CONFIG_KASAN
	/*
	 * The suspend path may have poisoned some areas deeper in the stack,
	 * which we now need to unpoison.
	 */
	movq	%rsp, %rdi
	call	kasan_unpoison_task_stack_below
#endif

except you'd change the comment.
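
To make the failure mode concrete, here is a toy userspace model in C.
It is only a sketch, not kernel code: the shadow array, the toy_*
names, and the one-flag-per-byte layout are all made up for
illustration (real KASAN uses a compact shadow mapping), and
toy_unpoison_below() only mimics what I'd expect
kasan_unpoison_task_stack_below() to do, namely clear the poison
between the stack base and the given watermark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 64	/* toy stack; grows down from index STACK_SIZE */

/* Hypothetical shadow: one flag per stack byte, 1 = poisoned. */
static unsigned char shadow[STACK_SIZE];
static size_t sp = STACK_SIZE;	/* toy stack pointer */

/* Function entry: allocate a frame and poison a redzone inside it. */
static void toy_enter(size_t frame, size_t rz_off, size_t rz_len)
{
	assert(frame <= sp && rz_off + rz_len <= frame);
	sp -= frame;
	memset(&shadow[sp + rz_off], 1, rz_len);
}

/* Normal function exit: unpoison the frame, then release it. */
static void toy_leave(size_t frame)
{
	memset(&shadow[sp], 0, frame);
	sp += frame;
}

/* Oops path: reset the stack pointer; no function exits ever run. */
static void toy_rewind(void)
{
	sp = STACK_SIZE;
}

/* Stand-in for the unpoison call: clear everything below the watermark. */
static void toy_unpoison_below(size_t watermark)
{
	memset(shadow, 0, watermark);
}

/* Count poisoned bytes that a new frame of this size would land on. */
static size_t toy_count_poisoned(size_t frame)
{
	size_t n = 0;

	for (size_t i = sp - frame; i < sp; i++)
		n += shadow[i];
	return n;
}

/* Returns how many stale poisoned bytes the next frame would hit. */
int toy_demo(bool unpoison_after_rewind)
{
	memset(shadow, 0, sizeof(shadow));
	sp = STACK_SIZE;

	/* A normal call cleans up after itself. */
	toy_enter(16, 4, 4);
	toy_leave(16);

	/* A call that oopses: its frame is abandoned, poison and all. */
	toy_enter(32, 8, 4);
	toy_rewind();

	if (unpoison_after_rewind)
		toy_unpoison_below(sp);

	return (int)toy_count_poisoned(32);
}
```

In this model, toy_demo(false) returns 4: the four redzone bytes of the
abandoned frame are still poisoned when the next frame reuses that
memory, which is the kind of false positive Jann is seeing.
toy_demo(true) returns 0.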

--Andy