Re: [PATCH 0/7] wchan: Fix wchan support

From: Josh Poimboeuf
Date: Thu Oct 14 2021 - 15:52:24 EST


On Thu, Oct 14, 2021 at 02:38:19PM +0100, Russell King (Oracle) wrote:
> What is going on here is that the ARM stacktrace code refuses to trace
> non-current tasks in a SMP environment due to the racy nature of doing
> so if the non-current tasks are running.
>
> When walking the stack with frame pointers, we:
>
> - validate that the frame pointer is between the stack pointer and the
> top of stack defined by that stack pointer.
> - we then load the next stack pointer and next frame pointer from the
> stack.
>
> The reason this is unsafe when the task is not blocked is the stack can
> change at any moment, which can cause the value read as a stack pointer
> to be wildly different. If the read frame pointer value is roughly in
> agreement, we can end up reading any part of memory, which would be an
> information leak.

It would be a good idea to add some guardrails to prevent that
regardless. If there's stack corruption for any reason, the unwinder
shouldn't make things worse.

On x86 the unwinder relies on the caller to ensure the task is blocked
(or current). If the caller doesn't do that, they might get garbage,
and they get to keep the pieces.

But an important part of that is that the unwinder has guardrails which
ensure it handles stack corruption gracefully: it never accesses memory
outside the bounds of the stack.
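A minimal userspace sketch of that kind of guardrail, for illustration only. It assumes an x86-style frame record of two words (saved caller frame pointer, then return address) and a stack that grows down. `struct stack_bounds`, `on_stack()` and `unwind()` are hypothetical names, not kernel APIs. Every frame is checked against the stack bounds *before* it is dereferenced, and the chain must move strictly toward older frames, so a corrupted frame pointer ends the trace instead of leaking arbitrary memory:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack descriptor: valid addresses are [begin, end). */
struct stack_bounds {
	uintptr_t begin;
	uintptr_t end;
};

/* True if 'len' bytes starting at 'addr' lie entirely on the stack. */
static bool on_stack(const struct stack_bounds *s, uintptr_t addr, size_t len)
{
	return addr >= s->begin && addr <= s->end && len <= s->end - addr;
}

/*
 * Walk a frame-pointer chain starting at 'fp'. Each frame is assumed to be
 * two words: frame[0] = caller's frame pointer, frame[1] = return address.
 * Stores up to 'max' return addresses in 'out' and returns how many.
 */
static size_t unwind(const struct stack_bounds *s, uintptr_t fp,
		     uintptr_t *out, size_t max)
{
	size_t n = 0;

	assert(max == 0 || out != NULL);

	while (n < max) {
		/* Guardrail: never read a frame that isn't fully on the stack. */
		if (!on_stack(s, fp, 2 * sizeof(uintptr_t)))
			break;

		const uintptr_t *frame = (const uintptr_t *)fp;
		uintptr_t next_fp = frame[0];

		out[n++] = frame[1];

		/*
		 * The stack grows down, so older frames sit at higher addresses.
		 * Requiring strict progress stops loops in a corrupted chain.
		 */
		if (next_fp <= fp)
			break;
		fp = next_fp;
	}
	return n;
}
```

If a saved frame pointer is overwritten with something past the top of the stack, the walk simply stops at the last good frame; the bogus value is compared but never dereferenced.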

When multiple stacks are involved in a kernel execution path (task, irq,
exception, etc.), the stacks link to each other (e.g., the last word on
the irq stack might point to the task stack). Also, the irq/exception
stack addresses are stored in percpu variables, and the task stack's
location is recorded in the task struct. So the unwinder can easily make
sure it stays in bounds. See get_stack_info() in
arch/x86/kernel/dumpstack_64.c.
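A simplified sketch of the multi-stack idea, again only illustrative. The types and names here (`struct known_stacks`, `get_stack_info_sketch()`, `count_stacks()`) are hypothetical and only loosely inspired by x86's struct stack_info. It assumes just two stacks, with the last word of the irq stack holding the interrupted task's stack pointer. The trusted bounds come from kernel data (percpu variables, the task struct), never from stack contents, and a visit mask, similar in spirit to the one x86's get_stack_info() takes, rejects a chain that revisits a stack:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stack types. */
enum stack_type { STACK_TYPE_UNKNOWN, STACK_TYPE_TASK, STACK_TYPE_IRQ };

struct stack_info {
	enum stack_type type;
	uintptr_t begin, end;	/* [begin, end) */
	uintptr_t next_sp;	/* where the older stack continues, or 0 */
};

/* Bounds taken from trusted kernel data, not from the stacks themselves. */
struct known_stacks {
	uintptr_t task_begin, task_end;	/* e.g. recorded in the task struct */
	uintptr_t irq_begin, irq_end;	/* e.g. from a percpu variable */
};

static bool in_range(uintptr_t addr, uintptr_t begin, uintptr_t end)
{
	return addr >= begin && addr < end;
}

/* Identify which known stack 'sp' is on; fail for anything else. */
static bool get_stack_info_sketch(uintptr_t sp, const struct known_stacks *ks,
				  struct stack_info *info)
{
	if (in_range(sp, ks->irq_begin, ks->irq_end)) {
		/* Reading the last word requires at least one word of stack. */
		assert(ks->irq_end - ks->irq_begin >= sizeof(uintptr_t));
		info->type = STACK_TYPE_IRQ;
		info->begin = ks->irq_begin;
		info->end = ks->irq_end;
		/* Assumed layout: last word links back to the task stack. */
		info->next_sp = *((const uintptr_t *)ks->irq_end - 1);
		return true;
	}
	if (in_range(sp, ks->task_begin, ks->task_end)) {
		info->type = STACK_TYPE_TASK;
		info->begin = ks->task_begin;
		info->end = ks->task_end;
		info->next_sp = 0;	/* the task stack is the oldest */
		return true;
	}
	info->type = STACK_TYPE_UNKNOWN;
	return false;
}

/*
 * Follow the links between stacks starting at 'sp'. Returns the number of
 * stacks visited, or -1 if a stack is seen twice (a corrupted link).
 * A link to an unknown address just ends the walk; it is never dereferenced.
 */
static int count_stacks(uintptr_t sp, const struct known_stacks *ks)
{
	unsigned int visit_mask = 0;
	struct stack_info info;
	int n = 0;

	while (sp && get_stack_info_sketch(sp, ks, &info)) {
		unsigned int bit = 1u << info.type;

		if (visit_mask & bit)
			return -1;
		visit_mask |= bit;
		n++;
		sp = info.next_sp;
	}
	return n;
}
```

A real unwinder would walk frames within each stack (as in the frame-pointer sketch) before hopping to next_sp; this only shows the bounds and link checking between stacks.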

--
Josh