Re: [PATCH] x86: Pin task-stack in __get_wchan()

From: Qi Zheng
Date: Fri Nov 19 2021 - 05:03:07 EST

On 11/19/21 5:29 PM, Peter Zijlstra wrote:
On Thu, Nov 18, 2021 at 06:04:27PM -0800, Josh Poimboeuf wrote:
On Thu, Nov 18, 2021 at 01:11:09PM +0100, Peter Zijlstra wrote:

I now have the below, the only thing missing is that there's a
user_mode() call on a stack based regs. Now on x86_64 we can
__get_kernel_nofault() regs->cs and call it a day, but on i386 we have
to also fetch regs->flags.

Is this really the way to go?
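
The x86_64 versus i386 difference mentioned above can be sketched in user space roughly as follows. This is only an illustration, not the kernel's user_mode(): struct fake_regs, the helper names and the macro names are hypothetical. The constants assume the usual x86 layout, where the requested privilege level is the low two bits of a segment selector and EFLAGS.VM is bit 17.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two saved-register fields discussed above. */
struct fake_regs {
	uint64_t cs;
	uint64_t flags;
};

#define RPL_MASK  UINT64_C(0x3)          /* low two bits of a selector */
#define USER_RPL  UINT64_C(0x3)          /* ring 3 */
#define VM_MASK   (UINT64_C(1) << 17)    /* EFLAGS.VM, vm86 mode (32-bit only) */

/* 64-bit flavour: the privilege bits of cs alone decide. */
static bool user_mode_64(const struct fake_regs *regs)
{
	return (regs->cs & RPL_MASK) != 0;
}

/* 32-bit flavour: vm86 mode also counts as user mode, so flags is needed too. */
static bool user_mode_32(const struct fake_regs *regs)
{
	return ((regs->cs & RPL_MASK) | (regs->flags & VM_MASK)) >= USER_RPL;
}
```

The point of the sketch is that the 64-bit check needs a single field, while the 32-bit check needs two. When the regs sit on a stack that may disappear, each field read would need its own fault-tolerant fetch.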

Please no. Can we just add a check in unwind_start() to ensure the
caller did try_get_task_stack()?

I tried; but at best it's fundamentally racy and in practice it's worse
because init_task doesn't seem to believe in refcounts and kthreads are
odd for some raisin. Now those are fixable, but given the fundamental
races, I don't see how it's ever going to be reliable.

I don't mind the __get_kernel_nofault() usage and think I can do a
better implementation that will allow us to get rid of the
pagefault_{dis,en}able() sprinkling, but that's for another day. It's
just the user_mode(regs) usage that's going to be somewhat ugleh.

Anyway, below is the minimal fix for the situation at hand. I'm not
going to be around much today, so if Linus wants to pick that up instead
of mass revert things that's obviously fine too.

Subject: x86: Pin task-stack in __get_wchan()

When commit 5d1ceb3969b6 ("x86: Fix __get_wchan() for !STACKTRACE")
moved from stacktrace to native unwind_*() usage, the
try_get_task_stack() got lost, leading to use-after-free issues for
dying tasks.

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
 arch/x86/kernel/process.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e9ee8b526319..04143a653a8a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -964,6 +964,9 @@ unsigned long __get_wchan(struct task_struct *p)
 	struct unwind_state state;
 	unsigned long addr = 0;
+	if (!try_get_task_stack(p))
+		return 0;
 	for (unwind_start(&state, p, NULL, NULL); !unwind_done(&state);
 	     unwind_next_frame(&state)) {
 		addr = unwind_get_return_address(&state);
@@ -974,6 +977,8 @@ unsigned long __get_wchan(struct task_struct *p)
+	put_task_stack(p);
 	return addr;
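
The pin/unpin pattern in the patch above can be illustrated with a small user-space sketch. Everything named fake_* is hypothetical and uses C11 atomics. It assumes try_get_task_stack() behaves like an increment-if-not-zero on a stack reference count, and it replaces the real unwind with a single stored value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a task and its stack refcount. */
struct fake_task {
	atomic_int stack_refcount;	/* 0 means the stack is already gone */
	unsigned long wchan;		/* pretend result of the unwind */
};

/* Take a reference only if the count is still non-zero. */
static bool fake_try_get_stack(struct fake_task *t)
{
	int old = atomic_load(&t->stack_refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&t->stack_refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop the reference taken by fake_try_get_stack(). */
static void fake_put_stack(struct fake_task *t)
{
	atomic_fetch_sub(&t->stack_refcount, 1);
}

/* Same shape as the patched __get_wchan(): pin, walk, unpin. */
static unsigned long fake_get_wchan(struct fake_task *t)
{
	unsigned long addr;

	if (!fake_try_get_stack(t))
		return 0;	/* stack already released: nothing safe to read */

	addr = t->wchan;	/* the real code unwinds the pinned stack here */

	fake_put_stack(t);
	return addr;
}
```

A task whose count has already dropped to zero yields 0 without its stack being touched. A live task has its count raised for the duration of the walk and restored afterwards.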

This implementation is very similar to stack_trace_save_tsk(), maybe we
can just move stack_trace_save_tsk() out of CONFIG_STACKTRACE and reuse