Re: [PATCH v4] riscv: fix race when vmap stack overflow
From: Guo Ren
Date: Wed Nov 30 2022 - 20:17:28 EST
On Thu, Dec 1, 2022 at 12:54 AM Palmer Dabbelt <palmer@xxxxxxxxxxxx> wrote:
>
> On Tue, 29 Nov 2022 23:15:40 PST (-0800), guoren@xxxxxxxxxx wrote:
> > The comment becomes better. Thx.
> >
> > On Wed, Nov 30, 2022 at 10:29 AM Palmer Dabbelt <palmer@xxxxxxxxxxxx> wrote:
> >>
> >> From: Jisheng Zhang <jszhang@xxxxxxxxxx>
> >>
> >> Currently, when detecting vmap stack overflow, riscv firstly switches
> >> to the so called shadow stack, then use this shadow stack to call the
> >> get_overflow_stack() to get the overflow stack. However, there's
> >> a race here if two or more harts use the same shadow stack at the same
> >> time.
> >>
> >> To solve this race, we introduce spin_shadow_stack atomic var, which
> >> will be swap between its own address and 0 in atomic way, when the
> >> var is set, it means the shadow_stack is being used; when the var
> >> is cleared, it means the shadow_stack isn't being used.
> >>
> >> Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
> >> Signed-off-by: Jisheng Zhang <jszhang@xxxxxxxxxx>
> >> Suggested-by: Guo Ren <guoren@xxxxxxxxxx>
> >> Reviewed-by: Guo Ren <guoren@xxxxxxxxxx>
> >> Link: https://lore.kernel.org/r/20221030124517.2370-1-jszhang@xxxxxxxxxx
> >> [Palmer: Add AQ to the swap, and also some comments.]
> >> Signed-off-by: Palmer Dabbelt <palmer@xxxxxxxxxxxx>
> >> ---
> >> Sorry to just re-spin this one without any warning, but I'd read patch a
> >> few times and every time I'd managed to convice myself there was a much
> >> simpler way of doing this. By the time I'd figured out why that's not
> >> the case it seemed faster to just write the comments.
> >>
> >> I've stashed this, right on top of the offending commit, at
> >> palmer/riscv-fix_vmap_stack.
> >>
> >> Since v3:
> >> - Add AQ to the swap.
> >> - Add a bunch of comments.
> >>
> >> Since v2:
> >> - use REG_AMOSWAP
> >> - add comment to the purpose of smp_store_release()
> >>
> >> Since v1:
> >> - use smp_store_release directly
> >> - use unsigned int instead of atomic_t
> >> ---
> >> arch/riscv/include/asm/asm.h | 1 +
> >> arch/riscv/kernel/entry.S | 13 +++++++++++++
> >> arch/riscv/kernel/traps.c | 18 ++++++++++++++++++
> >> 3 files changed, 32 insertions(+)
> >>
> >> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
> >> index 618d7c5af1a2..e15a1c9f1cf8 100644
> >> --- a/arch/riscv/include/asm/asm.h
> >> +++ b/arch/riscv/include/asm/asm.h
> >> @@ -23,6 +23,7 @@
> >> #define REG_L __REG_SEL(ld, lw)
> >> #define REG_S __REG_SEL(sd, sw)
> >> #define REG_SC __REG_SEL(sc.d, sc.w)
> >> +#define REG_AMOSWAP_AQ __REG_SEL(amoswap.d.aq, amoswap.w.aq)
> > Below is the reason why I use the relax version here:
> > https://lore.kernel.org/all/CAJF2gTRAEX_jQ_w5H05dyafZzHq+P5j05TJ=C+v+OL__GQam4A@xxxxxxxxxxxxxx/T/#u
>
> Sorry, I hadn't seen that one. Adding Andrea. IMO the acquire/release
> pair is necessary here, with just relaxed the stack stores inside the
> lock could show up on the next hart trying to use the stack.
Don't worry about relaxing amoswap, sp could give WAR & WAW
dependency. You could add acquire here, just for appearance.
>
> >> #define REG_ASM __REG_SEL(.dword, .word)
> >> #define SZREG __REG_SEL(8, 4)
> >> #define LGREG __REG_SEL(3, 2)
> >> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> >> index 98f502654edd..5fdb6ba09600 100644
> >> --- a/arch/riscv/kernel/entry.S
> >> +++ b/arch/riscv/kernel/entry.S
> >> @@ -387,6 +387,19 @@ handle_syscall_trace_exit:
> >>
> >> #ifdef CONFIG_VMAP_STACK
> >> handle_kernel_stack_overflow:
> >> + /*
> >> + * Takes the psuedo-spinlock for the shadow stack, in case multiple
> >> + * harts are concurrently overflowing their kernel stacks. We could
> >> + * store any value here, but since we're overflowing the kernel stack
> >> + * already we only have SP to use as a scratch register. So we just
> >> + * swap in the address of the spinlock, as that's definately non-zero.
> >> + *
> >> + * Pairs with a store_release in handle_bad_stack().
> >> + */
> >> +1: la sp, spin_shadow_stack
> >> + REG_AMOSWAP_AQ sp, sp, (sp)
> >> + bnez sp, 1b
> >> +
> >> la sp, shadow_stack
> >> addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
> >>
> >> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> >> index bb6a450f0ecc..be54ccea8c47 100644
> >> --- a/arch/riscv/kernel/traps.c
> >> +++ b/arch/riscv/kernel/traps.c
> >> @@ -213,11 +213,29 @@ asmlinkage unsigned long get_overflow_stack(void)
> >> OVERFLOW_STACK_SIZE;
> >> }
> >>
> >> +/*
> >> + * A pseudo spinlock to protect the shadow stack from being used by multiple
> >> + * harts concurrently. This isn't a real spinlock because the lock side must
> >> + * be taken without a valid stack and only a single register, it's only taken
> >> + * while in the process of panicing anyway so the performance and error
> >> + * checking a proper spinlock gives us doesn't matter.
> >> + */
> >> +unsigned long spin_shadow_stack;
> >> +
> >> asmlinkage void handle_bad_stack(struct pt_regs *regs)
> >> {
> >> unsigned long tsk_stk = (unsigned long)current->stack;
> >> unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
> >>
> >> + /*
> >> + * We're done with the shadow stack by this point, as we're on the
> >> + * overflow stack. Tell any other concurrent overflowing harts that
> >> + * they can proceed with panicing by releasing the pseudo-spinlock.
> >> + *
> >> + * This pairs with an amoswap.aq in handle_kernel_stack_overflow.
> >> + */
> >> + smp_store_release(&spin_shadow_stack, 0);
> >> +
> >> console_verbose();
> >>
> >> pr_emerg("Insufficient stack space to handle exception!\n");
> >> --
> >> 2.38.1
> >>
--
Best Regards
Guo Ren