Re: [PATCH v12 10/28] riscv/mm: Implement map_shadow_stack() syscall

From: Radim Krčmář
Date: Thu Apr 10 2025 - 06:01:43 EST


2025-03-14T14:39:29-07:00, Deepak Gupta <debug@xxxxxxxxxxxx>:
> As discussed extensively in the changelog for the addition of this
> syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the
> existing mmap() and madvise() syscalls do not map entirely well onto the
> security requirements for shadow stack memory since they lead to windows
> where memory is allocated but not yet protected or stacks which are not
> properly and safely initialised. Instead a new syscall map_shadow_stack()
> has been defined which allocates and initialises a shadow stack page.
>
> This patch implements this syscall for riscv. riscv doesn't require token
> to be setup by kernel because user mode can do that by itself. However to
> provide compatibility and portability with other architectues, user mode
> can specify token set flag.

RISC-V shadow stack could use mmap() and madvise() perfectly well.
Userspace can always initialize the shadow stack properly and the shadow
stack memory is never protected from other malicious threads.

I think that the compatibility argument is reasonable. We'd need to
modify the other syscalls to allow a write-only mapping anyway.

> diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
> +static noinline unsigned long amo_user_shstk(unsigned long *addr, unsigned long val)
> +{
> + /*
> + * Never expect -1 on shadow stack. Expect return addresses and zero
> + */
> + unsigned long swap = -1;
> + __enable_user_access();
> + asm goto(
> + ".option push\n"
> + ".option arch, +zicfiss\n"

Shouldn't compiler accept ssamoswap.d opcode even without zicfiss arch?

> + "1: ssamoswap.d %[swap], %[val], %[addr]\n"
> + _ASM_EXTABLE(1b, %l[fault])
> + RISCV_ACQUIRE_BARRIER

Why is the barrier here?

> + ".option pop\n"
> + : [swap] "=r" (swap), [addr] "+A" (*addr)
> + : [val] "r" (val)
> + : "memory"
> + : fault
> + );
> + __disable_user_access();
> + return swap;
> +fault:
> + __disable_user_access();
> + return -1;

I think we should return 0 and -EFAULT.
We can ignore the swapped value, or return it through a pointer.

> +}
> +
> +static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long size,
> + unsigned long token_offset, bool set_tok)
> +{
> + int flags = MAP_ANONYMOUS | MAP_PRIVATE;

Is MAP_GROWSDOWN pointless?

> + struct mm_struct *mm = current->mm;
> + unsigned long populate, tok_loc = 0;
> +
> + if (addr)
> + flags |= MAP_FIXED_NOREPLACE;
> +
> + mmap_write_lock(mm);
> + addr = do_mmap(NULL, addr, size, PROT_READ, flags,

PROT_READ implies VM_READ, so won't this select PAGE_COPY in the
protection_map instead of PAGE_SHADOWSTACK?

Wouldn't avoiding VM_READ also allow us to get rid of the ugly hack in
pte_mkwrite? (VM_WRITE would naturally select the right XWR flags.)

> + VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL);
> + mmap_write_unlock(mm);
> +
> +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
> +{
> [...]
> + if (addr && (addr & (PAGE_SIZE - 1)))

if (!PAGE_ALIGNED(addr))