Re: [PATCH 8/9] RISC-V: User-facing API

From: James Hogan
Date: Wed Jun 28 2017 - 18:42:49 EST


Hi Palmer,

On Wed, Jun 28, 2017 at 11:55:37AM -0700, Palmer Dabbelt wrote:
> diff --git a/arch/riscv/include/asm/syscalls.h b/arch/riscv/include/asm/syscalls.h
> new file mode 100644
> index 000000000000..d85267c4f7ea
> --- /dev/null
> +++ b/arch/riscv/include/asm/syscalls.h
> @@ -0,0 +1,25 @@
...
> +/* kernel/sys_riscv.c */
> +asmlinkage long sys_sysriscv(unsigned long, unsigned long,
> + unsigned long, unsigned long);

The cover letter says this is no longer muxed, so shouldn't there be a
prototype for each of the cmpxchg syscalls instead?

> diff --git a/arch/riscv/include/uapi/asm/ptrace.h b/arch/riscv/include/uapi/asm/ptrace.h
> new file mode 100644
> index 000000000000..01aee1654eae
> --- /dev/null
> +++ b/arch/riscv/include/uapi/asm/ptrace.h
...
> +struct __riscv_f_ext_state {
> + __u32 f[32];
> + __u32 fcsr;
> +};
> +
> +struct __riscv_d_ext_state {
> + __u64 f[32];
> + __u32 fcsr;
> +};
> +
> +struct __riscv_q_ext_state {
> + __u64 f[64] __attribute__((aligned(16)));
> + __u32 fcsr;
> + /* Reserved for expansion of sigcontext structure. Currently zeroed
> + * upon signal, and must be zero upon sigreturn. */
> + __u32 reserved[3];
> +};
> +
> +union __riscv_fp_state {
> + struct __riscv_f_ext_state f;
> + struct __riscv_d_ext_state d;
> + struct __riscv_q_ext_state q;
> +};

Out of interest, how does one tell which fp format is in use?

> diff --git a/arch/riscv/include/uapi/asm/ucontext.h b/arch/riscv/include/uapi/asm/ucontext.h
> new file mode 100644
> index 000000000000..52eff9febcfd
> --- /dev/null
> +++ b/arch/riscv/include/uapi/asm/ucontext.h
...
> +struct ucontext {
> + unsigned long uc_flags;
> + struct ucontext *uc_link;
> + stack_t uc_stack;
> + sigset_t uc_sigmask;
> + /* glibc uses a 1024-bit sigset_t */
> + __u8 __unused[1024 / 8 - sizeof(sigset_t)];
> + /* last for future expansion */
> + struct sigcontext uc_mcontext;
> +};

Any particular reason not to use the asm-generic ucontext?

> diff --git a/arch/riscv/include/uapi/asm/unistd.h b/arch/riscv/include/uapi/asm/unistd.h
> new file mode 100644
> index 000000000000..7e3909ac3c18
> --- /dev/null
> +++ b/arch/riscv/include/uapi/asm/unistd.h
...
> +/* FIXME: This exists for now in order to maintain compatibility with our
> + * pre-upstream glibc, and will be removed for our real Linux submission.
> + */
> +#define __ARCH_WANT_RENAMEAT
> +

Don't forget ;-)

Have you seen the patches floating around for dropping
getrlimit/setrlimit (in favour of prlimit64) and fstatat64/fstat64 (in
favour of statx)? I guess it's no big deal.

> +#include <asm-generic/unistd.h>
> +
> +/*
> + * These system calls add support for AMOs on RISC-V systems without support
> + * for the A extension.
> + */
> +#define __NR_sysriscv_cmpxchg32 (__NR_arch_specific_syscall + 0)
> +#define __NR_sysriscv_cmpxchg64 (__NR_arch_specific_syscall + 1)

I think you need the magic __SYSCALL invocations here, as in
include/uapi/asm-generic/unistd.h, otherwise they won't get included in
your syscall table.
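
As a rough userspace sketch of why the __SYSCALL lines matter: the table
is built by expanding one macro invocation per syscall, so a number that
only has a #define never gets a slot. Everything named demo_* or DEMO_*
below is made up for illustration (including the base number), and the
list macro stands in for re-including the header; it is not the kernel's
actual table code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical numbers; the base is a placeholder, not the real value. */
#define DEMO_NR_ARCH_BASE  16
#define DEMO_NR_cmpxchg32  (DEMO_NR_ARCH_BASE + 0)
#define DEMO_NR_cmpxchg64  (DEMO_NR_ARCH_BASE + 1)
#define DEMO_NR_SYSCALLS   (DEMO_NR_ARCH_BASE + 2)

typedef long (*demo_syscall_fn)(unsigned long, unsigned long, unsigned long);

/* Dummy handlers that just return a recognisable value. */
static long demo_sys_cmpxchg32(unsigned long a, unsigned long b, unsigned long c)
{
	(void)a; (void)b; (void)c;
	return 32;
}

static long demo_sys_cmpxchg64(unsigned long a, unsigned long b, unsigned long c)
{
	(void)a; (void)b; (void)c;
	return 64;
}

/* Stand-in for the header: one DEMO_SYSCALL() line per table entry. */
#define DEMO_SYSCALL_LIST \
	DEMO_SYSCALL(DEMO_NR_cmpxchg32, demo_sys_cmpxchg32) \
	DEMO_SYSCALL(DEMO_NR_cmpxchg64, demo_sys_cmpxchg64)

/* Expand each entry into a designated initialiser for the table. */
#define DEMO_SYSCALL(nr, fn) [nr] = fn,
static const demo_syscall_fn demo_table[DEMO_NR_SYSCALLS] = {
	DEMO_SYSCALL_LIST
};
#undef DEMO_SYSCALL

/* Slots without a DEMO_SYSCALL() line stay NULL and report -ENOSYS. */
static long demo_dispatch(unsigned long nr, unsigned long a,
			  unsigned long b, unsigned long c)
{
	if (nr >= DEMO_NR_SYSCALLS || demo_table[nr] == NULL)
		return -ENOSYS;
	return demo_table[nr](a, b, c);
}
```

Dropping either DEMO_SYSCALL() line from the list leaves that number
defined but unreachable through demo_dispatch().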

> diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
> new file mode 100644
> index 000000000000..69b3b2d10664
> --- /dev/null
> +++ b/arch/riscv/kernel/ptrace.c
...
> +enum riscv_regset {
> + REGSET_X,
> +};
> +
> +/*
> + * Get registers from task and ready the result for userspace.
> + */
> +static char *getregs(struct task_struct *child, struct pt_regs *uregs)
> +{
> + *uregs = *task_pt_regs(child);
> + return (char *)uregs;
> +}
> +
> +/* Put registers back to task. */
> +static void putregs(struct task_struct *child, struct pt_regs *uregs)
> +{
> + struct pt_regs *regs = task_pt_regs(child);
> + *regs = *uregs;
> +}
> +
> +static int riscv_gpr_get(struct task_struct *target,
> + const struct user_regset *regset,
> + unsigned int pos, unsigned int count,
> + void *kbuf, void __user *ubuf)
> +{
> + struct pt_regs regs;
> +
> + getregs(target, &regs);
> +
> + return user_regset_copyout(&pos, &count, &kbuf, &ubuf, &regs, 0,
> + sizeof(regs));

Shouldn't this be limited to sizeof(struct user_regs_struct)?

Why not copy straight out of task_pt_regs(target) instead of bouncing
via the stack?

> +}
> +
> +static int riscv_gpr_set(struct task_struct *target,
> + const struct user_regset *regset,
> + unsigned int pos, unsigned int count,
> + const void *kbuf, const void __user *ubuf)
> +{
> + int ret;
> + struct pt_regs regs;
> +
> + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &regs, 0,
> + sizeof(regs));

Likewise.

In fact if userland supplies insufficient data then this looks
vulnerable to a kernel stack data leak, since regs will remain partially
uninitialised and then get written to the target regs where it can be
read back again.

If you're going to bounce via the stack I think you need to fully
initialise before using user_regset_copyin, or you could just copy
directly into task_pt_regs(target) for now since, at least for the
current internal struct pt_regs, the beginning of pt_regs appears to
match user_regs_struct.

> + if (ret)
> + return ret;
> +
> + putregs(target, &regs);

Similarly, this needs to be careful not to overwrite the supervisor
registers with whatever was on the kernel stack (assuming regs is only
partially copied, as suggested above).
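
A minimal userspace model of the "copy directly, bounded by the user
view" option, to make the concern concrete. The struct layouts and
demo_copyin() are hypothetical stand-ins (the real user_regset_copyin()
takes different arguments); the only assumption carried over from the
patch is that the user-visible registers form a prefix of pt_regs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts: user-visible registers first, kernel state after. */
struct demo_user_regs {
	unsigned long pc;
	unsigned long x[3];
};

struct demo_pt_regs {
	unsigned long pc;
	unsigned long x[3];
	unsigned long sstatus;	/* supervisor state, not user-writable */
};

/* The direct copy is only valid while the user view is a strict prefix. */
_Static_assert(offsetof(struct demo_pt_regs, sstatus) ==
	       sizeof(struct demo_user_regs),
	       "user regs must be a prefix of pt_regs");

/* Simplified stand-in for a bounded copy-in: never copies past limit. */
static size_t demo_copyin(void *dst, const void *src, size_t count,
			  size_t limit)
{
	size_t n = count < limit ? count : limit;

	memcpy(dst, src, n);
	return n;
}

/*
 * Write straight into the task's registers. Bytes the caller did not
 * supply keep their old values, and nothing past the user-visible
 * prefix is ever written, so no stack contents can reach the task.
 */
static void demo_gpr_set(struct demo_pt_regs *task_regs, const void *buf,
			 size_t count)
{
	demo_copyin(task_regs, buf, count, sizeof(struct demo_user_regs));
}
```

With a short buffer only the leading registers change; with an oversized
one the copy stops before sstatus.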

> +
> + return 0;
> +}
> +
> +
> +static const struct user_regset riscv_user_regset[] = {
> + [REGSET_X] = {
> + .core_note_type = NT_PRSTATUS,
> + .n = ELF_NGREG,
> + .size = sizeof(elf_greg_t),
> + .align = sizeof(elf_greg_t),
> + .get = &riscv_gpr_get,
> + .set = &riscv_gpr_set,
> + },

Will the FP registers get exposed at some point as well?

> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> new file mode 100644
> index 000000000000..ab699efe636e
> --- /dev/null
> +++ b/arch/riscv/kernel/sys_riscv.c
...
> +SYSCALL_DEFINE3(sysriscv_cmpxchg32, unsigned long, arg1, unsigned long, arg2,
> + unsigned long, arg3)
> +{
> + unsigned long flags;
> + unsigned long prev;

Should that be unsigned int? Otherwise, on 64-bit, half of it could be
left uninitialised.

> + unsigned int *ptr;

Should that be tagged with __user?

> + unsigned int err;
> +
> + ptr = (unsigned int *)arg1;

I presume you'll need to cast to __user __force to keep sparse happy
here.

> + if (!access_ok(VERIFY_WRITE, ptr, sizeof(unsigned int)))
> + return -EFAULT;
> +
> + preempt_disable();
> + raw_local_irq_save(flags);
> + err = __get_user(prev, ptr);
> + if (likely(!err && prev == arg2))
> + err = __put_user(arg3, ptr);
> + raw_local_irq_restore(flags);
> + preempt_enable();

Are user accesses safe from atomic context? What if it needs paging in?

You could disable page faults but then I think you'd have to handle the
EFAULT again outside of atomic context to try getting it paged in, and
then retry in atomic context. Or perhaps there's a cleaner way that
doesn't come to mind late at night.

I'm not sure OTOH whether copy on write (i.e. affecting the __put_user()
but not the __get_user()) would be problematic. I suppose as long as it
can safely allocate a page it should be fine... Should be possible to
test using madvise(MADV_DONTNEED) (which I think makes pages use the
zero page with copy-on-write).
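
A sketch of how such a test could set up its target word, assuming Linux
and a private anonymous mapping. demo_map_dropped_page() is a made-up
helper name; it only prepares the page, and the pointer it returns would
then be handed to the syscall under test.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one private anonymous page, touch it, then drop its backing with
 * MADV_DONTNEED. Returns NULL on any failure. After this, the next
 * access to the page has to fault it in again (zero-filled).
 */
static unsigned int *demo_map_dropped_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned int *p;

	if (page <= 0)
		return NULL;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	p[0] = 42;	/* make sure the page was populated at least once */

	if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
		munmap(p, (size_t)page);
		return NULL;
	}
	return p;
}
```

A plain read of the returned word gives 0 rather than 42, and the first
store afterwards needs a fresh page, which is the case worth exercising
from the syscall's atomic section.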

Also if this is going to be included on SMP kernels (where I gather
proper atomics are available), does it need an SMP safe version too
which uses proper atomics?
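
For comparison, the semantics an SMP-safe variant would need can be
modelled in userspace with C11 atomics. This is only a sketch of the
intended behaviour (return the previous value, store only on a match);
demo_cmpxchg32() is a hypothetical name and none of this is kernel code.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Compare-and-exchange on a 32-bit word. Returns the value found at
 * *ptr; the store of 'desired' happens only if that value equals
 * 'expected'. Fixed-width types keep the comparison at 32 bits.
 */
static uint32_t demo_cmpxchg32(_Atomic uint32_t *ptr, uint32_t expected,
			       uint32_t desired)
{
	/*
	 * On success 'expected' is left alone and equals the old value;
	 * on failure it is overwritten with the value actually observed.
	 * Either way it ends up holding the previous contents.
	 */
	atomic_compare_exchange_strong(ptr, &expected, desired);
	return expected;
}
```

A caller detects success by checking that the returned value equals the
one it passed as expected.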

> +
> + return unlikely(err) ? err : prev;
> +}
> +
> +SYSCALL_DEFINE3(sysriscv_cmpxchg64, unsigned long, arg1, unsigned long, arg2,
> + unsigned long, arg3)
> +{
> + unsigned long flags;
> + unsigned long prev;
> + unsigned int *ptr;

Should that be unsigned long __user *?

> + unsigned int err;
> +
> + ptr = (unsigned int *)arg1;
> + if (!access_ok(VERIFY_WRITE, ptr, sizeof(unsigned long)))
> + return -EFAULT;
> +
> + preempt_disable();
> + raw_local_irq_save(flags);
> + err = __get_user(prev, ptr);
> + if (likely(!err && prev == arg2))
> + err = __put_user(arg3, ptr);
> + raw_local_irq_restore(flags);
> + preempt_enable();

The comments above apply here too.

This doesn't look much different to sysriscv_cmpxchg32 on 32-bit. Is it
meant to be excluded from 32-bit kernels? If so, the definition of the
__NR_ constant and the __SYSCALL magic in uapi/asm/unistd.h should, I
presume, be conditional on the ABI.

> +
> + return unlikely(err) ? err : prev;
> +}

Cheers
James
