Re: [PATCH v10 1/4] random: add vgetrandom_alloc() syscall

From: Florian Weimer
Date: Wed Nov 30 2022 - 05:53:16 EST


* Jason A. Donenfeld:

> +#ifdef CONFIG_VGETRANDOM_ALLOC_SYSCALL
> +/**
> + * vgetrandom_alloc - allocate opaque states for use with vDSO getrandom().
> + *
> + * @num: on input, a pointer to a suggested hint of how many states to
> + * allocate, and on output the number of states actually allocated.

Should userspace call this system call again if it needs more states?
The interface description doesn't make this clear.

> + * @size_per_each: the size of each state allocated, so that the caller can
> + * split up the returned allocation into individual states.
> + *
> + * @flags: currently always zero.
> + *
> + * The getrandom() vDSO function in userspace requires an opaque state, which
> + * this function allocates by mapping a certain number of special pages into
> + * the calling process. It takes a hint as to the number of opaque states
> + * desired, and provides the caller with the number of opaque states actually
> + * allocated, the size of each one in bytes, and the address of the first
> + * state.
> +
> + * Returns a pointer to the first state in the allocation.
> + *
> + */

How do we deallocate this memory? Must it remain permanently allocated?

Can userspace use the memory for something else if it's not passed to
getrandom? The separate system call strongly suggests that the
allocation is completely owned by the kernel, but nothing here
documents what the allocation life-cycle is supposed to look like.
In particular, it is not clear whether vgetrandom_alloc or getrandom
could retain a reference to the allocation in a future implementation of
these interfaces.

Some users might want to zap the memory for extra hardening after use,
and it's not clear if that's allowed, either.
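
To make the question concrete, this is roughly what I would expect a
caller to do with the returned block, going only by the interface
description above. It is a sketch, not a statement of what is permitted:
the sketch_* names are hypothetical, and the wipe step is exactly the
part whose legality is unclear.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: address of the i-th opaque state inside a block of
 * `num` states of `size_per_each` bytes each, or NULL if i is out of range.
 */
static void *sketch_state_at(void *base, unsigned int num,
			     unsigned int size_per_each, unsigned int i)
{
	assert(base != NULL && size_per_each > 0);
	if (i >= num)
		return NULL;
	return (unsigned char *)base + (size_t)i * size_per_each;
}

/*
 * Hypothetical hardening step: overwrite one state after use. The volatile
 * pointer keeps the compiler from dropping the stores. Whether userspace is
 * allowed to do this to a kernel-provided state is the open question.
 */
static void sketch_zap_state(void *state, unsigned int size_per_each)
{
	volatile unsigned char *p = state;
	unsigned int i;

	for (i = 0; i < size_per_each; i++)
		p[i] = 0;
}
```

A caller would hand sketch_state_at(base, num, size_per_each, i) to the
vDSO function for thread i and call sketch_zap_state() on it once the
thread is done with it.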

> +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned int __user *, num,
> + unsigned int __user *, size_per_each, unsigned int, flags)
> +{

ABI-wise, that should work.

> + const size_t state_size = sizeof(struct vgetrandom_state);
> + size_t alloc_size, num_states;
> + unsigned long pages_addr;
> + unsigned int num_hint;
> + int ret;
> +
> + if (flags)
> + return -EINVAL;
> +
> + if (get_user(num_hint, num))
> + return -EFAULT;
> +
> + num_states = clamp_t(size_t, num_hint, 1, (SIZE_MAX & PAGE_MASK) / state_size);
> + alloc_size = PAGE_ALIGN(num_states * state_size);

Doesn't this waste space for one state if state_size happens to be a
power of 2? Why do this SIZE_MAX & PAGE_MASK thing at all? Shouldn't
it be PAGE_SIZE / state_size?
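
For reference while discussing this, here is a small userspace sketch of
the quoted clamp and PAGE_ALIGN steps. The 4096-byte page size and the
state sizes in the note below are assumptions for illustration (the real
sizeof(struct vgetrandom_state) is not visible in this hunk), and the
sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4096-byte pages; the kernel uses the architecture's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE ((size_t)4096)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Round n up to a multiple of the page size, like the kernel's PAGE_ALIGN(). */
static size_t sketch_page_align(size_t n)
{
	return (n + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
}

/*
 * Mirror the quoted computation: clamp the hint, page-align the byte count,
 * and report how many whole states fit in the aligned allocation.
 */
static size_t sketch_states_for_hint(size_t hint, size_t state_size,
				     size_t *alloc_size)
{
	size_t max_states, num_states;

	assert(state_size > 0);
	max_states = (SIZE_MAX & SKETCH_PAGE_MASK) / state_size;
	num_states = hint < 1 ? 1 : (hint > max_states ? max_states : hint);
	*alloc_size = sketch_page_align(num_states * state_size);
	return *alloc_size / state_size;
}
```

With a hypothetical 256-byte state, a hint of 1 gives a 4096-byte
allocation holding 16 states, and a hint of 17 gives 8192 bytes holding
32. With a hypothetical 144-byte state, a hint of 1 gives 28 states with
64 bytes left over at the end of the page.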

> + if (put_user(alloc_size / state_size, num) || put_user(state_size, size_per_each))
> + return -EFAULT;
> +
> + pages_addr = vm_mmap(NULL, 0, alloc_size, PROT_READ | PROT_WRITE,
> + MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, 0);

I think Rasmus has already raised questions about MAP_LOCKED.

I think the kernel cannot rely on it because userspace could call
munlock on the allocation.
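
A minimal userspace sketch of that concern, using an ordinary anonymous
mapping as a stand-in for the region the syscall would return (the
function name is hypothetical, and the stand-in mapping is an
assumption, not the real allocation):

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one locked anonymous page, then unlock it from userspace.
 * Returns 0 if munlock() succeeded, 1 if the locked mapping could not be
 * created in the first place (for example because of RLIMIT_MEMLOCK), and
 * -1 if munlock() itself failed.
 */
static int sketch_munlock_locked_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int ret;

	if (page <= 0)
		return 1;
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	ret = munlock(p, (size_t)page) == 0 ? 0 : -1;
	munmap(p, (size_t)page);
	return ret;
}
```

A return value of 0 means the page started out locked and userspace
removed the lock without the kernel objecting.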

> + if (IS_ERR_VALUE(pages_addr))
> + return pages_addr;
> +
> + ret = do_madvise(current->mm, pages_addr, alloc_size, MADV_WIPEONFORK);
> + if (ret < 0)
> + goto err_unmap;
> +
> + return pages_addr;
> +
> +err_unmap:
> + vm_munmap(pages_addr, alloc_size);
> + return ret;
> +}
> +#endif

If there's no registration of the allocation, it's not clear why we need
a separate system call for this. From a documentation perspective, it
may be easier to describe proper use of the getrandom vDSO call if
ownership resides with userspace. That would constrain future
evolution of the implementation, though, because you can't add
registration (retaining a reference to the passed-in area in getrandom)
after the fact. Then again, I'm not sure adding registration later is
possible with the current interface, either: userspace has to make some
assumptions about the life-cycle to avoid a memory leak on thread exit.
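
As a rough illustration of the userspace-owned variant (a sketch under
stated assumptions, not a proposal for the final interface): the same
mapping can be built from existing system calls. The sketch_* names are
hypothetical, state_size is assumed to be known to the caller by some
means this sketch does not address, and MAP_LOCKED is left out because
of the question above.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for a block of states owned by userspace. */
struct sketch_state_block {
	void *base;          /* start of the mapping */
	size_t alloc_size;   /* mapping length in bytes */
	size_t num_states;   /* whole states that fit */
};

/*
 * Allocate room for at least `hint` states of `state_size` bytes using only
 * existing system calls. Returns 0 on success, -1 on failure.
 */
static int sketch_block_alloc(struct sketch_state_block *b, size_t hint,
			      size_t state_size)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t bytes;

	/* Reject zero sizes and anything that would overflow the rounding. */
	if (page <= 0 || state_size == 0 || hint == 0 ||
	    hint > ((size_t)-1 - (size_t)page) / state_size)
		return -1;
	bytes = (hint * state_size + (size_t)page - 1) & ~((size_t)page - 1);
	b->base = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (b->base == MAP_FAILED)
		return -1;
	/* Same advice the quoted patch applies: wipe the contents in a forked child. */
	if (madvise(b->base, bytes, MADV_WIPEONFORK) != 0) {
		munmap(b->base, bytes);
		return -1;
	}
	b->alloc_size = bytes;
	b->num_states = bytes / state_size;
	return 0;
}

/* Release the block, e.g. from a thread-exit handler. Returns munmap()'s result. */
static int sketch_block_free(struct sketch_state_block *b)
{
	return munmap(b->base, b->alloc_size);
}
```

With ownership in userspace, sketch_block_free() on thread exit is
unambiguous. With a kernel-owned allocation, whether the equivalent
munmap is safe is part of what needs documenting.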

Thanks,
Florian