Re: [PATCH v10 1/4] random: add vgetrandom_alloc() syscall
From: Florian Weimer
Date: Fri Dec 02 2022 - 12:19:04 EST
* Jason A. Donenfeld:
> I don't think zapping that memory is supported, or even a sensible thing
> to do. In the first place, I don't think we should suggest that the user
> can dereference that pointer, at all. In that sense, maybe it's best to
> call it a "handle" or something similar (a "HANDLE"! a "HWND"? a "HRNG"?
Surely the caller has to carve up the allocation, so the returned
pointer is not opaque at all. From Adhemerval's glibc patch:
  grnd_allocator.cap = new_cap;
  grnd_allocator.states = new_states;
  for (size_t i = 0; i < num; ++i)
    {
      grnd_allocator.states[i] = new_block;
      new_block += size_per_each;
    }
  grnd_allocator.len = num;
}
That's the opposite of a handle, really.
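
To make the point concrete, here is a stripped-down, standalone version of that carving step. It is only a sketch: carve_states is a hypothetical helper name, not something from the glibc patch, and it assumes the caller already knows num and size_per_each.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the glibc patch): split one contiguous
   block into NUM slots of SIZE_PER_EACH bytes each and record the start
   address of every slot in STATES.  */
static void
carve_states (void *block, size_t num, size_t size_per_each, void **states)
{
  assert (block != NULL && states != NULL);
  char *p = block;
  for (size_t i = 0; i < num; ++i)
    {
      states[i] = p;          /* Slot i starts here.  */
      p += size_per_each;     /* Plain pointer arithmetic on the mapping.  */
    }
}
```

The caller does arithmetic on the returned address and hands out interior pointers, which only works if the pointer is an ordinary address.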
>> But it will constrain future
>> evolution of the implementation because you can't add registration
>> (retaining a reference to the passed-in area in getrandom) after the
>> fact. But I'm not sure if this is possible with the current interface,
>> either. Userspace has to make some assumptions about the life-cycle to
>> avoid a memory leak on thread exit.
>
> It sounds like this is sort of a different angle on Rasmus' earlier
> comment about how munmap leaks implementation details. Maybe there's
> something to that after all? Or not? I see two approaches:
>
> 1) Keep munmap as the deallocation function. If later on we do fancy
> registration and in-kernel state tracking, or add fancy protection
> flags, or whatever else, munmap should be able to identify these
> pages and carry out whatever special treatment is necessary.
munmap is fine, but the interface needs to say how to use it, and what
length to pass.
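
Absent documentation, a caller would probably guess something like the following. This is a sketch of that guess, not a statement of what the interface guarantees: guess_unmap_length is a hypothetical name, and page_size is assumed to come from sysconf(_SC_PAGESIZE) or similar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guess at the munmap length: the total size of the batch,
   rounded up to a whole number of pages.  PAGE_SIZE must be a power of
   two.  */
static size_t
guess_unmap_length (size_t num, size_t size_per_each, size_t page_size)
{
  assert (page_size != 0 && (page_size & (page_size - 1)) == 0);
  size_t bytes = num * size_per_each;
  return (bytes + page_size - 1) & ~(page_size - 1);
}
```

With the patch as quoted, this reproduces the kernel's alloc_size only while size_per_each is no larger than a page. That is the kind of detail the documentation has to spell out.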
>> > + num_states = clamp_t(size_t, num_hint, 1, (SIZE_MAX & PAGE_MASK) / state_size);
>> > + alloc_size = PAGE_ALIGN(num_states * state_size);
>>
>> Doesn't this waste space for one state if state_size happens to be a
>> power of 2? Why do this SIZE_MAX & PAGE_MASK thing at all? Shouldn't
>> it be PAGE_SIZE / state_size?
>
> The first line is a clamp. That fixes num_hint between 1 and the largest
> number that when multiplied and rounded up won't overflow.
>
> So, if state_size is a power of two, let's say 256, and there's only one
> state, here's what that looks like:
>
> num_states = clamp(1, 1, (0xffffffff & (~(4096 - 1))) / 256 = 16777200) = 1
> alloc_size = PAGE_ALIGN(1 * 256) = 4096
>
> So that seems like it's working as intended, right? Or if not, maybe
> it'd help to write out the digits you're concerned about?
I think I was just confused.
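
For the record, the arithmetic from your example can be replayed in userspace. This is only a sketch and not the kernel code: the ex_* names are made up, and it hard-codes a 4096-byte page and a 32-bit size type to match the digits quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096u
#define EX_PAGE_MASK (~(EX_PAGE_SIZE - 1u))

/* Largest state count whose page-rounded byte size still fits in 32 bits. */
static uint32_t ex_max_states(uint32_t state_size)
{
	assert(state_size != 0);
	return (UINT32_MAX & EX_PAGE_MASK) / state_size;
}

/* Userspace stand-in for the clamp: force the hint into [1, max]. */
static uint32_t ex_num_states(uint32_t num_hint, uint32_t state_size)
{
	uint32_t max = ex_max_states(state_size);

	if (num_hint < 1)
		return 1;
	if (num_hint > max)
		return max;
	return num_hint;
}

/* Userspace stand-in for PAGE_ALIGN(num_states * state_size). */
static uint32_t ex_alloc_size(uint32_t num_states, uint32_t state_size)
{
	uint32_t bytes = num_states * state_size;

	return (bytes + EX_PAGE_SIZE - 1u) & EX_PAGE_MASK;
}
```

With state_size 256, the upper bound is 16777200, a hint of 1 stays 1, and the allocation is 4096 bytes, so alloc_size / state_size reports 16 usable states rather than 1. Nothing is wasted.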
>> > + if (put_user(alloc_size / state_size, num) || put_user(state_size, size_per_each))
>> > + return -EFAULT;
>> > +
>> > + pages_addr = vm_mmap(NULL, 0, alloc_size, PROT_READ | PROT_WRITE,
>> > + MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, 0);
>>
>> I think Rasmus has already raised questions about MAP_LOCKED.
>>
>> I think the kernel cannot rely on it because userspace could call
>> munlock on the allocation.
>
> Then they're caught holding the bag? This doesn't seem much different
> from userspace shooting itself in the foot in general, like writing
> garbage into the allocated states and then trying to use them. If this is something
> you really, really are concerned about, then maybe my cheesy dumb xor
> thing mentioned above would be a low effort mitigation here.
So the MAP_LOCKED is just there to prevent leakage to swap?
Thanks,
Florian