Re: [PATCH v6 13/26] rust: alloc: implement kernel `Vec` type

From: Benno Lossin
Date: Wed Sep 11 2024 - 04:46:52 EST


On 11.09.24 02:18, Danilo Krummrich wrote:
> On Tue, Sep 10, 2024 at 07:32:06PM +0000, Benno Lossin wrote:
>> On 16.08.24 02:10, Danilo Krummrich wrote:
>>> + ($elem:expr; $n:expr) => (
>>> + {
>>> + $crate::alloc::KVec::from_elem($elem, $n, GFP_KERNEL)
>>
>> Hmm, could it be that one might want to use `kvec!` without
>> `GFP_KERNEL`?
>> Or add additional flags, eg __GFP_ZERO?
>
> Pretty unlikely, I'd say. __GFP_ZERO in particular wouldn't make much sense,
> since the macro enforces initialization anyways.
>
> Maybe something like GFP_ATOMIC, but I expect this to be uncommon enough to not
> consider this for this macro for now.

SGTM

>> If you think that supporting this is not immediately necessary or if
>> this is too much for this patchset, then we can also postpone it (maybe
>> it could be a good-first-issue). Do you keep a list of future
>> improvements for the new allocator API somewhere? If not, then I think
>> we should create one (best place would be the issue tracker on GH).
>
> I would only add it if it turns out to be a common need. As mentioned, I don't
> expect it to be.
>
> I'd rather keep issues in the source tree. For instance, DRM has them in
> '/Documentation/gpu/todo.rst'.

But then you need to submit patches to change them... That sounds like a
hassle and we already have the precedent to use the github issue
tracker. It is also much better for discoverability for people outside
of the kernel.

>>> + }
>>> + );
>>> + ($($x:expr),+ $(,)?) => (
>>> + {
>>> + match $crate::alloc::KBox::new_uninit(GFP_KERNEL) {
>>> + Ok(b) => Ok($crate::alloc::KVec::from($crate::alloc::KBox::write(b, [$($x),+]))),
>>> + Err(e) => Err(e),
>>> + }
>>> + }
>>> + );
>>> +}
>>> +
>>> +/// The kernel's [`Vec`] type.
>>> +///
>>> +/// A contiguous growable array type with contents allocated with the kernel's allocators (e.g.
>>> +/// `Kmalloc`, `Vmalloc` or `KVmalloc`), written `Vec<T, A>`.
>>
>> New folks might get confused as to which allocator they should choose.
>> Do we have a sensible default for `Vec`?
>
> Then they should read the documentation about the kernel's allocators. We
> already link them in rust/kernel/alloc/allocator.rs.
>
>> If yes, then we at least should document that here. We might also want
>> to make it the default value for the generic parameter.
>> This is also a good idea for `Box`.
>
> If we really want a default it should be `Kmalloc`, but I really think we should
> force people to make an explicit decision and think about it and don't just go
> with whatever default.
>
> It makes it also easier to grep for things. :)

SGTM

>>> +///
>>> +/// For non-zero-sized values, a [`Vec`] will use the given allocator `A` for its allocation. For
>>> +/// the most common allocators the type aliases `KVec`, `VVec` and `KVVec` exist.
>>> +///
>>> +/// For zero-sized types the [`Vec`]'s pointer must be `dangling_mut::<T>`; no memory is allocated.
>>> +///
>>> +/// Generally, [`Vec`] consists of a pointer that represents the vector's backing buffer, the
>>> +/// capacity of the vector (the number of elements that currently fit into the vector), it's length
>>> +/// (the number of elements that are currently stored in the vector) and the `Allocator` type used
>>> +/// to allocate (and free) the backing buffer.
>>> +///
>>> +/// A [`Vec`] can be deconstructed into and (re-)constructed from it's previously named raw parts
>>> +/// and manually modified.
>>> +///
>>> +/// [`Vec`]'s backing buffer gets, if required, automatically increased (re-allocated) when elements
>>> +/// are added to the vector.
>>> +///
>>> +/// # Invariants
>>> +///
>>> +/// The [`Vec`] backing buffer's pointer is always properly aligned and either points to memory
>>> +/// allocated with `A` or, for zero-sized types, is a dangling pointer.
>>
>> Just use `self.ptr` instead of "backing buffer".
>>
>>> +///
>>> +/// The length of the vector always represents the exact number of elements stored in the vector.
>>
>> Same here, `self.len`.
>>
>>> +///
>>> +/// The capacity of the vector always represents the absolute number of elements that can be stored
>>
>> Ditto.
>>
>>> +/// within the vector without re-allocation. However, it is legal for the backing buffer to be
>>> +/// larger than `size_of<T>` times the capacity.
>>> +///
>>> +/// The `Allocator` type `A` of the vector is the exact same `Allocator` type the backing buffer was
>>> +/// allocated with (and must be freed with).
>>
>> Please turn this into a bullet-point list.
>
> Is this a rule? Do we want to make it one?

I am trying to make it one with my safety standard.

>>> +pub struct Vec<T, A: Allocator> {
>>> + ptr: NonNull<T>,
>>> + /// Represents the actual buffer size as `cap` times `size_of::<T>` bytes.
>>> + ///
>>> + /// Note: This isn't quite the same as `Self::capacity`, which in contrast returns the number of
>>> + /// elements we can still store without reallocating.
>>> + ///
>>> + /// # Invariants
>>> + ///
>>> + /// `cap` must be in the `0..=isize::MAX` range.
>>> + cap: usize,
>>> + len: usize,
>>> + _p: PhantomData<A>,
>>> +}



>>> + /// Creates a new, empty Vec<T, A>.
>>> + ///
>>> + /// This method does not allocate by itself.
>>> + #[inline]
>>> + pub const fn new() -> Self {
>>> + Self {
>>> + ptr: NonNull::dangling(),
>>> + cap: 0,
>>> + len: 0,
>>> + _p: PhantomData::<A>,
>>> + }
>>> + }
>>> +
>>> + /// Returns a slice of `MaybeUninit<T>` for the remaining spare capacity of the vector.
>>
>> Returns
>
> Hm?

Forgot to finish that sentence, since I couldn't really pinpoint what
exactly I wanted to change. Just ignore it.

>>> + pub fn spare_capacity_mut(&mut self) -> &mut [MaybeUninit<T>] {
>>> + // SAFETY: The memory between `self.len` and `self.capacity` is guaranteed to be allocated
>>> + // and valid, but uninitialized.
>>> + unsafe {
>>> + slice::from_raw_parts_mut(
>>> + self.as_mut_ptr().add(self.len) as *mut MaybeUninit<T>,
>>> + self.capacity() - self.len,
>>> + )
>>> + }



>>> + ///
>>> + /// Otherwise:
>>> + ///
>>> + /// - `ptr` must have been allocated with the allocator `A`.
>>> + /// - `ptr` must satisfy or exceed the alignment requirements of `T`.
>>> + /// - `ptr` must point to memory with a size of at least `size_of::<T>` times the `capacity`
>>> + /// bytes.
>>
>> Just write "`size_of::<T>() * capacity` bytes".
>>
>>> + /// - The allocated size in bytes must not be larger than `isize::MAX`.
>>
>> Should we make this a safety requirement of `Allocator`? I think this is
>> generally the maximum size other allocated things can have anyways.
>
>
>
>>
>>> + /// - `length` must be less than or equal to `capacity`.
>>> + /// - The first `length` elements must be initialized values of type `T`.
>>> + ///
>>> + /// It is also valid to create an empty `Vec` passing a dangling pointer for `ptr` and zero for
>>> + /// `cap` and `len`.
>>
>> Can you make this last sentence part of the `if` chain that you have
>> above (ie the one started with "If `T` is a ZST:").
>
> But `T` isn't necessarily a ZST, but it may be.

I originally meant adding an "else if" part that checks for empty
capacity. But you could just add that to the if at the top ie "If
`capacity == 0` or `T` is a ZST".

>> Just to experiment with the suggestion from Kangrejos to use Rust as the
>> language for safety documentation, here is what it could look like (we
>> should discuss it more before we start using it):
>>
>> /// # Safety
>> ///
>> /// ```ignore
>> /// assert!(ptr.is_aligned_to(align_of::<T>()));
>> /// assert!(!ptr.is_null());
>> /// assert!(length <= capacity);
>> /// if capacity != 0 && size_of::<T>() != 0 {
>> /// assert!(A::did_allocate(ptr));
>> /// assert!(size_of::<T>() * capacity <= A::layout_of(ptr).len());
>> /// assert!(is_initialized(ptr::slice_from_raw_parts(ptr, length)));
>> /// }
>> /// ```
>>
>> I really like how this looks! We might want to add labels/names to each
>> of the conditions and then one could use those in the justifications. A
>> tool could then read those and match them to the requirements of the
>> unsafe operation.
>
> I need to think about this a bit more, but at a first glance I think I like it.
>
> The tool would ideally be the compiler itself...

Yes! There is the contracts draft PR that might add support for
this: https://github.com/rust-lang/rust/pull/128045

---
Cheers,
Benno