Re: [PATCH v4 1/3] rust: block: introduce `kernel::block::mq` module

From: Benno Lossin
Date: Mon Jun 03 2024 - 14:26:36 EST


On 03.06.24 14:01, Andreas Hindborg wrote:
> Benno Lossin <benno.lossin@xxxxxxxxx> writes:
>> On 01.06.24 15:40, Andreas Hindborg wrote:
>>> +impl seal::Sealed for Initialized {}
>>> +impl GenDiskState for Initialized {
>>> + const DELETE_ON_DROP: bool = false;
>>> +}
>>> +impl seal::Sealed for Added {}
>>> +impl GenDiskState for Added {
>>> + const DELETE_ON_DROP: bool = true;
>>> +}
>>> +
>>> +impl<T: Operations> GenDisk<T, Initialized> {
>>> + /// Try to create a new `GenDisk`.
>>> + pub fn try_new(tagset: Arc<TagSet<T>>) -> Result<Self> {
>>
>> Since there is no non-try `new` function, I think we should name this
>> function just `new`.
>
> Right, I am still getting used to the new naming scheme. Do you know if
> it is documented anywhere?

I don't think it is documented, it might only be a verbal convention at
the moment. Although [1] is suggesting `new` for general constructors.
Since this is the only constructor, one could argue that the
recommendation is to use `new` (which I personally find a good idea).

[1]: https://rust-lang.github.io/api-guidelines/naming.html

[...]

>>> +impl<T: Operations> OperationsVTable<T> {
>>> + /// This function is called by the C kernel. A pointer to this function is
>>> + /// installed in the `blk_mq_ops` vtable for the driver.
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// - The caller of this function must ensure `bd` is valid
>>> + /// and initialized. The pointees must outlive this function.
>>
>> Until when do the pointees have to be alive? "must outlive this
>> function" could also be the case if the pointees die immediately after
>> this function returns.
>
> It should not be plural. What I intended to communicate is that what
> `bd` points to must be valid for read for the duration of the function
> call. I think that is what "The pointee must outlive this function"
> states? Although when we talk about lifetime of an object pointed to by
> a pointer, I am not sure about the correct way to word this. Do we talk
> about the lifetime of the pointer or the lifetime of the pointed to
> object (the pointee). We should not use the same wording for the pointer
> and the pointee.
>
> How about:
>
> /// - The caller of this function must ensure that the pointee of `bd` is
> /// valid for read for the duration of this function.

But this is not enough for it to be sound, right? You create an `ARef`
from `bd.rq`, which potentially lives forever. You somehow need to
require that the pointer `bd` stays valid for reads and (synchronized)
writes until the request is ended (probably via `blk_mq_end_request`).

>>> + /// - This function must not be called with a `hctx` for which
>>> + /// `Self::exit_hctx_callback()` has been called.
>>> + /// - (*bd).rq must point to a valid `bindings:request` for which
>>> + /// `OperationsVTable<T>::init_request_callback` was called
>>
>> Missing `.` at the end.
>
> Thanks.
>
>>
>>> + unsafe extern "C" fn queue_rq_callback(
>>> + _hctx: *mut bindings::blk_mq_hw_ctx,
>>> + bd: *const bindings::blk_mq_queue_data,
>>> + ) -> bindings::blk_status_t {
>>> + // SAFETY: `bd.rq` is valid as required by the safety requirement for
>>> + // this function.
>>> + let request = unsafe { &*(*bd).rq.cast::<Request<T>>() };
>>> +
>>> + // One refcount for the ARef, one for being in flight
>>> + request.wrapper_ref().refcount().store(2, Ordering::Relaxed);
>>> +
>>> + // SAFETY: We own a refcount that we took above. We pass that to `ARef`.
>>> + // By the safety requirements of this function, `request` is a valid
>>> + // `struct request` and the private data is properly initialized.
>>> + let rq = unsafe { Request::aref_from_raw((*bd).rq) };
>>
>> I think that you need to require that the request is alive at least
>> until `blk_mq_end_request` is called for the request (since at that
>> point all `ARef`s will be gone).
>> Also if this is not guaranteed, the safety requirements of
>> `AlwaysRefCounted` are violated (since the object can just disappear
>> even if it has refcount > 0 [the refcount refers to the Rust refcount in
>> the `RequestDataWrapper`, not the one in C]).
>
> Yea, for the last invariant of `Request`:
>
> /// * `self` is reference counted by atomic modification of
> /// self.wrapper_ref().refcount().
>
> I will add this to the safety comment at the call site:
>
> // - `rq` will be alive until `blk_mq_end_request` is called and is
> // reference counted by `ARef` until then.

Seems like you already want to use this here :)

[...]

>>> + /// This function is called by the C kernel. A pointer to this function is
>>> + /// installed in the `blk_mq_ops` vtable for the driver.
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// This function may only be called by blk-mq C infrastructure. `set` must

`set` doesn't exist (`_set` does), you are also not using this
requirement.

>>> + /// point to an initialized `TagSet<T>`.
>>> + unsafe extern "C" fn init_request_callback(
>>> + _set: *mut bindings::blk_mq_tag_set,
>>> + rq: *mut bindings::request,
>>> + _hctx_idx: core::ffi::c_uint,
>>> + _numa_node: core::ffi::c_uint,
>>> + ) -> core::ffi::c_int {
>>> + from_result(|| {
>>> + // SAFETY: The `blk_mq_tag_set` invariants guarantee that all
>>> + // requests are allocated with extra memory for the request data.
>>
>> What guarantees that the right amount of memory has been allocated?
>> AFAIU that is guaranteed by the `TagSet` (but there is no invariant).
>
> It is by C API contract. `TagSet`::try_new` (now `new`) writes
> `cmd_size` into the `struct blk_mq_tag_set`. That is picked up by
> `blk_mq_alloc_tag_set` to allocate the right amount of space for each request.
>
> The invariant here is on the C type. Perhaps the wording is wrong. I am
> not exactly sure how to express this. How about this:
>
> // SAFETY: We instructed `blk_mq_alloc_tag_set` to allocate requests
> // with extra memory for the request data when we called it in
> // `TagSet::new`.

I think you need a safety requirement on the function: `rq` points to a
valid `Request`. Then you could just use `Request::wrapper_ptr` instead
of the line below.

>>> + let pdu = unsafe { bindings::blk_mq_rq_to_pdu(rq) }.cast::<RequestDataWrapper>();
>>> +
>>> + // SAFETY: The refcount field is allocated but not initialized, this
>>> + // valid for write.
>>> + unsafe { RequestDataWrapper::refcount_ptr(pdu).write(AtomicU64::new(0)) };
>>> +
>>> + Ok(0)
>>> + })
>>> + }
>>
>> [...]
>>
>>> + /// Notify the block layer that a request is going to be processed now.
>>> + ///
>>> + /// The block layer uses this hook to do proper initializations such as
>>> + /// starting the timeout timer. It is a requirement that block device
>>> + /// drivers call this function when starting to process a request.
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// The caller must have exclusive ownership of `self`, that is
>>> + /// `self.wrapper_ref().refcount() == 2`.
>>> + pub(crate) unsafe fn start_unchecked(this: &ARef<Self>) {
>>> + // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
>>> + // existence of `&mut self` we have exclusive access.
>>
>> We don't have a `&mut self`. But the safety requirements ask for a
>> unique `ARef`.
>
> Thanks, I'll rephrase to:
>
> // SAFETY: By type invariant, `self.0` is a valid `struct request` and
> // we have exclusive access.
>
>>
>>> + unsafe { bindings::blk_mq_start_request(this.0.get()) };
>>> + }
>>> +
>>> + fn try_set_end(this: ARef<Self>) -> Result<ARef<Self>, ARef<Self>> {
>>> + // We can race with `TagSet::tag_to_rq`
>>> + match this.wrapper_ref().refcount().compare_exchange(
>>> + 2,
>>> + 0,
>>> + Ordering::Relaxed,
>>> + Ordering::Relaxed,
>>> + ) {
>>> + Err(_old) => Err(this),
>>> + Ok(_) => Ok(this),
>>> + }
>>> + }
>>> +
>>> + /// Notify the block layer that the request has been completed without errors.
>>> + ///
>>> + /// This function will return `Err` if `this` is not the only `ARef`
>>> + /// referencing the request.
>>> + pub fn end_ok(this: ARef<Self>) -> Result<(), ARef<Self>> {
>>> + let this = Self::try_set_end(this)?;
>>> + let request_ptr = this.0.get();
>>> + core::mem::forget(this);
>>> +
>>> + // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
>>> + // existence of `&mut self` we have exclusive access.
>>
>> Same here, but in this case, the `ARef` is unique, since you called
>> `try_set_end`. You could make it a `# Guarantee` of `try_set_end`: "If
>> `Ok(aref)` is returned, then the `aref` is unique."
>
> Makes sense. I have not seen `# Guarantee` used anywhere. Do you have a link for that use?

Alice used it a couple of times, eg in [2]. I plan on putting it in the
safety standard.

[2]: https://lore.kernel.org/rust-for-linux/20230601134946.3887870-2-aliceryhl@xxxxxxxxxx/

---
Cheers,
Benno