Re: [RFC PATCH 1/5] rust: block: introduce `kernel::block::mq` module
From: Andreas Hindborg
Date: Tue Apr 02 2024 - 08:35:28 EST
Benno Lossin <benno.lossin@xxxxxxxxx> writes:
> On 3/13/24 12:05, Andreas Hindborg wrote:
>> From: Andreas Hindborg <a.hindborg@xxxxxxxxxxx>
>>
>> Add initial abstractions for working with blk-mq.
>>
>> This patch is a maintained, refactored subset of code originally published by
>> Wedson Almeida Filho <wedsonaf@xxxxxxxxx> [1].
>>
>> [1] https://github.com/wedsonaf/linux/tree/f2cfd2fe0e2ca4e90994f96afe268bbd4382a891/rust/kernel/blk/mq.rs
>>
>> Cc: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
>> Signed-off-by: Andreas Hindborg <a.hindborg@xxxxxxxxxxx>
>> ---
>> block/blk-mq.c | 3 +-
>> include/linux/blk-mq.h | 1 +
>> rust/bindings/bindings_helper.h | 2 +
>> rust/helpers.c | 45 ++++
>> rust/kernel/block.rs | 5 +
>> rust/kernel/block/mq.rs | 131 +++++++++++
>> rust/kernel/block/mq/gen_disk.rs | 174 +++++++++++++++
>> rust/kernel/block/mq/operations.rs | 346 +++++++++++++++++++++++++++++
>> rust/kernel/block/mq/raw_writer.rs | 60 +++++
>> rust/kernel/block/mq/request.rs | 182 +++++++++++++++
>> rust/kernel/block/mq/tag_set.rs | 117 ++++++++++
>> rust/kernel/error.rs | 5 +
>> rust/kernel/lib.rs | 1 +
>> 13 files changed, 1071 insertions(+), 1 deletion(-)
>
> Do you think that it's possible to split this into smaller
> patches? It would make review a lot easier.
Probably, I'll look into that.
>
>> create mode 100644 rust/kernel/block.rs
>> create mode 100644 rust/kernel/block/mq.rs
>> create mode 100644 rust/kernel/block/mq/gen_disk.rs
>> create mode 100644 rust/kernel/block/mq/operations.rs
>> create mode 100644 rust/kernel/block/mq/raw_writer.rs
>> create mode 100644 rust/kernel/block/mq/request.rs
>> create mode 100644 rust/kernel/block/mq/tag_set.rs
>
> [...]
>
>> diff --git a/rust/kernel/block.rs b/rust/kernel/block.rs
>> new file mode 100644
>> index 000000000000..4c93317a568a
>> --- /dev/null
>> +++ b/rust/kernel/block.rs
>> @@ -0,0 +1,5 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Types for working with the block layer
>
> Missing '.'.
👍
>
>> +
>> +pub mod mq;
>> diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
>> new file mode 100644
>> index 000000000000..08de1cc114ff
>> --- /dev/null
>> +++ b/rust/kernel/block/mq.rs
>> @@ -0,0 +1,131 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! This module provides types for implementing block drivers that interface the
>> +//! blk-mq subsystem.
>> +//!
>> +//! To implement a block device driver, a Rust module must do the following:
>> +//!
>> +//! - Implement [`Operations`] for a type `T`
>
> I think it would be nicer to use `Driver` (or `MyBlkDevice`) instead of
> `T`.
I think I like `T` better, because it signals placeholder more
effectively.
>
>> +//! - Create a [`TagSet<T>`]
>> +//! - Create a [`GenDisk<T>`], passing in the `TagSet` reference
>> +//! - Add the disk to the system by calling [`GenDisk::add`]
>> +//!
>> +//! The types available in this module that have direct C counterparts are:
>> +//!
>> +//! - The `TagSet` type that abstracts the C type `struct tag_set`.
>> +//! - The `GenDisk` type that abstracts the C type `struct gendisk`.
>> +//! - The `Request` type that abstracts the C type `struct request`.
>> +//!
>> +//! Many of the C types that this module abstracts allow a driver to carry
>> +//! private data, either embedded in the stuct directly, or as a C `void*`. In
>> +//! these abstractions, this data is typed. The types of the data is defined by
>> +//! associated types in `Operations`, see [`Operations::RequestData`] for an
>> +//! example.
>> +//!
>> +//! The kernel will interface with the block evice driver by calling the method
>
> Typo: "block evice driver"
Thanks.
>
>> +//! implementations of the `Operations` trait.
>> +//!
>> +//! IO requests are passed to the driver as [`Request`] references. The
>> +//! `Request` type is a wrapper around the C `struct request`. The driver must
>> +//! mark start of request processing by calling [`Request::start`] and end of
>> +//! processing by calling one of the [`Request::end`], methods. Failure to do so
>> +//! can lead to IO failures.
>
> I am unfamiliar with this, what are "IO failures"?
> Do you think that it might be better to change the API to use a
> callback? So instead of calling start and end, you would do
>
> request.handle(|req| {
> // do the stuff that would be done between start and end
> });
>
> I took a quick look at the rnull driver and there you are calling
> `Request::end_ok` from a different function. So my suggestion might not
> be possible, since you really need the freedom.
>
> Do you think that a guard approach might work better? ie `start` returns
> a guard that when dropped will call `end` and you need the guard to
> operate on the request.
I don't think that would fit, since the driver might not complete the
request immediately. We might be able to call `start` on behalf of the
driver.
At any rate, since the request is reference counted now, we can
automatically fail a request when the last reference is dropped and it
was not marked successfully completed. I would need to measure the
performance implications of such a feature.
The comment needs update too. Failure to complete requests will lead to
either deadlock or timeout errors, depending on configuration.
>
>> +//!
>> +//! The `TagSet` is responsible for creating and maintaining a mapping between
>> +//! `Request`s and integer ids as well as carrying a pointer to the vtable
>> +//! generated by `Operations`. This mapping is useful for associating
>> +//! completions from hardware with the correct `Request` instance. The `TagSet`
>> +//! determines the maximum queue depth by setting the number of `Request`
>> +//! instances available to the driver, and it determines the number of queues to
>> +//! instantiate for the driver. If possible, a driver should allocate one queue
>> +//! per core, to keep queue data local to a core.
>> +//!
>> +//! One `TagSet` instance can be shared between multiple `GenDisk` instances.
>> +//! This can be useful when implementing drivers where one piece of hardware
>> +//! with one set of IO resources are represented to the user as multiple disks.
>> +//!
>> +//! One significant difference between block device drivers implemented with
>> +//! these Rust abstractions and drivers implemented in C, is that the Rust
>> +//! drivers have to own a reference count on the `Request` type when the IO is
>> +//! in flight. This is to ensure that the C `struct request` instances backing
>> +//! the Rust `Request` instances are live while the Rust driver holds a
>> +//! reference to the `Request`. In addition, the conversion of an ineger tag to
>
> Typo: "of an ineger tag"
Thanks. Looks like I need to properly configure my editor to spell check
in comments.
>
>> +//! a `Request` via the `TagSet` would not be sound without this bookkeeping.
>> +//!
>> +//! # ⚠ Note
>> +//!
>> +//! For Rust block device drivers, the point in time where a request
>> +//! is freed and made available for recycling is usualy at the point in time
>> +//! when the last `ARef<Request>` is dropped. For C drivers, this event usually
>> +//! occurs when `bindings::blk_mq_end_request` is called.
>> +//!
>> +//! # Example
>> +//!
>> +//! ```rust
>> +//! use kernel::{
>> +//! block::mq::*,
>> +//! new_mutex,
>> +//! prelude::*,
>> +//! sync::{Arc, Mutex},
>> +//! types::{ARef, ForeignOwnable},
>> +//! };
>> +//!
>> +//! struct MyBlkDevice;
>> +//!
>> +//! #[vtable]
>> +//! impl Operations for MyBlkDevice {
>> +//! type RequestData = ();
>> +//! type RequestDataInit = impl PinInit<()>;
>> +//! type QueueData = ();
>> +//! type HwData = ();
>> +//! type TagSetData = ();
>> +//!
>> +//! fn new_request_data(
>> +//! _tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
>> +//! ) -> Self::RequestDataInit {
>> +//! kernel::init::zeroed()
>> +//! }
>> +//!
>> +//! fn queue_rq(_hw_data: (), _queue_data: (), rq: ARef<Request<Self>>, _is_last: bool) -> Result {
>> +//! rq.start();
>> +//! rq.end_ok();
>> +//! Ok(())
>> +//! }
>> +//!
>> +//! fn commit_rqs(
>> +//! _hw_data: <Self::HwData as ForeignOwnable>::Borrowed<'_>,
>> +//! _queue_data: <Self::QueueData as ForeignOwnable>::Borrowed<'_>,
>> +//! ) {
>> +//! }
>> +//!
>> +//! fn complete(rq: &Request<Self>) {
>> +//! rq.end_ok();
>> +//! }
>> +//!
>> +//! fn init_hctx(
>> +//! _tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
>> +//! _hctx_idx: u32,
>> +//! ) -> Result<Self::HwData> {
>> +//! Ok(())
>> +//! }
>> +//! }
>> +//!
>> +//! let tagset: Arc<TagSet<MyBlkDevice>> = Arc::pin_init(TagSet::try_new(1, (), 256, 1))?;
>> +//! let mut disk = GenDisk::try_new(tagset, ())?;
>> +//! disk.set_name(format_args!("myblk"))?;
>> +//! disk.set_capacity_sectors(4096);
>> +//! disk.add()?;
>> +//!
>> +//! # Ok::<(), kernel::error::Error>(())
>> +//! ```
>
> This piece of documentation is **really** valuable, thanks a lot for
> taking the time to write it.
Great!
>
>> +
>> +mod gen_disk;
>> +mod operations;
>> +mod raw_writer;
>> +mod request;
>> +mod tag_set;
>> +
>> +pub use gen_disk::GenDisk;
>> +pub use operations::Operations;
>> +pub use request::{Request, RequestDataRef};
>> +pub use tag_set::TagSet;
>> diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
>> new file mode 100644
>> index 000000000000..b7845fc9e39f
>> --- /dev/null
>> +++ b/rust/kernel/block/mq/gen_disk.rs
>> @@ -0,0 +1,174 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Generic disk abstraction.
>> +//!
>> +//! C header: [`include/linux/blkdev.h`](srctree/include/linux/blkdev.h)
>> +//! C header: [`include/linux/blk_mq.h`](srctree/include/linux/blk_mq.h)
>> +
>> +use crate::block::mq::{raw_writer::RawWriter, Operations, TagSet};
>> +use crate::{
>> + bindings, error::from_err_ptr, error::Result, sync::Arc, types::ForeignOwnable,
>> + types::ScopeGuard,
>> +};
>> +use core::fmt::{self, Write};
>> +
>> +/// A generic block device
>> +///
>> +/// # Invariants
>> +///
>> +/// - `gendisk` must always point to an initialized and valid `struct gendisk`.
>> +pub struct GenDisk<T: Operations> {
>> + _tagset: Arc<TagSet<T>>,
>> + gendisk: *mut bindings::gendisk,
>> +}
>> +
>> +// SAFETY: `GenDisk` is an owned pointer to a `struct gendisk` and an `Arc` to a
>> +// `TagSet` It is safe to send this to other threads as long as T is Send.
>> +unsafe impl<T: Operations + Send> Send for GenDisk<T> {}
>> +
>> +impl<T: Operations> GenDisk<T> {
>> + /// Try to create a new `GenDisk`
>> + pub fn try_new(tagset: Arc<TagSet<T>>, queue_data: T::QueueData) -> Result<Self> {
>> + let data = queue_data.into_foreign();
>> + let recover_data = ScopeGuard::new(|| {
>> + // SAFETY: T::QueueData was created by the call to `into_foreign()` above
>> + unsafe { T::QueueData::from_foreign(data) };
>> + });
>> +
>> + let lock_class_key = crate::sync::LockClassKey::new();
>> +
>> + // SAFETY: `tagset.raw_tag_set()` points to a valid and initialized tag set
>> + let gendisk = from_err_ptr(unsafe {
>> + bindings::__blk_mq_alloc_disk(
>> + tagset.raw_tag_set(),
>> + data.cast_mut(),
>> + lock_class_key.as_ptr(),
>> + )
>> + })?;
>> +
>> + const TABLE: bindings::block_device_operations = bindings::block_device_operations {
>> + submit_bio: None,
>> + open: None,
>> + release: None,
>> + ioctl: None,
>> + compat_ioctl: None,
>> + check_events: None,
>> + unlock_native_capacity: None,
>> + getgeo: None,
>> + set_read_only: None,
>> + swap_slot_free_notify: None,
>> + report_zones: None,
>> + devnode: None,
>> + alternative_gpt_sector: None,
>> + get_unique_id: None,
>> + // TODO: Set to THIS_MODULE. Waiting for const_refs_to_static feature to be merged
>> + // https://github.com/rust-lang/rust/issues/119618
>> + owner: core::ptr::null_mut(),
>> + pr_ops: core::ptr::null_mut(),
>> + free_disk: None,
>> + poll_bio: None,
>> + };
>> +
>> + // SAFETY: gendisk is a valid pointer as we initialized it above
>> + unsafe { (*gendisk).fops = &TABLE };
>> +
>> + recover_data.dismiss();
>> + Ok(Self {
>> + _tagset: tagset,
>> + gendisk,
>
> Missing INVARIANT comment.
Will fix.
>
>> + })
>> + }
>> +
>> + /// Set the name of the device
>
> Missing '.'.
Thanks.
>
>> + pub fn set_name(&mut self, args: fmt::Arguments<'_>) -> Result {
>> + let mut raw_writer = RawWriter::from_array(
>> + // SAFETY: By type invariant `self.gendisk` points to a valid and initialized instance
>> + unsafe { &mut (*self.gendisk).disk_name },
>
> To create a `&mut` reference, you need exclusive access, it should be
> sufficient to add to the invariant that `gendisk` is owned/unique.
Hmm, we don't actually _always_ have unique ownership of this string
buffer. I will change the API to only allow configuration of the
instance before it is hooked in. Thanks for spotting this.
>
>> + );
>> + raw_writer.write_fmt(args)?;
>> + raw_writer.write_char('\0')?;
>> + Ok(())
>> + }
>
> [...]
>
>> +impl<T: Operations> Drop for GenDisk<T> {
>> + fn drop(&mut self) {
>> + // SAFETY: By type invariant of `Self`, `self.gendisk` points to a valid
>> + // and initialized instance of `struct gendisk`. As such, `queuedata`
>> + // was initialized by the initializer returned by `try_new` with a call
>> + // to `ForeignOwnable::into_foreign`.
>
> This should also be an invariant of `GenDisk`.
Ok.
>
>> + let queue_data = unsafe { (*(*self.gendisk).queue).queuedata };
>> +
>> + // SAFETY: By type invariant, `self.gendisk` points to a valid and
>> + // initialized instance of `struct gendisk`
>> + unsafe { bindings::del_gendisk(self.gendisk) };
>> +
>> + // SAFETY: `queue.queuedata` was created by `GenDisk::try_new()` with a
>> + // call to `ForeignOwnable::into_pointer()` to create `queuedata`.
>> + // `ForeignOwnable::from_foreign()` is only called here.
>> + let _queue_data = unsafe { T::QueueData::from_foreign(queue_data) };
>> + }
>> +}
>> diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs
>> new file mode 100644
>> index 000000000000..53c6ad663208
>> --- /dev/null
>> +++ b/rust/kernel/block/mq/operations.rs
>> @@ -0,0 +1,346 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! This module provides an interface for blk-mq drivers to implement.
>> +//!
>> +//! C header: [`include/linux/blk-mq.h`](srctree/include/linux/blk-mq.h)
>> +
>> +use crate::{
>> + bindings,
>> + block::mq::Request,
>> + error::{from_result, Result},
>> + init::PinInit,
>> + types::{ARef, ForeignOwnable},
>> +};
>> +use core::{marker::PhantomData, ptr::NonNull};
>> +
>> +use super::TagSet;
>> +
>> +/// Implement this trait to interface blk-mq as block devices
>> +#[macros::vtable]
>> +pub trait Operations: Sized {
>> + /// Data associated with a request. This data is located next to the request
>> + /// structure.
>> + ///
>> + /// To be able to handle accessing this data from interrupt context, this
>> + /// data must be `Sync`.
>> + type RequestData: Sized + Sync;
>> +
>> + /// Initializer for `Self::RequestDta`. Used to initialize private data area
>> + /// when requst structure is allocated.
>> + type RequestDataInit: PinInit<Self::RequestData>;
>
> Just to let you know, this dance with the associated types is not needed
> any longer. RPITIT (return position impl trait in trait) has been
> stabilized in 1.75 and you should be able to just write this:
>
> fn new_request_data(
> //rq: ARef<Request<Self>>,
> tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
> ) -> impl PinInit<Self::RequestData>;
>
Thanks 👍
>
>> +
>> + /// Data associated with the `struct request_queue` that is allocated for
>> + /// the `GenDisk` associated with this `Operations` implementation.
>> + type QueueData: ForeignOwnable;
>> +
>> + /// Data associated with a dispatch queue. This is stored as a pointer in
>> + /// the C `struct blk_mq_hw_ctx` that represents a hardware queue.
>> + type HwData: ForeignOwnable;
>> +
>> + /// Data associated with a `TagSet`. This is stored as a pointer in `struct
>> + /// blk_mq_tag_set`.
>> + type TagSetData: ForeignOwnable;
>> +
>> + /// Called by the kernel to get an initializer for a `Pin<&mut RequestData>`.
>> + fn new_request_data(
>> + //rq: ARef<Request<Self>>,
>> + tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
>
> Since you need to use this pattern a lot, it might be a good idea to
> introduce a type alias to help with this:
>
> type ForeignBorrowed<'a, T> = <T as ForeignOwnable>::Borrowed<'a>;
>
> What do the others think?
>
> The function would then become (with the RPITIT improvement as well):
>
> fn new_request_data(
> //rq: ARef<Request<Self>>,
> tagset_data: ForeignBorrowed<'_, Self::TagSetData>,
> ) -> impl PinInit<Self::RequestData>;
A bit more concise, I think it is better. I'll go ahead and kill that
commented out argument that sneaked into the patch as well :)
>
>
>> + ) -> Self::RequestDataInit;
>> +
>> + /// Called by the kernel to queue a request with the driver. If `is_last` is
>> + /// `false`, the driver is allowed to defer commiting the request.
>> + fn queue_rq(
>> + hw_data: <Self::HwData as ForeignOwnable>::Borrowed<'_>,
>> + queue_data: <Self::QueueData as ForeignOwnable>::Borrowed<'_>,
>> + rq: ARef<Request<Self>>,
>> + is_last: bool,
>> + ) -> Result;
>> +
>> + /// Called by the kernel to indicate that queued requests should be submitted
>> + fn commit_rqs(
>> + hw_data: <Self::HwData as ForeignOwnable>::Borrowed<'_>,
>> + queue_data: <Self::QueueData as ForeignOwnable>::Borrowed<'_>,
>> + );
>> +
>> + /// Called by the kernel when the request is completed
>> + fn complete(_rq: &Request<Self>);
>> +
>> + /// Called by the kernel to allocate and initialize a driver specific hardware context data
>> + fn init_hctx(
>> + tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
>> + hctx_idx: u32,
>> + ) -> Result<Self::HwData>;
>> +
>> + /// Called by the kernel to poll the device for completed requests. Only
>> + /// used for poll queues.
>> + fn poll(_hw_data: <Self::HwData as ForeignOwnable>::Borrowed<'_>) -> bool {
>> + crate::build_error(crate::error::VTABLE_DEFAULT_ERROR)
>> + }
>> +
>> + /// Called by the kernel to map submission queues to CPU cores.
>> + fn map_queues(_tag_set: &TagSet<Self>) {
>> + crate::build_error(crate::error::VTABLE_DEFAULT_ERROR)
>> + }
>> +
>> + // There is no need for exit_request() because `drop` will be called.
>
> I think it would be a good idea to mention this in the documentation of
> the trait.
Yes.
>
>> +}
>
> [...]
>
>> diff --git a/rust/kernel/block/mq/raw_writer.rs b/rust/kernel/block/mq/raw_writer.rs
>> new file mode 100644
>> index 000000000000..f7857740af29
>> --- /dev/null
>> +++ b/rust/kernel/block/mq/raw_writer.rs
>> @@ -0,0 +1,60 @@
>> +use core::{
>> + fmt::{self, Write},
>> + marker::PhantomData,
>> +};
>> +
>> +/// A mutable reference to a byte buffer where a string can be written into
>> +///
>> +/// # Invariants
>> +///
>> +/// * `ptr` is not aliased and valid for read and write for `len` bytes
>
> You probably also want to add "for the duration of `'a`".
👍
>
>> +///
>> +pub(crate) struct RawWriter<'a> {
>> + ptr: *mut u8,
>> + len: usize,
>> + _p: PhantomData<&'a ()>,
>> +}
>> +
>> +impl<'a> RawWriter<'a> {
>> + /// Create a new `RawWriter` instance.
>> + ///
>> + /// # Safety
>> + ///
>> + /// * `ptr` must be valid for read and write for `len` consecutive `u8` elements
>> + /// * `ptr` must not be aliased
>> + unsafe fn new(ptr: *mut u8, len: usize) -> RawWriter<'a> {
>> + Self {
>> + ptr,
>> + len,
>> + _p: PhantomData,
>> + }
>> + }
>
> Since this function is not used (except in the function below), what is
> the reason for using a raw pointer?
> I looked in your other patches, but did not find another user, so could
> this be improved by using mutable references?
I am not sure. But I think the code could benefit from getting passed a
mutable slice instead, move the safety comment to the call site. I will
try that.
>
>> +
>> + pub(crate) fn from_array<const N: usize>(a: &'a mut [core::ffi::c_char; N]) -> RawWriter<'a> {
>> + // SAFETY: the buffer of `a` is valid for read and write for at least `N` bytes
>> + unsafe { Self::new(a.as_mut_ptr().cast::<u8>(), N) }
>> + }
>> +}
>> +
>> +impl Write for RawWriter<'_> {
>> + fn write_str(&mut self, s: &str) -> fmt::Result {
>> + let bytes = s.as_bytes();
>> + let len = bytes.len();
>> + if len > self.len {
>> + return Err(fmt::Error);
>> + }
>> +
>> + // SAFETY:
>> + // * `bytes` is valid for reads of `bytes.len()` size because we hold a shared reference to `s`
>> + // * By type invariant `self.ptr` is valid for writes for at lest `self.len` bytes
>> + // * The regions are not overlapping as `ptr` is not aliased
>> + unsafe { core::ptr::copy_nonoverlapping(&bytes[0], self.ptr, len) };
>> +
>> + // SAFETY: By type invariant of `Self`, `ptr` is in bounds of an
>> + // allocation. Also by type invariant, the pointer resulting from this
>> + // addition is also in bounds.
>> + self.ptr = unsafe { self.ptr.add(len) };
>> + self.len -= len;
>> + Ok(())
>> + }
>> +}
>> diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
>> new file mode 100644
>> index 000000000000..b4dacac5e091
>> --- /dev/null
>> +++ b/rust/kernel/block/mq/request.rs
>> @@ -0,0 +1,182 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! This module provides a wrapper for the C `struct request` type.
>> +//!
>> +//! C header: [`include/linux/blk-mq.h`](srctree/include/linux/blk-mq.h)
>> +
>> +use crate::{
>> + bindings,
>> + block::mq::Operations,
>> + error::{Error, Result},
>> + types::{ARef, AlwaysRefCounted, Opaque},
>> +};
>> +use core::{ffi::c_void, marker::PhantomData, ops::Deref};
>> +
>> +/// A wrapper around a blk-mq `struct request`. This represents an IO request.
>> +///
>> +/// # Invariants
>> +///
>> +/// * `self.0` is a valid `struct request` created by the C portion of the kernel
>> +/// * `self` is reference counted. a call to `req_ref_inc_not_zero` keeps the
>> +/// instance alive at least until a matching call to `req_ref_put_and_test`
>> +///
>> +#[repr(transparent)]
>> +pub struct Request<T: Operations>(Opaque<bindings::request>, PhantomData<T>);
>> +
>> +impl<T: Operations> Request<T> {
>> + /// Create a `&mut Request` from a `bindings::request` pointer
>> + ///
>> + /// # Safety
>> + ///
>> + /// * `ptr` must be aligned and point to a valid `bindings::request` instance
>> + /// * Caller must ensure that the pointee of `ptr` is live and owned
>> + /// exclusively by caller for at least `'a`
>> + ///
>> + pub(crate) unsafe fn from_ptr_mut<'a>(ptr: *mut bindings::request) -> &'a mut Self {
>> + // SAFETY:
>> + // * The cast is valid as `Self` is transparent.
>> + // * By safety requirements of this function, the reference will be
>> + // valid for 'a.
>> + unsafe { &mut *(ptr.cast::<Self>()) }
>> + }
>> +
>> + /// Get the command identifier for the request
>> + pub fn command(&self) -> u32 {
>> + // SAFETY: By C API contract and type invariant, `cmd_flags` is valid for read
>> + unsafe { (*self.0.get()).cmd_flags & ((1 << bindings::REQ_OP_BITS) - 1) }
>> + }
>> +
>> + /// Call this to indicate to the kernel that the request has been issued by the driver
>
> I am a bit confused, is this not supposed to signal that the processing
> of the request is going to start now? cf C documentation:
>
> /**
> * blk_mq_start_request - Start processing a request
> * @rq: Pointer to request to be started
> *
> * Function used by device drivers to notify the block layer that a request
> * is going to be processed now, so blk layer can do proper initializations
> * such as starting the timeout timer.
> */
Yes, it is to indicate that the request is now considered in-flight by
the driver. I'll change the wording a bit.
>
>> + pub fn start(&self) {
>> + // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
>> + // existence of `&mut self` we have exclusive access.
>> + unsafe { bindings::blk_mq_start_request(self.0.get()) };
>> + }
>> +
>> + /// Call this to indicate to the kernel that the request has been completed without errors
>
> I dislike the "Call this", it feels redundant, what about "Signal the
> block layer that the request has been completed without errors.".
>
> Also with the other functions below.
I agree.
>
>> + pub fn end_ok(&self) {
>> + // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
>> + // existence of `&mut self` we have exclusive access.
>> + unsafe { bindings::blk_mq_end_request(self.0.get(), bindings::BLK_STS_OK as _) };
>> + }
>> +
>> + /// Call this to indicate to the kernel that the request completed with an error
>> + pub fn end_err(&self, err: Error) {
>> + // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
>> + // existence of `&mut self` we have exclusive access.
>> + unsafe { bindings::blk_mq_end_request(self.0.get(), err.to_blk_status()) };
>> + }
>> +
>> + /// Call this to indicate that the request completed with the status indicated by `status`
>> + pub fn end(&self, status: Result) {
>> + if let Err(e) = status {
>> + self.end_err(e);
>> + } else {
>> + self.end_ok();
>> + }
>> + }
>> +
>> + /// Call this to schedule defered completion of the request
>> + pub fn complete(&self) {
>> + // SAFETY: By type invariant, `self.0` is a valid `struct request`
>> + if !unsafe { bindings::blk_mq_complete_request_remote(self.0.get()) } {
>> + T::complete(self);
>> + }
>> + }
>> +
>> + /// Get the target sector for the request
>> + #[inline(always)]
>> + pub fn sector(&self) -> usize {
>> + // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
>> + unsafe { (*self.0.get()).__sector as usize }
>> + }
>> +
>> + /// Returns an owned reference to the per-request data associated with this
>> + /// request
>> + pub fn owned_data_ref(request: ARef<Self>) -> RequestDataRef<T> {
>> + RequestDataRef::new(request)
>> + }
>> +
>> + /// Returns a reference to the oer-request data associated with this request
>
> Typo: "the oer-request"
Thanks.
>
>> + pub fn data_ref(&self) -> &T::RequestData {
>> + let request_ptr = self.0.get().cast::<bindings::request>();
>> +
>> + // SAFETY: `request_ptr` is a valid `struct request` because `ARef` is
>> + // `repr(transparent)`
>> + let p: *mut c_void = unsafe { bindings::blk_mq_rq_to_pdu(request_ptr) };
>> +
>> + let p = p.cast::<T::RequestData>();
>> +
>> + // SAFETY: By C API contract, `p` is initialized by a call to
>> + // `OperationsVTable::init_request_callback()`. By existence of `&self`
>> + // it must be valid for use as a shared reference.
>> + unsafe { &*p }
>> + }
>> +}
>> +
>> +// SAFETY: It is impossible to obtain an owned or mutable `Request`, so we can
>
> What do you mean by "mutable `Request`"? There is the function to obtain
> a `&mut Request`.
The idea behind this comment is that it is not possible to have an owned
`Request` instance. You can only ever have something that will deref
(shared) to `Request`. Construction of the `Request` type is not
possible in safe driver code. At least that is the intention.
The `from_ptr_mut` is unsafe, and could be downgraded to
`from_ptr`, since `Operations::complete` takes a shared reference
anyway. Bottom line is that user code does not handle `&mut Request`.
> Also this should probably be an invariant if you use it as a
> justification here.
Ok.
>
>> +// mark it `Send`.
>> +unsafe impl<T: Operations> Send for Request<T> {}
>> +
>> +// SAFETY: `Request` references can be shared across threads.
>
> Does not explain why that is the case.
Will add.
>
>> +unsafe impl<T: Operations> Sync for Request<T> {}
>> +
>> +/// An owned reference to a `Request<T>`
>> +#[repr(transparent)]
>> +pub struct RequestDataRef<T: Operations> {
>> + request: ARef<Request<T>>,
>> +}
>
> Is this extra type really needed? I have not yet taken a look at patch 3,
> which uses this. But would it hurt if you implemented the traits
> there directly on `ARef<Request<T>>`?
Yes, thanks :) Way better with just `ARef<Request<T>>`.
>
>> +
>> +impl<T> RequestDataRef<T>
>> +where
>> + T: Operations,
>> +{
>> + /// Create a new instance.
>> + fn new(request: ARef<Request<T>>) -> Self {
>> + Self { request }
>> + }
>> +
>> + /// Get a reference to the underlying request
>> + pub fn request(&self) -> &Request<T> {
>> + &self.request
>> + }
>> +}
>
> I really like how you improved the safety comments and documentation. It
> was a lot easier to wrap my head around this time (might also me getting
> more experienced, but I think it helped a lot).
That's great!
Best regards,
Andreas