Re: [PATCH 05/79] block: rust: change `queue_rq` request type to `Owned`

From: Andreas Hindborg

Date: Wed Apr 08 2026 - 08:01:23 EST

Alice Ryhl <aliceryhl@xxxxxxxxxx> writes:

> On Mon, Mar 23, 2026 at 1:08 PM Andreas Hindborg <a.hindborg@xxxxxxxxxx> wrote:
>>
>> Alice Ryhl <aliceryhl@xxxxxxxxxx> writes:
>>
>> > On Mon, Feb 16, 2026 at 12:34:52AM +0100, Andreas Hindborg wrote:
>> >> Simplify the reference counting scheme for `Request` from 4 states to 3
>> >> states. This is achieved by coalescing the zero state between block layer
>> >> owned and uniquely owned by driver.
>> >>
>> >> Implement `Ownable` for `Request` and deliver `Request` to drivers as
>> >> `Owned<Request>`. In this process:
>> >>
>> >> - Move uniqueness assertions out of `rnull` as these are now guaranteed by
>> >> the `Owned` type.
>> >> - Move `start_unchecked`, `try_set_end` and `end_ok` from `Request` to
>> >> `Owned<Request>`, relying on type invariant for uniqueness.
>> >>
>> >> Signed-off-by: Andreas Hindborg <a.hindborg@xxxxxxxxxx>
>> >
>> > It would be a lot cleaner if we could implement HrTimerPointer for
>> > Owned<Request> and entirely get rid of the refcount in request so we
>> > don't need ARef<Request> at all.
>> >
>> > Is there a reason we *need* ARef here?
>>
>> There is. Real drivers will need to dma map the data buffers in
>> `Request` to a device. This requires taking a reference on the pages to
>> be mapped, which in turn requires taking a reference on the `Request`.
>>
>> We could split up the reference counts into multiple fields, but that
>> would be less efficient.
>
> So how exactly is the refcount used here? Can you elaborate?

I can try to be clearer.

`Request` objects are created when a driver initializes a device. A
driver initializes a number of `Request` objects equal to the queue
depth it supports. That is, if a driver/device supports 16 in-flight
requests, the driver allocates and initializes 16 `Request` objects up
front.

When the kernel wants to issue IO to a block device, it finds an idle
`Request` object and sets it up for the IO operation. Then:

1. The block layer hands off the request to the driver. Ownership of the
request is transferred to the driver. At this point, the driver has a
unique reference to the request (`Owned<Request>`).

2. The driver DMA maps the pages of the request. We have to make sure
the request is not handed back to the block layer while the pages are
mapped. To this end, we take a refcount on the request. The reference
that the driver holds is no longer unique (`ARef<Request>`).

3. The driver instructs a device to carry out the request. The driver
releases its refcount on the request and the device takes a refcount.

4. When the device finishes processing the request, ownership is
transferred back to the driver. The device releases its refcount and
the driver takes a refcount.

5. DMA mappings are torn down. The refcount associated with the DMA
mappings is released.

6. The driver transfers ownership of the request back to the block
layer. To do this, the request must be uniquely owned by the driver.

When the device is done processing a request and we have to transfer
ownership of the request back to the driver, we use an API function
called `tag_to_rq`. This function takes an integer tag and may return a
request reference. For this function to be safe, we have to be able to
assert that the integer tag passed to the function is naming a request
object that it is valid to obtain a reference to. It is not sound to
create references to requests that are not currently in flight. Thus, we
must be able to know this information. The current implementation relies
on the refcount to discover this information:

/// There are three states for a request that the Rust bindings care about:
///
/// - 0: The request is owned by C block layer or is uniquely referenced (by [`Owned<_>`]).
/// - 1: The request is owned by Rust abstractions but is not referenced.
/// - 2+: There is one or more [`ARef`] instances referencing the request.

So, we are using 1 refcount field to encode all the information we need.
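
As a rough userspace illustration of that encoding (a sketch only, not
the kernel code): `ToyState`, `ToyRefcount` and `decode` are names made
up for this example, and the choice of `compare_exchange` for both
transitions is an assumption. Only the 0 / 1 / 2+ meaning and the
`cmpxchg(2 -> 0)` on the way back are taken from this mail.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// The three states from the doc comment, decoded from a raw count.
#[derive(Debug, PartialEq, Eq)]
pub enum ToyState {
    /// 0: owned by the C block layer, or uniquely referenced by the driver.
    BlockLayerOrUnique,
    /// 1: owned by the Rust abstractions, but no reference exists.
    OwnedUnreferenced,
    /// 2+: this many shared references exist (assumed to be count - 1).
    Shared(u64),
}

pub fn decode(refcount: u64) -> ToyState {
    match refcount {
        0 => ToyState::BlockLayerOrUnique,
        1 => ToyState::OwnedUnreferenced,
        n => ToyState::Shared(n - 1),
    }
}

/// Toy stand-in for the single refcount field. Hypothetical type.
pub struct ToyRefcount(AtomicU64);

impl ToyRefcount {
    /// A fresh request starts in state 0.
    pub fn new() -> Self {
        ToyRefcount(AtomicU64::new(0))
    }

    pub fn state(&self) -> ToyState {
        decode(self.0.load(Ordering::Acquire))
    }

    /// Unique -> shared (0 -> 2). Fails if the request is not in state 0.
    pub fn into_shared(&self) -> bool {
        self.0
            .compare_exchange(0, 2, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Shared -> unique (2 -> 0). Succeeds only for the last shared reference.
    pub fn try_from_shared(&self) -> bool {
        self.0
            .compare_exchange(2, 0, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}
```

The point of the sketch is that a single atomic word is enough to tell
"unique", "leaked" and "shared" apart.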

Further, in the current implementation, for step 3, the device does not
actually take a refcount on the request. If a driver drops all
references to a request, the refcount lands on 1. We use this to
indicate that the request has been leaked and to know that `tag_to_rq`
is safe. In a situation where the request is DMA mapped, the refcount
would be 2 while the device is processing the request.
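
To make the `tag_to_rq` check concrete, here is a minimal userspace
sketch. `ToyTagSet` and its helper methods are hypothetical, and using
an "increment unless zero" update is an assumption about how the check
could be done; the real abstraction works on kernel types. The sketch
only shows that a refcount of 0 refuses the lookup while 1 (leaked) and
2+ (still referenced, for example by DMA mappings) allow it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a tag set: one refcount per tag.
pub struct ToyTagSet {
    refcounts: Vec<AtomicU64>,
}

impl ToyTagSet {
    /// One slot per request, all starting in state 0.
    pub fn new(queue_depth: usize) -> Self {
        ToyTagSet {
            refcounts: (0..queue_depth).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    /// Helper for the example only: force a slot into a given state.
    pub fn set(&self, tag: usize, value: u64) {
        self.refcounts[tag].store(value, Ordering::Release);
    }

    /// Take a shared reference on the request named by `tag`.
    ///
    /// Returns the new refcount on success. Returns `None` if the tag is
    /// out of range or the refcount is 0, because then the request is
    /// either with the block layer or uniquely owned elsewhere.
    pub fn tag_to_rq(&self, tag: usize) -> Option<u64> {
        let rc = self.refcounts.get(tag)?;
        rc.fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
            if n >= 1 {
                Some(n + 1)
            } else {
                None
            }
        })
        .ok()
        .map(|old| old + 1)
    }
}
```

With this shape, a leaked request (refcount 1) becomes a single shared
reference (refcount 2), and a DMA mapped request (refcount 2) goes to 3,
matching step (4) in the flow.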

Here is a sequence diagram of the flow:

   Block Layer             Driver                 Device
   ───────────             ──────                 ──────
        |                     |                      |
        |                     |                      |
  (1)   |  hand off request   |                      |
        |-------------------->|                      |
        |                     |  Owned<Request>      |
        |                     |  refcount = 0        |
        |                     |                      |
        |                     |                      |
  (2)   |                     DMA map pages          |
        |                     |---------.            |
        |                     |  into_shared()       |
        |                     |  ARef<Request>       |
        |                     |  refcount = 2        |
        |                     |  map_pages()         |
        |                     |  refcount = 3        |
        |                     |                      |
        |                     |                      |
  (3)   |                     submit to device       |
        |                     |--------------------->|
        |                     |  (drop driver ref)   |
        |                     |  refcount = 2        |
        |                     |  (DMA ref remains)   |
        |                     |                      |
        |                     |                      |
        |                     |  .----------------.  |
        |                     |  | Device does    |  |
        |                     |  | DMA to/from    |  |
        |                     |  | mapped pages   |  |
        |                     |  '----------------'  |
        |                     |                      |
        |                     |                      |
  (4)   |                     completion IRQ         |
        |                     |<---------------------|
        |                     |  refcount = 2        |
        |                     |  tag_to_rq(tag)      |
        |                     |  ARef<Request>       |
        |                     |  refcount = 3        |
        |                     |                      |
        |                     |                      |
  (5)   |                     tear down DMA mappings |
        |                     |---------.            |
        |                     |  (drop DMA ref)      |
        |                     |  refcount = 2        |
        |                     |                      |
        |                     |                      |
  (6)   |  hand back request  |                      |
        |<--------------------|                      |
        |                     |  try_from_shared()   |
        |                     |  cmpxchg(2 -> 0)     |
        |                     |  Owned<Request>      |
        |                     |  refcount = 0        |
        |                     |  end_ok()            |
        |                     |                      |
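
The refcount values in the diagram can be replayed with a plain atomic
counter. This is a userspace sketch only: `lifecycle_trace` is a
hypothetical helper, and the specific atomic operations chosen for each
step are assumptions. The step labels and the expected values come from
the diagram.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Replays steps (1) to (6) on a toy counter and records the value
/// after each step.
pub fn lifecycle_trace() -> Vec<(&'static str, u64)> {
    let rc = AtomicU64::new(0);
    let mut trace = Vec::new();

    // (1) Block layer hands off the request: uniquely owned, count 0.
    trace.push(("(1) hand off", rc.load(Ordering::Acquire)));

    // (2) Convert to a shared reference, then take a ref for the mapping.
    rc.compare_exchange(0, 2, Ordering::AcqRel, Ordering::Acquire)
        .expect("request must be uniquely owned");
    trace.push(("(2) into_shared", rc.load(Ordering::Acquire)));
    rc.fetch_add(1, Ordering::AcqRel);
    trace.push(("(2) map_pages", rc.load(Ordering::Acquire)));

    // (3) Submit to the device: the driver drops its own reference.
    rc.fetch_sub(1, Ordering::AcqRel);
    trace.push(("(3) submit", rc.load(Ordering::Acquire)));

    // (4) Completion: look the request up again, only if count >= 1.
    rc.fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
        if n >= 1 {
            Some(n + 1)
        } else {
            None
        }
    })
    .expect("request must be in flight");
    trace.push(("(4) tag_to_rq", rc.load(Ordering::Acquire)));

    // (5) Tear down the DMA mappings: drop the mapping reference.
    rc.fetch_sub(1, Ordering::AcqRel);
    trace.push(("(5) unmap", rc.load(Ordering::Acquire)));

    // (6) Back to unique ownership before ending the request.
    rc.compare_exchange(2, 0, Ordering::AcqRel, Ordering::Acquire)
        .expect("driver must hold the last reference");
    trace.push(("(6) try_from_shared", rc.load(Ordering::Acquire)));

    trace
}
```

If step (5) were skipped, the final `compare_exchange(2, 0)` would see 3
and fail, which is how the scheme keeps a request with live DMA mappings
from being handed back.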

We might be able to get by without shared references (`ARef<Request>`)
and only use an owned reference (`Owned<Request>`) if we add additional
fields to the `Request` structure. We need to track if the request is
in a state where we can return an `Owned<Request>` from `tag_to_rq`, and
we need to track if any of the pages of the request are mapped for DMA,
so that we know when it is safe to end the request.
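
For comparison, one possible shape of that alternative, as a userspace
sketch. Everything here is hypothetical: `ToyRequestState`, its field
names and its methods are invented for illustration, and nothing like
this exists in the series. It only shows the two extra pieces of state
that would replace the shared refcount.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Hypothetical per-request bookkeeping without a shared refcount.
pub struct ToyRequestState {
    /// True while the request sits with the device, i.e. while a
    /// `tag_to_rq`-style lookup may hand out the unique reference.
    with_device: AtomicBool,
    /// Number of pages currently mapped for DMA.
    dma_mappings: AtomicU32,
}

impl ToyRequestState {
    pub fn new() -> Self {
        ToyRequestState {
            with_device: AtomicBool::new(false),
            dma_mappings: AtomicU32::new(0),
        }
    }

    /// The driver gives up its unique reference and submits to the device.
    pub fn submit(&self) {
        self.with_device.store(true, Ordering::Release);
    }

    /// Completion path: returns true exactly once per submission, so at
    /// most one unique reference can be recreated.
    pub fn claim_from_device(&self) -> bool {
        self.with_device.swap(false, Ordering::AcqRel)
    }

    pub fn map_page(&self) {
        self.dma_mappings.fetch_add(1, Ordering::AcqRel);
    }

    pub fn unmap_page(&self) {
        self.dma_mappings.fetch_sub(1, Ordering::AcqRel);
    }

    /// The request may be ended only when it is not with the device and
    /// no page is still mapped.
    pub fn can_end(&self) -> bool {
        !self.with_device.load(Ordering::Acquire)
            && self.dma_mappings.load(Ordering::Acquire) == 0
    }
}
```

Compared to the single refcount, this needs two fields per request and
two separate checks on the completion path.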

I am not convinced that having the additional fields is worth it for
simplifying the reference counting scheme.

> With regards to the Owned series, I still think we should split it up
> so that the patches making ARef+Owned work like Arc/UniqueArc is
> separate follow-up series.

I'll take that into consideration when sending the next spin of that series.

Best regards,
Andreas Hindborg