Re: [PATCH v6 4/4] rust: add abstraction for `struct page`
From: Boqun Feng
Date: Thu Apr 18 2024 - 14:53:36 EST
On Thu, Apr 18, 2024 at 08:59:20AM +0000, Alice Ryhl wrote:
> Adds a new struct called `Page` that wraps a pointer to `struct page`.
> This struct is assumed to hold ownership over the page, so that Rust
> code can allocate and manage pages directly.
>
> The page type has various methods for reading and writing into the page.
> These methods will temporarily map the page to allow the operation. All
> of these methods use a helper that takes an offset and length, performs
> bounds checks, and returns a pointer to the given offset in the page.
>
> This patch only adds support for pages of order zero, as that is all
> Rust Binder needs. However, it is written to make it easy to add support
> for higher-order pages in the future. To do that, you would add a const
> generic parameter to `Page` that specifies the order. Most of the
> methods do not need to be adjusted, as the logic for dealing with
> mapping multiple pages at once can be isolated to just the
> `with_pointer_into_page` method.
>
Thank you for doing this, and breaking the chicken-and-egg problem chain
;-) For sure, the whole package of page API would need more time to
design, implement and review, but this patch looks good enough to me.
> Rust Binder needs to manage pages directly as that is how transactions
> are delivered: Each process has an mmap'd region for incoming
> transactions. When an incoming transaction arrives, the Binder driver
> will choose a region in the mmap, allocate and map the relevant pages
> manually, and copy the incoming transaction directly into the page. This
> architecture allows the driver to copy transactions directly from the
> address space of one process to another, without an intermediate copy
> to a kernel buffer.
>
> This code is based on Wedson's page abstractions from the old rust
> branch, but it has been modified by Alice by removing the incomplete
> support for higher-order pages, by introducing the `with_*` helpers
> to consolidate the bounds checking logic into a single place, and
> various other changes.
>
> Co-developed-by: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
> Signed-off-by: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
> Reviewed-by: Andreas Hindborg <a.hindborg@xxxxxxxxxxx>
> Reviewed-by: Trevor Gross <tmgross@xxxxxxxxx>
> Reviewed-by: Benno Lossin <benno.lossin@xxxxxxxxx>
> Signed-off-by: Alice Ryhl <aliceryhl@xxxxxxxxxx>
Reviewed-by: Boqun Feng <boqun.feng@xxxxxxxxx>
Something I want to bring up for discussion below:
[...]
> + /// Runs a piece of code with a raw pointer to a slice of this page, with bounds checking.
> + ///
> + /// If `f` is called, then it will be called with a pointer that points at `off` bytes into the
> + /// page, and the pointer will be valid for at least `len` bytes. The pointer is only valid on
> + /// this task, as this method uses a local mapping.
> + ///
> + /// If `off` and `len` refers to a region outside of this page, then this method returns
> + /// `EINVAL` and does not call `f`.
> + ///
> + /// # Using the raw pointer
> + ///
> + /// It is up to the caller to use the provided raw pointer correctly. The pointer is valid for
> + /// `len` bytes and for the duration in which the closure is called. The pointer might only be
> + /// mapped on the current thread, and when that is the case, dereferencing it on other threads
> + /// is UB. Other than that, the usual rules for dereferencing a raw pointer apply: don't cause
> + /// data races, the memory may be uninitialized, and so on.
> + ///
> + /// If multiple threads map the same page at the same time, then they may reference with
> + /// different addresses. However, even if the addresses are different, the underlying memory is
> + /// still the same for these purposes (e.g., it's still a data race if they both write to the
> + /// same underlying byte at the same time).
> + fn with_pointer_into_page<T>(
> + &self,
> + off: usize,
> + len: usize,
> + f: impl FnOnce(*mut u8) -> Result<T>,
I wonder whether the way to go here is making this function signature:
    fn with_slice_in_page<T>(
        &self,
        off: usize,
        len: usize,
        f: impl FnOnce(&UnsafeCell<[u8]>) -> Result<T>,
    ) -> Result<T>
, because this way it is a bit clearer which memory `f` can access; in
other words, users are less likely to use the pointer in a wrong way.
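To illustrate the shape of such a signature, here is a minimal userspace
sketch. It is not kernel code: `MockPage`, `Einval`, and the 4096-byte
`PAGE_SIZE` are hypothetical stand-ins, and a heap buffer replaces the
temporary local mapping. Only the bounds check is taken from the patch.

```rust
use std::cell::UnsafeCell;
use std::ptr;

const PAGE_SIZE: usize = 4096; // assumed page size for this sketch

/// Hypothetical stand-in for the kernel's `EINVAL` error.
#[derive(Debug, PartialEq)]
struct Einval;

/// Hypothetical userspace stand-in for `Page`: a heap buffer instead of a
/// `struct page` that has to be mapped before use.
struct MockPage {
    data: Box<UnsafeCell<[u8; PAGE_SIZE]>>,
}

impl MockPage {
    fn new() -> Self {
        MockPage { data: Box::new(UnsafeCell::new([0u8; PAGE_SIZE])) }
    }

    fn with_slice_in_page<T>(
        &self,
        off: usize,
        len: usize,
        f: impl FnOnce(&UnsafeCell<[u8]>) -> Result<T, Einval>,
    ) -> Result<T, Einval> {
        // Same bounds check as in the patch.
        let bounds_ok = off <= PAGE_SIZE && len <= PAGE_SIZE && (off + len) <= PAGE_SIZE;
        if !bounds_ok {
            return Err(Einval);
        }
        // SAFETY: `off <= PAGE_SIZE`, so the result is in bounds or one past the end.
        let start = unsafe { self.data.get().cast::<u8>().add(off) };
        // `UnsafeCell<[u8]>` has the same layout as `[u8]`, so the slice
        // pointer (address + length) can be cast directly.
        let raw = ptr::slice_from_raw_parts_mut(start, len) as *const UnsafeCell<[u8]>;
        // SAFETY: `off + len <= PAGE_SIZE`, so the region lies inside the buffer,
        // which outlives this call. A shared reference to an `UnsafeCell` does
        // not assert that the bytes are immutable or initialized.
        f(unsafe { &*raw })
    }
}

fn main() {
    let page = MockPage::new();
    // The closure receives the length together with the pointer.
    let res = page.with_slice_in_page(16, 4, |cell| {
        // SAFETY: single-threaded, and offset 0 is within the 4-byte region.
        unsafe { cell.get().cast::<u8>().write(0xAB) };
        Ok(())
    });
    assert_eq!(res, Ok(()));
    // Out-of-range requests are rejected before the closure runs.
    assert_eq!(page.with_slice_in_page(PAGE_SIZE, 1, |_| Ok(())), Err(Einval));
}
```

The closure still needs `unsafe` to touch the bytes, but the length now
travels with the pointer instead of being tracked by convention.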
But that depends on whether `&UnsafeCell<[u8]>` is the correct
abstraction, and on the ecosystem around it: for example, I feel like
these two functions:
    fn len(slice: &UnsafeCell<[u8]>) -> usize
    fn as_ptr(slice: &UnsafeCell<[u8]>) -> *mut u8
should be trivially safe, but I might be wrong. Again this is just for
future discussion.
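For reference, both helpers can be written in userspace Rust without any
`unsafe`, which supports the "trivially safe" intuition. This is only a
sketch with assumed bodies: it relies on `<*mut [T]>::len`, which is
stable from Rust 1.79, and it does not settle whether `&UnsafeCell<[u8]>`
is the right abstraction for mapped pages.

```rust
use std::cell::UnsafeCell;

/// Length in bytes. Reads only the pointer metadata, never the bytes.
/// Requires Rust 1.79 or newer for `<*mut [T]>::len`.
fn len(slice: &UnsafeCell<[u8]>) -> usize {
    slice.get().len()
}

/// Pointer to the first byte. Creating it is safe; dereferencing it is not.
fn as_ptr(slice: &UnsafeCell<[u8]>) -> *mut u8 {
    slice.get().cast::<u8>()
}

fn main() {
    let backing = UnsafeCell::new([1u8, 2, 3, 4]);
    // Unsizing coercion: `&UnsafeCell<[u8; 4]>` to `&UnsafeCell<[u8]>`.
    let slice: &UnsafeCell<[u8]> = &backing;
    assert_eq!(len(slice), 4);

    // SAFETY: single-threaded, index 2 is in bounds, and no references to
    // the bytes themselves exist.
    unsafe { as_ptr(slice).add(2).write(9) };
    assert_eq!(backing.into_inner(), [1, 2, 9, 4]);
}
```

Neither helper creates a `&[u8]` or `&mut [u8]`, so neither makes any
claim about initialization or exclusive access to the underlying bytes.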
Regards,
Boqun
> + ) -> Result<T> {
> + let bounds_ok = off <= PAGE_SIZE && len <= PAGE_SIZE && (off + len) <= PAGE_SIZE;
> +
> + if bounds_ok {
> + self.with_page_mapped(move |page_addr| {
> + // SAFETY: The `off` integer is at most `PAGE_SIZE`, so this pointer offset will
> + // result in a pointer that is in bounds or one off the end of the page.
> + f(unsafe { page_addr.add(off) })
> + })
> + } else {
> + Err(EINVAL)
> + }
> + }
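A note on the bounds check above: because `&&` short-circuits, `off + len`
is evaluated only after both operands are known to be at most `PAGE_SIZE`,
so the addition cannot overflow. The standalone sketch below compares it
with a `checked_add` formulation; the function names and the 4096-byte
`PAGE_SIZE` are assumptions made for illustration.

```rust
const PAGE_SIZE: usize = 4096; // assumed page size for this sketch

/// The check as written in the patch.
fn bounds_ok(off: usize, len: usize) -> bool {
    off <= PAGE_SIZE && len <= PAGE_SIZE && (off + len) <= PAGE_SIZE
}

/// Hypothetical alternative that handles overflow explicitly.
fn bounds_ok_checked(off: usize, len: usize) -> bool {
    matches!(off.checked_add(len), Some(end) if end <= PAGE_SIZE)
}

fn main() {
    let cases = [
        (0, 0),
        (0, PAGE_SIZE),
        (PAGE_SIZE, 0),
        (PAGE_SIZE, 1),
        (1, PAGE_SIZE),
        (usize::MAX, 1),
        (1, usize::MAX),
    ];
    for (off, len) in cases {
        // Both formulations agree, including on inputs that would overflow.
        assert_eq!(bounds_ok(off, len), bounds_ok_checked(off, len));
    }
}
```

Both accept `off == PAGE_SIZE` with `len == 0`, which matches the SAFETY
comment's "in bounds or one off the end of the page".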
> +
[...]
>
> --
> 2.44.0.683.g7961c838ac-goog
>