Re: [PATCH 00/13] [RFC] Rust support

From: Wedson Almeida Filho
Date: Thu Apr 22 2021 - 11:24:13 EST

Next message: Arnd Bergmann: "Re: [PATCH] staging: rtl8723bs: avoid bogus gcc warning"
Previous message: Jan Beulich: "Re: [PATCH 0/3] xen: remove some checks for always present Xen features"
In reply to: David Laight: "RE: [PATCH 00/13] [RFC] Rust support"
Next in thread: Miguel Ojeda: "Re: [PATCH 00/13] [RFC] Rust support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

First of all, thanks for the thoughtful feedback.

On Thu, Apr 22, 2021 at 12:03:13PM +0200, Linus Walleij wrote:
> I am sure you are aware of this document:
> Documentation/process/stable-api-nonsense.rst
>
> We really like to change the internal APIs of the kernel, and it sounds to
> me like Rust really likes a rust-side-vs-C-side approach to APIs, requiring
> these wrappers to be written and maintained all over the place, and that
> is going to affect the mobility of the kernel-internal APIs and make them
> less mobile.

The Rust-side-vs-C-side is because we want to provide an environment where we
can write kernel code (e.g., a driver) and if we stick to safe code (i.e.,
we don't use the Rust keyword "unsafe"), then we can be confident that our
code is free of memory safety issues (assuming, of course, that the abstractions
are sound).

Calling C functions directly would not allow us to do this.

As for mobility, I have the impression that this could potentially increase
mobility in that for Rust maintainers would need to change one place (the
wrapper) as opposed to a number of drivers using an API (if it's mostly an
argument/change kind of thing). Of course, if it's something like removing an
API then we'd have to change everywhere.

I'd like to reassure you that it is not our intention to create a stable api,
restrict mobility, or anything of that sort. Though I do acknowledge Rust may
complicate things (more on this below).

> If it means I need to write and review less patches related to NULL
> dereference and use-after-free etc etc, then it may very well be worth
> it.

Indeed that's part of our goal. A class of vulnerabilities is removed by
construction; others are harder to create accidentally. Reviewers would also
know that unsafe blocks need extra attention.

> But as subsystem maintainer I'd like a clear picture of this wrapper
> overhead, what does it usually entail? A typical kernel API has
> vtable and a few variables, not much more than that.
>
> I go to patch 12/13 and I see things like this:
>
> +/// A descriptor of wrapped list elements.
> +pub trait GetLinksWrapped: GetLinks {
> + /// Specifies which wrapper (e.g., `Box` and `Arc`) wraps the list entries.
> + type Wrapped: Wrapper<Self::EntryType>;
> +}
> +
> +impl<T: ?Sized> GetLinksWrapped for Box<T>
> +where
> + Box<T>: GetLinks,
> +{
> + type Wrapped = Box<<Box<T> as GetLinks>::EntryType>;
> +}
> +
> +impl<T: GetLinks + ?Sized> GetLinks for Box<T> {
> + type EntryType = T::EntryType;
> + fn get_links(data: &Self::EntryType) -> &Links<Self::EntryType> {
> + <T as GetLinks>::get_links(data)
> + }
> +}

We want the runtime overhead to be zero. During development, as you rightly
point out, there is the overhead of creating and maintaining these abstractions
for use in Rust. The code above is not a good example of a wrapper because it's
not wrapping kernel C functionality.

A better example is Pages, which wraps a pointer to struct page:

pub struct Pages<const ORDER: u32> {
pages: *mut bindings::page,
}

If you call Pages::new(), alloc_pages() is called and returns a
KernelResult<Pages>. If the allocation fails you get an error back, otherwise
you get the pages: there is no possibility of forgetting to check the return
value and accidentally dereferencing a NULL pointer.

We have ORDER as a compile-time argument to the type, so we know at compile-time
how many pages we have at no additional runtime cost. So, for example, when we
have to free the pages, the destructor knows what the right argument is when
calling free_pages.

The fact that you have a destructor also ensures that you don't accidentally
forget to free the pages, so no leaks. (We don't have it implemented because
we haven't needed it yet, but we can have get_page/put_page with proper
ownership, i.e., after the equivalent of put_page, you can no longer touch the
page, enforced at compile time).

We provide an `insert_page` associated function that maps the given page to a
vma by calling vm_insert_page. (Only works if ORDER is zero.)

We provide a `kmap` associated function that maps one of the pages and returns a
mapping, which itself has a wrapper type that ensures that kunmap is called when
it goes out of scope.

Anyway, what I'm trying to show here is that the wrappers are quite thin and are
intended to enforce safety (where possible) and correct usage. Does it make
sense? I'm glad to go into more details if desired.

> All subsystem maintainers are expected to understand and maintain
> wrappers like these, right? That means all subsystem maintainers need
> to be elevated to understand the above without effort if you wake them
> up in their sleep at 4 in the morning.

I suppose they'd need to understand the wrappers that I talked about above, not
the ones you copied (those are wrapping something else and maintainers of other
subsystems are not expected to write this sort of code).

There are other possible approaches too:
1. Explicitly exclude Rust support from certain subsystems, say, no Rust USB
drivers (just a random example).
2. Maintainers may choose to not care about Rust, breaking it on api changes.

Naturally I'd prefer Rust support to be a first-class citizen but I mention the
above for completeness.

> Get me right, we are of course good at doing really complicated stuff,
> that's what engineers do. But we are not Iron Man. We need a clear
> way into understanding and maintaining wrappers and we need support
> with it when we don't understand it, so the kernel would need a Rust
> wrapper maintainer that we can trust to stay around for the long term,
> i.e. until their retirement, while actively teaching others for decades.
> For an example see how RCU is maintained.

Agreed. The only part that I'm not sure about is whether we need to put all the
burden on a single person for the rest of their career. In the beginning, of
course, but over time I would expect (hope?) experts would emerge and some of
the load would be distributed.

Cheers,
-Wedson

Next message: Arnd Bergmann: "Re: [PATCH] staging: rtl8723bs: avoid bogus gcc warning"
Previous message: Jan Beulich: "Re: [PATCH 0/3] xen: remove some checks for always present Xen features"
In reply to: David Laight: "RE: [PATCH 00/13] [RFC] Rust support"
Next in thread: Miguel Ojeda: "Re: [PATCH 00/13] [RFC] Rust support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]