Allow data races on some read/write operations

From: Andreas Hindborg
Date: Mon Mar 03 2025 - 14:00:27 EST



[New subject, was: Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction]

"Alice Ryhl" <aliceryhl@xxxxxxxxxx> writes:

> On Mon, Mar 3, 2025 at 4:21 PM Andreas Hindborg <a.hindborg@xxxxxxxxxx> wrote:
>>
>> "Alice Ryhl" <aliceryhl@xxxxxxxxxx> writes:
>>
>> > On Mon, Mar 3, 2025 at 2:00 PM Andreas Hindborg <a.hindborg@xxxxxxxxxx> wrote:
>> >>
>> >> "Daniel Almeida" <daniel.almeida@xxxxxxxxxxxxx> writes:
>> >>
>> >> > Hi Benno,
>> >> >
>> >>
>> >> [...]
>> >>
>> >> >>> + /// Writes data to the region starting from `offset`. `offset` is in units of `T`, not the
>> >> >>> + /// number of bytes.
>> >> >>> + ///
>> >> >>> + /// # Examples
>> >> >>> + ///
>> >> >>> + /// ```
>> >> >>> + /// # fn test(alloc: &mut kernel::dma::CoherentAllocation<u8>) -> Result {
>> >> >>> + /// let somedata: [u8; 4] = [0xf; 4];
>> >> >>> + /// let buf: &[u8] = &somedata;
>> >> >>> + /// alloc.write(buf, 0)?;
>> >> >>> + /// # Ok::<(), Error>(()) }
>> >> >>> + /// ```
>> >> >>> + pub fn write(&self, src: &[T], offset: usize) -> Result {
>> >> >>> + let end = offset.checked_add(src.len()).ok_or(EOVERFLOW)?;
>> >> >>> + if end >= self.count {
>> >> >>> + return Err(EINVAL);
>> >> >>> + }
>> >> >>> + // SAFETY:
>> >> >>> + // - The pointer is valid due to type invariant on `CoherentAllocation`
>> >> >>> + // and we've just checked that the range and index is within bounds.
>> >> >>> + // - `offset` can't overflow since it is smaller than `selfcount` and we've checked
>> >> >>> + // that `self.count` won't overflow early in the constructor.
>> >> >>> + unsafe {
>> >> >>> + core::ptr::copy_nonoverlapping(src.as_ptr(), self.cpu_addr.add(offset), src.len())
>> >> >>
>> >> >> Why are there no concurrent write or read operations on `cpu_addr`?
>> >> >
>> >> > Sorry, can you rephrase this question?
>> >>
>> >> This write is suffering the same complications as discussed here [1].
>> >> There are multiple issues with this implementation.
>> >>
>> >> 1) `write` takes a shared reference and thus may be called concurrently.
>> >> There is no synchronization, so `copy_nonoverlapping` could be called
>> >> concurrently on the same address. The safety requirements for
>> >> `copy_nonoverlapping` state that the destination must be valid for
>> >> write. Alice claims in [1] that any memory area that experience data
>> >> races are not valid for writes. So the safety requirement of
>> >> `copy_nonoverlapping` is violated and this call is potential UB.
>> >>
>> >> 2) The destination of this write is DMA memory. It could be concurrently
>> >> modified by hardware, leading to the same issues as 1). Thus the
>> >> function cannot be safe if we cannot guarantee hardware will not write
>> >> to the region while this function is executing.
>> >>
>> >> Now, I don't think that these _should_ be issues, but according to our
>> >> Rust language experts they _are_.
>> >>
>> >> I really think that copying data through a raw pointer to or from a
>> >> place that experiences data races, should _not_ be UB if the data is not
>> >> interpreted in any way, other than moving it.
>> >>
>> >>
>> >> Best regards,
>> >> Andreas Hindborg
>> >
>> > We need to make progress on this series, and it's starting to get late
>> > in the cycle. I suggest we:
>>
>> There is always another cycle.
>>
>> >
>> > 1. Delete as_slice, as_slice_mut, write, and skip_drop.
>> > 2. Change field_read/field_write to use a volatile read/write.
>>
>> Volatile reads/writes that race are OK?
>
> I will not give a blanket yes to that. If you read their docs, you
> will find that they claim to not allow it. But they are the correct
> choice for DMA memory, and there's no way in practice to get
> miscompilations on memory locations that are only accessed with
> volatile operations, and never have references to them created.
>
> In general, this will fall into the exception that we've been given
> from the Rust people. In cases such as this where the Rust language
> does not give us the operation we want, do it like you do in C. Since
> Rust uses LLVM which does not miscompile the C part of the kernel, it
> should not miscompile the Rust part either.

This exception we got for `core::ptr::{read,write}_volatile`, did we
document that somewhere?

I feel slightly lost when trying to figure out what fits under this
exception and what is UB. I think that fist step to making this more
straight forward is having clear documentation.

For cases where we need to do the equivalent of `memmove`/`memcpy`, what
are is our options?

In case we have no options, do you know who would be the right people on
the Rust Project side to contact about getting an exception for this
case?


Best regards,
Andreas Hindborg