Re: [PATCH 4/7] gpu: nova-core: falcon: use dma::Coherent

From: Gary Guo

Date: Thu Mar 26 2026 - 11:59:38 EST

On Thu Mar 26, 2026 at 3:04 PM GMT, Alexandre Courbot wrote:
> On Wed Mar 25, 2026 at 11:14 AM JST, Eliot Courtney wrote:
>> On Sat Mar 21, 2026 at 10:36 PM JST, Alexandre Courbot wrote:
>>> Replace the nova-core local `DmaObject` with a `Coherent` that can
>>> fulfill the same role.
>>>
>>> Signed-off-by: Alexandre Courbot <acourbot@xxxxxxxxxx>
>>> ---
>>> drivers/gpu/nova-core/falcon.rs | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/nova-core/falcon.rs b/drivers/gpu/nova-core/falcon.rs
>>> index 5bf8da8760bf..f6239c44dd80 100644
>>> --- a/drivers/gpu/nova-core/falcon.rs
>>> +++ b/drivers/gpu/nova-core/falcon.rs
>>> @@ -10,6 +10,7 @@
>>> Device, //
>>> },
>>> dma::{
>>> + Coherent,
>>> DmaAddress,
>>> DmaMask, //
>>> },
>>> @@ -20,7 +21,6 @@
>>> };
>>>
>>> use crate::{
>>> - dma::DmaObject,
>>> driver::Bar0,
>>> falcon::hal::LoadMethod,
>>> gpu::Chipset,
>>> @@ -636,7 +636,7 @@ pub(crate) fn pio_load<F: FalconFirmware<Target = E> + FalconPioLoadable>(
>>> fn dma_wr(
>>> &self,
>>> bar: &Bar0,
>>> - dma_obj: &DmaObject,
>>> + dma_obj: &Coherent<[u8]>,
>>> target_mem: FalconMem,
>>> load_offsets: FalconDmaLoadTarget,
>>> ) -> Result {
>>> @@ -740,7 +740,7 @@ fn dma_load<F: FalconFirmware<Target = E> + FalconDmaLoadable>(
>>> fw: &F,
>>> ) -> Result {
>>> // Create DMA object with firmware content as the source of the DMA engine.
>>> - let dma_obj = DmaObject::from_data(dev, fw.as_slice())?;
>>> + let dma_obj = Coherent::from_slice(dev, fw.as_slice(), GFP_KERNEL)?;
>>
>> Is it guaranteed that fw.as_slice() is a multiple of 256 in size?
>> In `dma_wr` it breaks this up into 256 byte transfers. Since this
>> no longer pads out to a page boundary, it means that it could now error
>> (around "DMA transfer goes beyond range of DMA object") if the Dmem
>> section's size is not divisible by 256. But tbh, I find it odd that
>> `dma_wr` doesn't check that FalconDmaLoadTarget's length is a
>> multiple of 256 anyway, because it looks like it'll write a bunch of
>> unrelated bytes (since it rounds up to the nearest 256 to copy).
>>
>> Maybe we should enforce that `FalconDmaLoadTarget` length is divisible
>> by 256?
>>
>> For this series if for all firmwares it's divisible by 256 then I think
>> it's fine to leave this as is for now, but I do find the lack of
>> checking in `dma_wr` (or anywhere else for FalconDmaLoadTarget) a bit
>> odd.
>
> All coherent allocations are page-aligned (and use full pages), so we
> are safe in terms of overflows.

Let's not rely on this behaviour. There is no guarantee on what's at the end
of allocation whatsoever. There's no guarantee that it will be initialized.
Even with __GFP_ZERO only the size provided will be zeroed.

If the GPU is going to read beyond ranges covered by `Coherent` (not just rely
on the alignment), let's align up the allocation.

>
> Also `dma_wr` uses `div_ceil(256)` which will skip the last data block
> entirely if it is not a multiple of 256. It might be a bit more robust
> to explicitly check that the size is a multiple of 256 and return an
> error if that is not the case indeed.

div_ceil will not skip the last block, it will over-read beyond the end.
div_floor would have skipped the block.

Best,
Gary