Re: [PATCH v10 07/21] gpu: nova-core: mm: Add TLB flush support
From: Alexandre Courbot
Date: Wed Apr 08 2026 - 03:44:07 EST
On Tue Apr 7, 2026 at 2:14 PM JST, Matthew Brost wrote:
> On Mon, Apr 06, 2026 at 06:10:07PM -0400, Joel Fernandes wrote:
>>
>>
>> On 4/6/2026 5:24 PM, Joel Fernandes wrote:
>> >
>> >
>> > On 4/2/2026 1:59 AM, Matthew Brost wrote:
>> >> On Tue, Mar 31, 2026 at 05:20:34PM -0400, Joel Fernandes wrote:
>> >>> Add TLB (Translation Lookaside Buffer) flush support for GPU MMU.
>> >>>
>> >>> After modifying page table entries, the GPU's TLB must be invalidated
>> >>> to ensure the new mappings take effect. The Tlb struct provides flush
>> >>> functionality through BAR0 registers.
>> >>>
>> >>> The flush operation writes the page directory base address and triggers
>> >>> an invalidation, polling for completion with a 2 second timeout matching
>> >>> the Nouveau driver.
>> >>>
>> >>> Cc: Nikola Djukic <ndjukic@xxxxxxxxxx>
>> >>> Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
>> >>> ---
>> >>> drivers/gpu/nova-core/mm.rs | 1 +
>> >>> drivers/gpu/nova-core/mm/tlb.rs | 95 +++++++++++++++++++++++++++++++++
>> >>> drivers/gpu/nova-core/regs.rs | 42 +++++++++++++++
>> >>> 3 files changed, 138 insertions(+)
>> >>> create mode 100644 drivers/gpu/nova-core/mm/tlb.rs
>> >>>
>> >>> diff --git a/drivers/gpu/nova-core/mm.rs b/drivers/gpu/nova-core/mm.rs
>> >>> index 8f3089a5fa88..cfe9cbe11d57 100644
>> >>> --- a/drivers/gpu/nova-core/mm.rs
>> >>> +++ b/drivers/gpu/nova-core/mm.rs
>> >>> @@ -5,6 +5,7 @@
>> >>> #![expect(dead_code)]
>> >>>
>> >>> pub(crate) mod pramin;
>> >>> +pub(crate) mod tlb;
>> >>>
>> >>> use kernel::sizes::SZ_4K;
>> >>>
>> >>> diff --git a/drivers/gpu/nova-core/mm/tlb.rs b/drivers/gpu/nova-core/mm/tlb.rs
>> >>> new file mode 100644
>> >>> index 000000000000..cd3cbcf4c739
>> >>> --- /dev/null
>> >>> +++ b/drivers/gpu/nova-core/mm/tlb.rs
>> >>> @@ -0,0 +1,95 @@
>> >>> +// SPDX-License-Identifier: GPL-2.0
>> >>> +
>> >>> +//! TLB (Translation Lookaside Buffer) flush support for GPU MMU.
>> >>> +//!
>> >>> +//! After modifying page table entries, the GPU's TLB must be flushed to
>> >>> +//! ensure the new mappings take effect. This module provides TLB flush
>> >>> +//! functionality for virtual memory managers.
>> >>> +//!
>> >>> +//! # Example
>> >>> +//!
>> >>> +//! ```ignore
>> >>> +//! use crate::mm::tlb::Tlb;
>> >>> +//!
>> >>> +//! fn page_table_update(tlb: &Tlb, pdb_addr: VramAddress) -> Result<()> {
>> >>> +//! // ... modify page tables ...
>> >>> +//!
>> >>> +//! // Flush TLB to make changes visible (polls for completion).
>> >>> +//! tlb.flush(pdb_addr)?;
>> >>> +//!
>> >>> +//! Ok(())
>> >>> +//! }
>> >>> +//! ```
>> >>> +
>> >>> +use kernel::{
>> >>> + devres::Devres,
>> >>> + io::poll::read_poll_timeout,
>> >>> + io::Io,
>> >>> + new_mutex,
>> >>> + prelude::*,
>> >>> + sync::{
>> >>> + Arc,
>> >>> + Mutex, //
>> >>> + },
>> >>> + time::Delta, //
>> >>> +};
>> >>> +
>> >>> +use crate::{
>> >>> + driver::Bar0,
>> >>> + mm::VramAddress,
>> >>> + regs, //
>> >>> +};
>> >>> +
>> >>> +/// TLB manager for GPU translation buffer operations.
>> >>> +#[pin_data]
>> >>> +pub(crate) struct Tlb {
>> >>> + bar: Arc<Devres<Bar0>>,
>> >>> + /// TLB flush serialization lock: This lock is acquired during the
>> >>> + /// DMA fence signalling critical path. It must NEVER be held across any
>> >>> + /// reclaimable CPU memory allocations because the memory reclaim path can
>> >>> + /// call `dma_fence_wait()`, which would deadlock with this lock held.
>> >>> + #[pin]
>> >>> + lock: Mutex<()>,
>> >>> +}
>> >>> +
>> >>> +impl Tlb {
>> >>> + /// Create a new TLB manager.
>> >>> + pub(super) fn new(bar: Arc<Devres<Bar0>>) -> impl PinInit<Self> {
>> >>> + pin_init!(Self {
>> >>> + bar,
>> >>> + lock <- new_mutex!((), "tlb_flush"),
>> >>> + })
>> >>> + }
>> >>> +
>> >>> + /// Flush the GPU TLB for a specific page directory base.
>> >>> + ///
>> >>> + /// This invalidates all TLB entries associated with the given PDB address.
>> >>> + /// Must be called after modifying page table entries to ensure the GPU sees
>> >>> + /// the updated mappings.
>> >>> + pub(crate) fn flush(&self, pdb_addr: VramAddress) -> Result {
>> >>
>> >> This landed on my list randomly, so I took a look.
>> >>
>> >> Wouldn’t you want to virtualize the invalidation based on your device?
>> >> For example, what if the register interface changes on future hardware?
>> >
>> > Good point, for future hardware it indeed makes sense. I will do that.
>> Actually, at least as far into the future as I can see, the register
>> definitions for TLB invalidation are the same, so we are good and I will
>> not be making any change in this regard.
>>
>> But, thanks for raising the point and forcing me to double check!
>>
>
> Not my driver, but this looks like a classic “works now” change that may
> not hold up later, which is why I replied to something that isn’t really
> my business.
>
> Again, not my area, but I’ve been through this before. Generally,
> getting the abstractions right up front pays off.
Our policy so far has been to introduce abstractions when there is a
justifiable hardware difference within the (reduced) set of GPUs that we
support. Adding HALs implies either virtual calls, which are harder to
trace, or monomorphization through generic types, which complicates the
code (and is hard to justify when there is only one implementation).
`Tlb` is already an abstraction of sorts; the register accesses are
well-contained within this leaf module, so if the need arises to support
a different scheme this can be done transparently to callers, which is
what matters imho.
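
To make the trade-off concrete, here is a minimal, hypothetical sketch
(all names invented for illustration, not nova-core code) of the two HAL
shapes being weighed: dynamic dispatch through a trait object versus
monomorphization through a generic parameter.

```rust
// Hypothetical HAL sketch; names like `TlbFlushHal` and `TuringTlb`
// are invented for illustration only.

/// Per-generation TLB flush interface.
trait TlbFlushHal {
    fn flush(&self, pdb_addr: u64) -> Result<(), i32>;
}

struct TuringTlb;

impl TlbFlushHal for TuringTlb {
    fn flush(&self, _pdb_addr: u64) -> Result<(), i32> {
        // Register writes and completion polling would go here.
        Ok(())
    }
}

// Virtual-call flavor: one indirect call per flush through a vtable,
// which is harder to trace but keeps caller signatures simple.
fn flush_dyn(hal: &dyn TlbFlushHal, pdb: u64) -> Result<(), i32> {
    hal.flush(pdb)
}

// Monomorphized flavor: the compiler emits a copy per implementation,
// avoiding the indirect call, but the generic parameter spreads through
// every caller's signature.
fn flush_generic<H: TlbFlushHal>(hal: &H, pdb: u64) -> Result<(), i32> {
    hal.flush(pdb)
}

fn main() {
    let hal = TuringTlb;
    assert!(flush_dyn(&hal, 0x1000).is_ok());
    assert!(flush_generic(&hal, 0x1000).is_ok());
    println!("ok");
}
```

With a single implementation, both shapes reduce to the same direct
call in practice, which is why a contained leaf module like `Tlb` can
defer the choice until a real hardware difference shows up.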