Re: [PATCH v10 07/21] gpu: nova-core: mm: Add TLB flush support
From: Eliot Courtney
Date: Thu Apr 02 2026 - 01:49:26 EST
On Wed Apr 1, 2026 at 6:20 AM JST, Joel Fernandes wrote:
> Add TLB (Translation Lookaside Buffer) flush support for GPU MMU.
>
> After modifying page table entries, the GPU's TLB must be invalidated
> to ensure the new mappings take effect. The Tlb struct provides flush
> functionality through BAR0 registers.
>
> The flush operation writes the page directory base address and triggers
> an invalidation, polling for completion with a 2 second timeout matching
> the Nouveau driver.
>
> Cc: Nikola Djukic <ndjukic@xxxxxxxxxx>
> Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
> ---
> drivers/gpu/nova-core/mm.rs | 1 +
> drivers/gpu/nova-core/mm/tlb.rs | 95 +++++++++++++++++++++++++++++++++
> drivers/gpu/nova-core/regs.rs | 42 +++++++++++++++
> 3 files changed, 138 insertions(+)
> create mode 100644 drivers/gpu/nova-core/mm/tlb.rs
>
> diff --git a/drivers/gpu/nova-core/mm.rs b/drivers/gpu/nova-core/mm.rs
> index 8f3089a5fa88..cfe9cbe11d57 100644
> --- a/drivers/gpu/nova-core/mm.rs
> +++ b/drivers/gpu/nova-core/mm.rs
> @@ -5,6 +5,7 @@
> #![expect(dead_code)]
>
> pub(crate) mod pramin;
> +pub(crate) mod tlb;
>
> use kernel::sizes::SZ_4K;
>
> diff --git a/drivers/gpu/nova-core/mm/tlb.rs b/drivers/gpu/nova-core/mm/tlb.rs
> new file mode 100644
> index 000000000000..cd3cbcf4c739
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/tlb.rs
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! TLB (Translation Lookaside Buffer) flush support for GPU MMU.
> +//!
> +//! After modifying page table entries, the GPU's TLB must be flushed to
> +//! ensure the new mappings take effect. This module provides TLB flush
> +//! functionality for virtual memory managers.
> +//!
> +//! # Example
> +//!
> +//! ```ignore
> +//! use crate::mm::tlb::Tlb;
> +//!
> +//! fn page_table_update(tlb: &Tlb, pdb_addr: VramAddress) -> Result<()> {
> +//! // ... modify page tables ...
> +//!
> +//! // Flush TLB to make changes visible (polls for completion).
> +//! tlb.flush(pdb_addr)?;
> +//!
> +//! Ok(())
> +//! }
> +//! ```
> +
> +use kernel::{
> + devres::Devres,
> + io::poll::read_poll_timeout,
> + io::Io,
> + new_mutex,
> + prelude::*,
> + sync::{
> + Arc,
> + Mutex, //
> + },
> + time::Delta, //
> +};
> +
> +use crate::{
> + driver::Bar0,
> + mm::VramAddress,
> + regs, //
> +};
> +
> +/// TLB manager for GPU translation buffer operations.
> +#[pin_data]
> +pub(crate) struct Tlb {
> + bar: Arc<Devres<Bar0>>,
> + /// TLB flush serialization lock: This lock is acquired during the
> + /// DMA fence signalling critical path. It must NEVER be held across any
> + /// reclaimable CPU memory allocations because the memory reclaim path can
> + /// call `dma_fence_wait()`, which would deadlock with this lock held.
> + #[pin]
This comment says the lock is acquired during the DMA fence
signalling critical path, but IIUC we don't have anything like that
right now. Is this based on future, yet-to-be-done work? If so, can we
reword the comment so it makes sense in the driver's current state?
> + lock: Mutex<()>,
> +}
> +
> +impl Tlb {
> + /// Create a new TLB manager.
> + pub(super) fn new(bar: Arc<Devres<Bar0>>) -> impl PinInit<Self> {
> + pin_init!(Self {
> + bar,
> + lock <- new_mutex!((), "tlb_flush"),
> + })
> + }
> +
> + /// Flush the GPU TLB for a specific page directory base.
> + ///
> + /// This invalidates all TLB entries associated with the given PDB address.
> + /// Must be called after modifying page table entries to ensure the GPU sees
> + /// the updated mappings.
If this must be called after every such operation, I wonder if we can
change the design to require a guard-like pattern (or something
similar) so that the flush cannot be forgotten. WDYT?
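Roughly what I mean, as a userspace sketch rather than the kernel API
(`MockTlb` and `PageTableEdit` are made-up names, and the real thing
would need to thread `Result` through): the flush runs in `Drop`, so a
caller cannot modify PTEs and then forget to flush.

```rust
use std::cell::Cell;

struct MockTlb {
    flushes: Cell<u32>, // counts flush() calls for the demo
}

impl MockTlb {
    fn flush(&self, _pdb_addr: u64) {
        self.flushes.set(self.flushes.get() + 1);
    }

    /// Begin a page-table edit; the returned guard flushes on drop.
    fn edit(&self, pdb_addr: u64) -> PageTableEdit<'_> {
        PageTableEdit { tlb: self, pdb_addr }
    }
}

struct PageTableEdit<'a> {
    tlb: &'a MockTlb,
    pdb_addr: u64,
}

impl PageTableEdit<'_> {
    fn map_page(&mut self, _va: u64, _pa: u64) {
        // ... write the PTE ...
    }
}

impl Drop for PageTableEdit<'_> {
    fn drop(&mut self) {
        // The guard going out of scope is what triggers the flush.
        self.tlb.flush(self.pdb_addr);
    }
}

fn demo() -> u32 {
    let tlb = MockTlb { flushes: Cell::new(0) };
    {
        let mut edit = tlb.edit(0x0040_0000);
        edit.map_page(0x1000, 0x2000);
    } // <- flush happens here, exactly once
    tlb.flushes.get()
}

fn main() {
    assert_eq!(demo(), 1);
    println!("flush count: {}", demo());
}
```

Error handling in Drop is the usual wrinkle (Drop can't return a
Result), so maybe an explicit consuming `commit()` plus a WARN-on-drop
fallback is the better shape. Just a thought.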
> + pub(crate) fn flush(&self, pdb_addr: VramAddress) -> Result {
Hopefully flush() doesn't need to be callable from everywhere in the
crate. Can you tighten the visibility here and in other places? Many
items seem to be pub(crate) that don't need to be.
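For illustration (toy modules, names made up and unrelated to the
actual nova-core layout), `pub(super)` keeps an item reachable from the
parent module without exposing it to the whole crate the way
`pub(crate)` does:

```rust
mod mm {
    pub mod tlb {
        // `pub(super)` makes this visible to `mm` only, not to the
        // whole crate as `pub(crate)` would.
        pub(super) fn flush() -> u32 {
            42
        }
    }

    pub fn update_page_tables() -> u32 {
        // `mm` can still reach the flush...
        tlb::flush()
    }
}

fn demo() -> u32 {
    // ...but here at the crate root, `mm::tlb::flush()` would not
    // compile; callers must go through the `mm`-level API.
    mm::update_page_tables()
}

fn main() {
    assert_eq!(demo(), 42);
    println!("result: {}", demo());
}
```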
> + let _guard = self.lock.lock();
> +
> + let bar = self.bar.try_access().ok_or(ENODEV)?;
> +
> + // Write PDB address.
> + bar.write_reg(regs::NV_TLB_FLUSH_PDB_LO::from_pdb_addr(pdb_addr.raw_u64()));
> + bar.write_reg(regs::NV_TLB_FLUSH_PDB_HI::from_pdb_addr(pdb_addr.raw_u64()));
> +
> + // Trigger flush: invalidate all pages and enable.
> + bar.write_reg(
> + regs::NV_TLB_FLUSH_CTRL::zeroed()
> + .with_page_all(true)
> + .with_enable(true),
> + );
> +
> + // Poll for completion - enable bit clears when flush is done.
> + read_poll_timeout(
> + || Ok(bar.read(regs::NV_TLB_FLUSH_CTRL)),
> + |ctrl: ®s::NV_TLB_FLUSH_CTRL| !ctrl.enable(),
> + Delta::ZERO,
> + Delta::from_secs(2),
> + )?;
This passes a zero delay to read_poll_timeout, so it busy-polls the
register for up to 2 seconds. What about adding a small delay of a
microsecond or so between reads?
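To make the suggestion concrete, here is the polling pattern as a
userspace sketch (`poll_until` is a made-up stand-in for the kernel's
read_poll_timeout, and the closure stands in for reading
NV_TLB_FLUSH_CTRL); in the driver itself this would presumably just
mean replacing Delta::ZERO with a one-microsecond delta, assuming the
Delta API has a suitable constructor:

```rust
use std::time::{Duration, Instant};

fn poll_until<F: FnMut() -> bool>(
    mut done: F,
    delay: Duration,
    timeout: Duration,
) -> Result<u32, &'static str> {
    let start = Instant::now();
    let mut reads = 0;
    loop {
        reads += 1;
        if done() {
            return Ok(reads);
        }
        if start.elapsed() >= timeout {
            return Err("timeout");
        }
        // The sleep is the point: with Duration::ZERO this loop spins
        // flat out between register reads.
        std::thread::sleep(delay);
    }
}

fn demo() -> u32 {
    let mut remaining = 5;
    poll_until(
        || {
            remaining -= 1; // stand-in for reading the ctrl register
            remaining == 0
        },
        Duration::from_micros(1),
        Duration::from_secs(2),
    )
    .expect("should complete well within the timeout")
}

fn main() {
    assert_eq!(demo(), 5);
    println!("register read {} times", demo());
}
```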