Re: mm: opaque hardware page-table entry handles

From: Pedro Falcato

Date: Thu Jun 25 2026 - 07:08:32 EST


On Thu, Jun 25, 2026 at 11:50:28AM +0100, Muhammad Usama Anjum wrote:
> On 24/06/2026 8:25 pm, Pedro Falcato wrote:
> > On Wed, Jun 24, 2026 at 03:09:08PM +0100, Usama Anjum wrote:
> >> Hi all,
> >>
> >> This is a direction-check with the wider community before spending time on the
> >> development. This picks up the idea that was raised and broadly agreed in the
> >> earlier thread (Ryan Roberts, Lorenzo Stoakes, David Hildenbrand) [1].
> >>
> >> The problem
> >> -----------
> >> Core MM code reaches page-table entries by raw pointer dereference (pte_t *,
> >> pmd_t *, *pud, ...) in places, implicitly assuming a single, uniform
> >> representation. Sprinkling getters wouldn't solve the problem entirely. The
> >> problem is one level up: the *pointer type* itself is overloaded. At each level
> >> there are really three distinct things:
> >>
> >> 1. a page-table entry value (pte_t, pmd_t, ...)
> >> 2. a pointer to an entry value, e.g. a pXX_t on the stack
> >> 3. a pointer to a live entry in the hardware page table
> >>
> >> Today (2) and (3) share the same type - pte_t *, pmd_t *, and so on. Nothing
> >> distinguishes a pointer into a live table from a pointer to a stack copy.
> >>
> >> A pointer to an on-stack entry value and a pointer to a live hardware entry have
> >> the same type, so the compiler cannot distinguish them. Passing the stack
> >> pointer to an arch helper that expects a hardware-entry pointer compiles fine,
> >> but is wrong - a bug class the type system makes invisible. It also blocks
> >> evolution: an arch helper may need to read beyond the addressed entry (e.g.
> >> adjacent or contiguous entries), which only makes sense for a real page-table
> >> pointer, not a stack copy.
> >>
> >> The idea
> >> --------
> >> Give (3) its own opaque type that cannot be dereferenced:
> >>
> >> /* opaque handle to a HW page-table entry; not dereferenceable */
> >> typedef struct {
> >> pte_t *ptr;
> >> } hw_ptep;
> >
> > I don't love typedefs that hide pointers.
> Nobody likes them. This is the only way so that by mistake stack pointers
> don't get reintroduced. Its also hard to catch such cases during review.

That's not true, you could have:

typedef struct { pteval_t pte; } sw_pte_t;

and

/* only usable by arch code and whoever wants to interpret these
* types */
static inline sw_to_ptep(sw_pte_t *swptep)
{
return (pte_t *) swptep;
}

and so on... Also, see Documentation/process/coding-style.rst 5) typedefs, it
explicitly warns against pointer typedefs.

>
> >
> >>
> >> With this:
> >>
> >> - a stack value can no longer masquerade as a hardware table entry,
> >> - a hardware handle can no longer be raw-dereferenced,
> >> - cases that genuinely operate on a value can be refactored to pass the value
> >> and let the caller, which knows whether it holds a handle or a stack copy,
> >> read it once.
> >
> > Just a small passing comment: how about doing it differently? like
> >
> > typedef struct {
> > pte_t *ptep;
> > } sw_ptep_t;
> >
> > or something like that. Were I to guess, referring to a pte_t on the stack
> > is much rarer than all the pte_t references to actual page tables. But maybe
> > reality doesn't match up with my guess :)
> We want to fix the current usages and future usages as well. sw_ptep_t can work
> for current usages, but it'll not force the new code to be written using correct
> notations.

I don't understand what you mean. pte_t is a perfectly correct notation,
it's just currently maybe too ambiguously overloaded.

> Apart from different types, another benefit of hw_pXXp would be that
> it'll become an opaque object which only architecture can manipulate. Hence
> architecture can decide howeverever it wants to manage them in certain cases.

That's already the case. pte_t is fully opaque apart from the little fact
that you can declare one on your stack. Introducing a different sw_pte_t
would further reinforce that. And if you want ways to find raw derefs on
pointers, we can simply slap on __attribute__((noderef)) (available in
sparse and clang) on those types after sw_pte_t is introduced and pte_t
is unambiguously a "hardware" PTE.

I dunno, I'm not convinced that changing around ~450 files is worth it, and
_if_ we want to do something like this I would strongly prefer the way that
is less churny.

--
Pedro