Re: [PATCH v3 06/22] mm: Always use page table accessor functions

From: David Hildenbrand (Red Hat)

Date: Thu Nov 27 2025 - 02:32:00 EST

On 11/27/25 08:14, David Hildenbrand (Red Hat) wrote:

On 11/26/25 21:31, David Hildenbrand (Red Hat) wrote:

On 11/26/25 17:34, Ryan Roberts wrote:

On 26/11/2025 16:07, Ryan Roberts wrote:

On 26/11/2025 15:12, David Hildenbrand (Red Hat) wrote:

On 11/26/25 16:08, Lorenzo Stoakes wrote:

On Wed, Nov 26, 2025 at 03:56:13PM +0100, David Hildenbrand (Red Hat) wrote:

On 11/26/25 15:52, Lorenzo Stoakes wrote:

Would the pmdp_get() never get invoked then? Or otherwise wouldn't that end up
requiring a READ_ONCE() further up the stack?

See my other reply, I think the pmdp_get() is required because all pud_*
functions are just simple stubs.

OK, thought you were saying we should push further down the stack? Or up
depending on how you view these things :P as in READ_ONCE at leaf?

I think at leaf because I think the previous ones should essentially be only
used by stubs.

But I haven't fully digested how this is all working. Or supposed to work.

I'm trying to chew through the arch/arm/include/asm/pgtable-2level.h example to
see if I can make sense of it,

I wonder if we can think about this slightly differently;

READ_ONCE() has two important properties:

- It guarrantees that a load will be issued, *even if output is unused*
- It guarrantees that the read will be single-copy-atomic (no tearing)

I think for the existing places where READ_ONCE() is used for pagetable reads we
only care about:

- It guarrantees that a load will be issued, *if output is used*
- It guarrantees that the read will be single-copy-atomic (no tearing)

I think if we can weaken to the "if output is used" property, then the compiler
will optimize out all the unneccessary reads.

AIUI, a C dereference provides neither of the guarrantees so that's no good.

What about non-volatile asm? I'm told (thought need to verify) that for
non-volatile asm, the compiler will emit it if the output is used and remove it
otherwise. So if the asm contains the required single-copy-atomic, perhaps we
are in business?

So we would need a new READ_SCA() macro that could default to READ_ONCE() (which
is stronger) and arches could opt in to providing a weaker asm version. Then the
default pXdp_get() could be READ_SCA(). And this should work for all cases.

I think.

I'm not sure this works. It looks like the compiler is free to move non-volatile
asm sections which might be problematic for places where we are currently using
READ_ONCE() in lockless algorithms, (e.g. GUP?). We wouldn't want to end up with
a stale value.

Another idea:

Given the main pattern where we are aiming to optimize out the read is something
like:

if (!pud_present(*pud))

where for a folded pmd:

static inline int pud_present(pud_t pud) { return 1; }

And we will change it to this:

if (!pud_present(pudp_get(pud)))

...

perhaps we can just define the folded pXd_present(), pXd_none(), pXd_bad(),
pXd_user() and pXd_leaf() as macros:

#define pud_present(pud) 1

Let's take a step back and realize that with __PAGETABLE_PMD_FOLDED

(a) *pudp does not make any sense

For a folded PMD, *pudp == *pmdp and consequently we would actually
get a PMD, not a PUD.

For this reason all these pud_* helpers ignore the passed value
completely. It would be wrong.

(b) pmd_offset() does *not* consume a pud but instead a pudp.

That makes sense, just imagine what would happen if someone would pass
*pudp to that helper (we'd dereference twice ...).

So I wonder if we can just teach get_pudp() and friends to ... return
true garbage instead of dereferencing something that does not make sense?

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 32e8457ad5352..c95d0d89ab3f1 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -351,7 +351,13 @@ static inline pmd_t pmdp_get(pmd_t *pmdp)
#ifndef pudp_get
static inline pud_t pudp_get(pud_t *pudp)
{
+#ifdef __PAGETABLE_PMD_FOLDED
+ pud_t dummy = { 0 };
+
+ return dummy;
+#else
return READ_ONCE(*pudp);
+#endif
}
#endif
set_pud/pud_page/pud_pgtable helper are confusing, I would
assume they are essentially unused (like documented for set_put)
and only required to keep compilers happy.

Staring at GUP-fast and perf_get_pgtable_size()---which should better be
converted to pudp_get() etc--I guess we might have to rework
p4d_offset_lockless() to do something that doesn't rely on
passing variables of local variables.

We might have to enlighten these walkers (and only these) about folded
page tables such that they don't depend on the result of pudp_get() and
friends.

Talking to myself (I know), handling this might be as simple as having

diff --git a/include/asm-generic/pgtable-nopmd.h b/include/asm-generic/pgtable-nopmd.h
index 8ffd64e7a24cb..60e5ba02bcf06 100644
--- a/include/asm-generic/pgtable-nopmd.h
+++ b/include/asm-generic/pgtable-nopmd.h
@@ -49,6 +49,13 @@ static inline pmd_t * pmd_offset(pud_t * pud, unsigned long address)
}
#define pmd_offset pmd_offset
+static inline pmd_t * pmd_offset_lockless(pud_t *pud, puf_t pud, unsigned long address)
+{
+ return (pmd_t *)pud;
+}
+#define pmd_offset_lcokless pmd_offset_lockless
+
+
#define pmd_val(x) (pud_val((x).pud))
#define __pmd(x) ((pmd_t) { __pud(x) } )

IOW, just like for pmd_offset() we cast the pointer and don't ever touch the pud.

As a reminder, the default is

#ifndef pmd_offset_lockless
#define pmd_offset_lockless(pudp, pud, address) pmd_offset(&(pud), address)
#endif

(isn't that nasty? :) )

--
Cheers

David