Re: [PATCH v3 06/22] mm: Always use page table accessor functions
From: David Hildenbrand (Red Hat)
Date: Wed Nov 26 2025 - 07:35:24 EST
On 11/26/25 13:27, Lorenzo Stoakes wrote:
On Wed, Nov 26, 2025 at 01:19:00PM +0100, David Hildenbrand (Red Hat) wrote:
On 11/26/25 13:16, David Hildenbrand (Red Hat) wrote:
On 11/26/25 12:09, Ryan Roberts wrote:
On 13/11/2025 01:45, Samuel Holland wrote:
Some platforms need to fix up the values when reading or writing page
tables. Because of this, the accessors must always be used; it is not
valid to simply dereference a pXX_t pointer.
Fix all of the instances of this pattern in generic code, mostly by
applying the below coccinelle semantic patch, repeated for each page
table level. Some additional fixes were applied manually, mostly to
macros where type information is unavailable.
In a few places, a `pte_t *` or `pmd_t *` is actually a pointer to a PTE
or PMDE value stored on the stack, not a pointer to a page table. In
those cases, it is not appropriate to use the accessors, because the
value is not globally visible, and any transformation from pXXp_get()
has already been applied. Those places are marked by naming the pointer
`ptentp` or `pmdvalp`, as opposed to `ptep` or `pmdp`.
@@
pte_t *P;
expression E;
expression I;
@@
- P[I] = E
+ set_pte(P + I, E)
@@
pte_t *P;
expression E;
@@
(
- WRITE_ONCE(*P, E)
+ set_pte(P, E)
|
- *P = E
+ set_pte(P, E)
)
There should absolutely never be any instances of core code directly setting an
entry at any level. This *must* always go via the arch code helpers. Did you
find any instances of this? If so, I would consider these bugs and suggest
sending as a separate bugfix patch. Bad things could happen on arm64 because we
may need to break a contiguous mapping, which would not happen if the value is
set directly.
@@
pte_t *P;
expression I;
@@
(
&P[I]
|
- READ_ONCE(P[I])
+ ptep_get(P + I)
|
- P[I]
+ ptep_get(P + I)
)
@@
pte_t *P;
@@
(
- READ_ONCE(*P)
+ ptep_get(P)
|
- *P
+ ptep_get(P)
)
For reading the *PTE*, conversion over to ptep_get() should have already been
done (I did this a few years back when implementing support for arm64 contiguous
mappings). If you find any cases where direct dereference or READ_ONCE() is
being done in generic code for PTE, then that's a bug and should also be sent as
a separate patch.
FYI, my experience was that Coccinelle didn't find everything when I was
converting to ptep_get() - although it could have been that my Cochinelle skills
were not up to scratch! I ended up using an additional method where I did a
find/replace to convert "pte_t *" to "ptep_handle_t" and declared pte_handle_t
as a void* which causes a compiler error on dereference. Then in a few key
places I did a manual case from pte_handle_t to (pte_t *) and compiled allyesconfig.
I'm assuming the above Cocchinelle template was also used for pmd_t, pud_t,
p4d_t and pgd_t?
Additionally, the following semantic patch was used to convert PMD and
PUD references inside struct vm_fault:
@@
struct vm_fault vmf;
@@
(
- *vmf.pmd
+ pmdp_get(vmf.pmd)
|
- *vmf.pud
+ pudp_get(vmf.pud)
)
@@
struct vm_fault *vmf;
@@
(
- *vmf->pmd
+ pmdp_get(vmf->pmd)
|
- *vmf->pud
+ pudp_get(vmf->pud)
)
Signed-off-by: Samuel Holland <samuel.holland@xxxxxxxxxx>
---
This commit covers some of the same changes as an existing series from
Anshuman Khandual[1]. Unlike that series, this commit is a purely
mechanical conversion to demonstrate the RISC-V changes, so it does not
insert local variables to avoid redundant calls to the accessors. A
manual conversion like in that series could improve performance.
[1]: https://lore.kernel.org/linux-mm/20240917073117.1531207-1-anshuman.khandual@xxxxxxx/
Hi,
I've just come across this patch and wanted to mention that we could also
benefit from this improved absraction for some features we are looking at for
arm64. As you mention, Anshuman had a go but hit some roadblocks.
The main issue is that the compiler was unable to optimize away the READ_ONCE()s
for the case where certain levels of the pgtable are folded. But it can optimize
the plain C dereferences. There were complaints the the generated code for arm
(32) and powerpc was significantly impacted due to having many more (redundant)
loads.
We do have mm_pmd_folded()/p4d_folded() etc, could that help to sort
this out internally?
Just stumbled over the reply from Christope:
https://lkml.kernel.org/r/0019d675-ce3d-4a5c-89ed-f126c45145c9@xxxxxxxxxx
And wonder if we could handle that somehow directly in the pgdp_get() etc.
I find that kind of gross to be honest. Isn't the whole point of folding that we
don't have to think about it...
If we could adjust generic pgdp_get() and friends to not do a READ_ONCE() once folded we might not have to think about that in the callers.
Just an idea, though, not sure if that would fly the way I envision it.
--
Cheers
David