Re: [PATCH v3 08/22] mm: Allow page table accessors to be non-idempotent
From: David Hildenbrand (Red Hat)
Date: Thu Dec 18 2025 - 04:52:51 EST
On 12/11/25 14:59, Ryan Roberts wrote:
On 11/12/2025 00:33, Samuel Holland wrote:
On 2025-11-28 2:47 AM, David Hildenbrand (Red Hat) wrote:
On 11/27/25 17:57, Ryan Roberts wrote:I agree that this patch is ugly. The only way I see to avoid code like this is
On 13/11/2025 01:45, Samuel Holland wrote:
Currently, some functions such as pte_offset_map() are passed both
pointers to hardware page tables, and pointers to previously-read PMD
entries on the stack. To ensure correctness in the first case, these
functions must use the page table accessor function (pmdp_get()) to
dereference the supplied pointer. However, this means pmdp_get() is
called twice in the second case. This double call must be avoided if
pmdp_get() applies some non-idempotent transformation to the value.
Avoid the double transformation by calling set_pmd() on the stack
variables where necessary to keep set_pmd()/pmdp_get() calls balanced.
I don't think this is a good solution.
Agreed,
set_pmd(&pmd, pmd);
is rather horrible.
to refactor (or duplicate) the functions so no function takes pointers to both
hardware page tables and on-stack page table entries. Is that sort of
refactoring the right direction to go for v4?
From a quick look at the code, I think that some cases are solvable by
refactoring to pass the value instead of the pointer, and leave it to the higher
level decide how to read the value from the pointer - it knows if it is pointing
to HW pgtable or if it's a (e.g) stack value.
But the more I look at the code, the more instances I find where pointers to
stack variables are being passed to arch pgtable helpers as if they are HW
pgtable entry pointers. (Mainly pmd level).
I wonder if we need to bite the bullet and explicitly separate the types? At
each level, we have:
1. page table entry value
2. pointer to page table entry _value_ (e.g. pointer to pXX_t on stack)
3. pointer to page table entry in HW pgtable
Today, 1 is represented by pte_t, pmd_t, etc. 2 and 3 are represented by the
same type; pte_t*, pmd_t*, etc.
If we create a new type for 3, it will both document and enforce when type 2 or
type 3 is required.
e.g:
// pte_t: defined by arch.
typedef unsigned long pte_t;
// ptep_t: new opaque type that can't be dereferenced.
struct __ptep_t;
typedef struct __ptep_t *ptep_t;
This is what I had in mind when we last discussed this topic and I suggested a way forward to not play whack-a-mole with new users that do *ptep showing up.
Agreed that we ideally indicate that this is a HW PTE pointer that must be de-referenced through ptep_get() or similar. (maybe we'd find a new name for this set of functions).
I talked to Samuel at LPC, pointing him at the way XEN-pv implemented support for changing PFNs in ptes. We might not need all of that rework to move forward with the risc-v change.
Having that said, I also agree that this would be a cleanup worth having (which will result in quite a bit of churn :) ).
So if someone wants to bump up the patch count, speak now.
--
Cheers
David