Re: [QUESTION] Plain dereference and READ_ONCE() in fault handler

From: Dev Jain
Date: Wed Mar 05 2025 - 10:06:24 EST




On 05/03/25 4:16 pm, David Hildenbrand wrote:
On 05.03.25 11:21, Dev Jain wrote:
In __handle_mm_fault(),

1. Why is there a barrier() for the PUD logic?

Good question. It was added in

commit a00cc7d9dd93d66a3fb83fc52aa57a4bec51c517
Author: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Date:   Fri Feb 24 14:57:02 2017 -0800

    mm, x86: add support for PUD-sized transparent hugepages

Maybe it was an alternative to performing a READ_ONCE(*vmf.pud).

Maybe it was done that way, because pudp_get_lockless() does not exist. And it would likely not be required, because on architectures where ptep_get_lockless() does some magic (see below, mostly 32bit), PUD THP are not applicable.

Thanks for your reply David.



2. For the PMD logic, in the if block, we use *vmf.pmd, and in the else block
    we use pmdp_get_lockless(); what if someone changes the pmd just when we
    have begun processing the conditions in the if block, fail in the if block
    and then the else block operates on a different pmd value. Shouldn't we cache
    the value of the pmd and operate on a single consistent value until we take the
    lock and then finally check using pxd_same() and friends?

The pmd_none(*vmf.pmd) is fine. create_huge_pmd() must be able to deal with races, because we are not holding any locks.

I had a mental hiccup, yes we don't need the cached value even before the if block, as the relevant path will eventually check after taking the lock. I was thinking of all sorts of weird races.


We only have to use pmdp_get_lockless() when we want to effectively perform a READ_ONCE(), and make sure that we read something "reasonable" that we can operate on, even with concurrent changes. (e.g., not read a garbage PFN just because of some concurrent changes)

Oh I just looked at the arm64 definition and assumed ptep_get_lockless() == READ_ONCE() :) Now this makes sense. So I am guessing that we can still get away with a *vmf.pmd on 64-bit arches?

A separate question: When taking the create_huge_pmd() path with a read fault and after taking the pmd lock, why shouldn't we check with pmd_none(pmdp_get_lockless(vmf.pmd)) instead of plain *vmf.pmd...surely now we must load the actual correct value from memory since we are committing into mapping the huge zero folio?
And after looking somewhat more, why even is a pmd_none(*pmd) there in set_huge_zero_folio()...it should be the responsibility of the caller to verify this? Any caller will just assume that it got the huge zero folio mapped so this check should be redundant.


We'll store the value in vmf.orig_pmd, on which we'll operate and try to detect later changes using pmd_same(). So we really want something consistent in there.

See the description above ptep_get_lockless(), why we cannot simply do a READ_ONCE on architectures where a PTE cannot be read atomically (e.g., 8 byte PTEs on 32bit architecture).