Re: [PATCH 0/2] arm64: hugetlb: Fix page fault loop for sw-dirty/hw-clean contiguous PTEs

From: Ryan Roberts
Date: Tue Dec 05 2023 - 09:43:10 EST


On 04/12/2023 17:26, James Houghton wrote:
> It is currently possible for a userspace application to enter a page
> fault loop when using HugeTLB pages implemented with contiguous PTEs
> when HAFDBS is not available. This happens because:
> 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
> (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).

Hi James,

Do you know how this happens?

AFAIK, this is the set of valid bit combinations, and
PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them. Perhaps the real solution is
to understand how this is happening and prevent it?

/*
* PTE bits configuration in the presence of hardware Dirty Bit Management
* (PTE_WRITE == PTE_DBM):
*
* Dirty Writable | PTE_RDONLY PTE_WRITE PTE_DIRTY (sw)
* 0 0 | 1 0 0
* 0 1 | 1 1 0
* 1 0 | 1 0 1
* 1 1 | 0 1 x
*
* When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via
* the page fault mechanism. Checking the dirty status of a pte becomes:
*
* PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
*/


Thanks,
Ryan


> 2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
> the memory access on a system without HAFDBS, we will get a page
> fault.
> 3. HugeTLB will check if it needs to update the dirty bits on the PTE.
> For contiguous PTEs, it will check to see if the pgprot bits need
> updating. In this case, HugeTLB wants to write a sequence of
> sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about
> to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
> so it thinks no update is necessary.
>
> Please see this[1] reproducer.
>
> I think (though I may be wrong) that both step (1) and step (3) are
> buggy.
>
> The first patch in this series fixes step (3); instead of checking if
> pte_dirty is matching in __cont_access_flags_changed, check pte_hw_dirty
> and pte_sw_dirty separately.
>
> The second patch in this series makes step (1) less likely to occur.
> Without this patch, we can get the kernel to write a sw-dirty, hw-clean
> PTE with the following steps (showing the relevant VMA flags and pgprot
> bits):
> i. Create a valid, writable contiguous PTE.
> VMA vmflags: VM_SHARED | VM_READ | VM_WRITE
> VMA pgprot bits: PTE_RDONLY | PTE_WRITE
> PTE pgprot bits: PTE_DIRTY | PTE_WRITE
> ii. mprotect the VMA to PROT_NONE.
> VMA vmflags: VM_SHARED
> VMA pgprot bits: PTE_RDONLY
> PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
> iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
> VMA vmflags: VM_SHARED | VM_READ | VM_WRITE
> VMA pgprot bits: PTE_RDONLY | PTE_WRITE
> PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY
>
> Applying either one of the two patches in this patchset will fix the
> particular issue with HugeTLB pages implemented with contiguous PTEs.
> It's possible that only one of these patches should be taken, or that
> the right fix is something else entirely.
>
> [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0
>
> James Houghton (2):
> arm64: hugetlb: Distinguish between hw and sw dirtiness in
> __cont_access_flags_changed
> arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
>
> arch/arm64/include/asm/pgtable.h | 6 ++++++
> arch/arm64/mm/hugetlbpage.c | 5 ++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
>
> base-commit: 645a9a454fdb7e698a63a275edca6a17ef97afc4