Re: [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one()

From: David Hildenbrand (Arm)

Date: Thu Jun 25 2026 - 04:31:45 EST


On 6/25/26 10:03, Dev Jain wrote:
>
>
> On 25/06/26 1:26 pm, David Hildenbrand (Arm) wrote:
>> On 6/25/26 06:28, Dev Jain wrote:
>>> try_to_unmap_one() handles hugetlb folios when memory failure needs
>>> to replace a poisoned hugetlb mapping with a hwpoison entry. In that
>>> case page_vma_mapped_walk() returns the hugetlb entry in pvmw.pte, but
>>> the code reads it with ptep_get() before decoding the PFN.
>>>
>>> That is wrong on architectures where hugetlb entries are not encoded as
>>> regular PTEs. On s390, for example, a raw huge RSTE must be converted
>>> by huge_ptep_get() before helpers such as pte_pfn() can inspect it. A
>>> raw decode can select the wrong subpage, so try_to_unmap_one() can
>>> install a hwpoison entry for the wrong PFN.
>>>
>>> The userspace-visible result is that a later access to the poisoned
>>> hugetlb subpage can miss the expected SIGBUS. With DEBUG_VM, the wrong
>>> subpage can also trip the PageHWPoison check.
>>>
>>> Use huge_ptep_get() for hugetlb mappings before decoding the PFN.
>>>
>>> Before c7ab0d2fdc84, the bug existed in the form of a plain dereference:
>>> we would check the head page pfn of the hugetlb with pte_pfn(*pte), and
>>> bail out on mismatch. This would mean that the hwpoison entry would not
>>> get installed.
>>>
>>> I am not sure what the procedure is for such very old bugs: how far
>>> back should I really go?
>>>
>>> Fixes: c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()")
>>> Cc: stable@xxxxxxxxxxxxxxx
>>> Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
>>> ---
>>> Applies on mm-unstable (d17fe8a046a2).
>>> There are similar old bugs present, in try_to_migrate_one(), check_pte(),
>>> remove_migration_pte(), prot_none_hugetlb_entry().
>>
>> Yeah, we should handle all these cases properly. Can you send fixes?
>>
>> Using ptep_get() on something that's not a PTE entry is shaky on some architectures.
>
> I can send the fixes, blaming the oldest commit up to which the backport is relatively simple,
> if that is fine. The bug will still remain before that, where we don't even do ptep_get(),
> just a plain dereference. Probably no one is running pre-2017 kernels.

The issue is that we would have to analyze in which cases exactly it would cause
problems, like when migrating prot-none hugetlb folios on s390x, where
pte_present() would not work as expected.

I don't think any of us has time (or motivation) for that detailed analysis to
make some odd hugetlb cases happy.

So I'd say, let's just fix it in a simple way and be done with it. Use a
best-effort Fixes: tag, but state in the patch description that this was found
by code inspection, that the actual effects are unclear (e.g., pte_present()
misbehaving on s390x), and that using huge_ptep_get() is just the right thing
to do.
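
To illustrate why the accessor matters, here is a minimal userspace sketch,
not kernel code. Every name in it (mock_pte_t, mock_ptep_get(),
mock_huge_ptep_get(), mock_read_entry()) and both bit layouts are made up for
illustration; the real s390 encoding differs. It only shows that decoding a raw
huge entry as if it were a regular PTE can yield both a wrong PFN and a wrong
present bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t; not the kernel's definition. */
typedef struct { uint64_t val; } mock_pte_t;

/* Assumed "regular PTE" layout for this mock: PFN from bit 12, present = bit 0. */
#define MOCK_PTE_PFN_SHIFT	12
#define MOCK_PTE_PRESENT	0x01ULL

/* Assumed "raw huge entry" layout for this mock: PFN from bit 20, present = bit 5. */
#define MOCK_RAW_PFN_SHIFT	20
#define MOCK_RAW_PRESENT	0x20ULL

/* Build a raw huge entry as the (imaginary) hardware would store it. */
static inline mock_pte_t mock_make_raw_huge(uint64_t pfn, bool present)
{
	mock_pte_t raw;

	assert(pfn < (1ULL << 44));	/* keep the shifted PFN within 64 bits */
	raw.val = (pfn << MOCK_RAW_PFN_SHIFT) | (present ? MOCK_RAW_PRESENT : 0);
	return raw;
}

/* Plain read: returns the stored bits unchanged, like a ptep_get()-style accessor. */
static inline mock_pte_t mock_ptep_get(const mock_pte_t *ptep)
{
	return *ptep;
}

/* Converting read: translates the raw huge layout into the regular PTE layout. */
static inline mock_pte_t mock_huge_ptep_get(const mock_pte_t *ptep)
{
	mock_pte_t pte;
	uint64_t pfn = ptep->val >> MOCK_RAW_PFN_SHIFT;
	bool present = ptep->val & MOCK_RAW_PRESENT;

	pte.val = (pfn << MOCK_PTE_PFN_SHIFT) | (present ? MOCK_PTE_PRESENT : 0);
	return pte;
}

/* Helpers that only understand the regular PTE layout. */
static inline uint64_t mock_pte_pfn(mock_pte_t pte)
{
	return pte.val >> MOCK_PTE_PFN_SHIFT;
}

static inline bool mock_pte_present(mock_pte_t pte)
{
	return pte.val & MOCK_PTE_PRESENT;
}

/* The shape of the fix: pick the accessor based on the mapping type. */
static inline mock_pte_t mock_read_entry(const mock_pte_t *ptep, bool is_hugetlb)
{
	return is_hugetlb ? mock_huge_ptep_get(ptep) : mock_ptep_get(ptep);
}
```

In this mock, a raw huge entry built for PFN 0x1234 decodes to PFN 0x1234 and
"present" through mock_read_entry(&raw, true), but to PFN 0x123400 and "not
present" when read with the plain accessor.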
--
Cheers,

David