Re: [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one()

From: Dev Jain

Date: Thu Jun 25 2026 - 04:40:32 EST

On 25/06/26 1:58 pm, David Hildenbrand (Arm) wrote:
> On 6/25/26 10:03, Dev Jain wrote:
>>
>>
>> On 25/06/26 1:26 pm, David Hildenbrand (Arm) wrote:
>>> On 6/25/26 06:28, Dev Jain wrote:
>>>> try_to_unmap_one() handles hugetlb folios when memory failure needs
>>>> to replace a poisoned hugetlb mapping with a hwpoison entry. In that
>>>> case page_vma_mapped_walk() returns the hugetlb entry in pvmw.pte, but
>>>> the code reads it with ptep_get() before decoding the PFN.
>>>>
>>>> That is wrong on architectures where hugetlb entries are not encoded as
>>>> regular PTEs. On s390, for example, a raw huge RSTE must be converted
>>>> by huge_ptep_get() before helpers such as pte_pfn() can inspect it. A
>>>> raw decode can select the wrong subpage, so try_to_unmap_one() can
>>>> install a hwpoison entry for the wrong PFN.
>>>>
>>>> The userspace-visible result is that a later access to the poisoned
>>>> hugetlb subpage can miss the expected SIGBUS. With DEBUG_VM, the wrong
>>>> subpage can also trip the PageHWPoison check.
>>>>
>>>> Use huge_ptep_get() for hugetlb mappings before decoding the PFN.
>>>>
>>>> Before c7ab0d2fdc84, the bug existed in the form of a plain dereference:
>>>> we would compare the head page PFN of the hugetlb folio against
>>>> pte_pfn(*pte) and bail out on mismatch. In that case the hwpoison entry
>>>> would not get installed.
>>>>
>>>> I am not sure what the procedure is for such very old bugs - how far
>>>> back should I really go?
>>>>
>>>> Fixes: c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()")
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
>>>> ---
>>>> Applies on mm-unstable (d17fe8a046a2).
>>>> There are similar old bugs present, in try_to_migrate_one(), check_pte(),
>>>> remove_migration_pte(), prot_none_hugetlb_entry().
>>>
>>> Yeah, we should handle all these cases properly. Can you send fixes?
>>>
>>> Using ptep_get() on something that's not a PTE entry is shaky on some architectures.
>>
>> I can send the fixes, with each Fixes: tag pointing at the oldest commit
>> to which a backport is still relatively simple. If that is fine, the bug
>> will remain in kernels older than that, where we don't even do ptep_get(),
>> just a plain dereference. Probably no one is running pre-2017 kernels.
>
> The issue is that we would have to analyze in which cases exactly it would cause
> problems, like when migrating prot-none hugetlb folios on s390x, where
> pte_present() would not work as expected.
>
> I don't think any of us has time (or motivation) for that detailed analysis to
> make some odd hugetlb cases happy.
>
> So I'd say, let's just fix it in a simple way and be done with it. Use
> best-effort Fixes: tags, but state in the patch description that this was
> found by code inspection, that the actual effects are unclear (e.g.,
> pte_present() misbehaving on s390x), and that using huge_ptep_get() is just
> the right thing to do.

Sure thing, sounds good.
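
To make the hazard concrete, here is a minimal userspace sketch, not kernel
code. Everything prefixed toy_/TOY_ is hypothetical, and the bit layouts are
invented for illustration; they are not the real s390 RSTE or PTE encodings.
The toy helpers also ignore any extra arguments the real ptep_get() and
huge_ptep_get() take. The only point is that decoding a raw huge entry with
regular-PTE helpers can yield a wrong PFN and a wrong present state, while
converting it first does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: invented layouts, NOT any real architecture's encoding. */
typedef struct { uint64_t val; } toy_pte_t;

#define TOY_PAGE_SHIFT     12      /* regular PTE: PFN starts at bit 12 */
#define TOY_PTE_PRESENT    0x1ULL  /* regular PTE: bit 0 set means present */
#define TOY_HUGE_PFN_SHIFT 20      /* raw huge entry: PFN starts at bit 20 */
#define TOY_HUGE_INVALID   0x20ULL /* raw huge entry: bit 5 set means invalid */

/* Decoders that only understand the regular PTE layout. */
static uint64_t toy_pte_pfn(toy_pte_t pte)
{
	return pte.val >> TOY_PAGE_SHIFT;
}

static bool toy_pte_present(toy_pte_t pte)
{
	return (pte.val & TOY_PTE_PRESENT) != 0;
}

/* Stand-in for ptep_get(): returns the slot contents unchanged. */
static toy_pte_t toy_ptep_get(const toy_pte_t *ptep)
{
	return *ptep;
}

/* Stand-in for huge_ptep_get(): converts a raw huge entry to PTE layout. */
static toy_pte_t toy_huge_ptep_get(const toy_pte_t *ptep)
{
	toy_pte_t pte = { 0 };

	if (ptep->val & TOY_HUGE_INVALID)
		return pte; /* stays non-present */

	pte.val = ((ptep->val >> TOY_HUGE_PFN_SHIFT) << TOY_PAGE_SHIFT) |
		  TOY_PTE_PRESENT;
	return pte;
}

/* Build a valid raw huge entry for the given PFN (toy layout). */
static toy_pte_t toy_make_raw_huge(uint64_t pfn)
{
	toy_pte_t entry = { pfn << TOY_HUGE_PFN_SHIFT };
	return entry;
}

/* Shape of the fix: pick the accessor based on the mapping type. */
static uint64_t toy_mapped_pfn(const toy_pte_t *ptep, bool is_hugetlb)
{
	toy_pte_t pte = is_hugetlb ? toy_huge_ptep_get(ptep)
				   : toy_ptep_get(ptep);
	return toy_pte_pfn(pte);
}

/* Walks through the buggy and the fixed decode of the same slot. */
void toy_demo(void)
{
	toy_pte_t slot = toy_make_raw_huge(0x1234);

	/* Raw decode: wrong PFN, and the entry does not look present. */
	assert(toy_pte_pfn(toy_ptep_get(&slot)) != 0x1234);
	assert(!toy_pte_present(toy_ptep_get(&slot)));

	/* Converted decode: the PFN that was actually mapped. */
	assert(toy_mapped_pfn(&slot, true) == 0x1234);
	assert(toy_pte_present(toy_huge_ptep_get(&slot)));
}
```

In this toy layout the raw decode of PFN 0x1234 comes out as 0x123400 and the
entry looks non-present, which mirrors the two failure modes mentioned in the
thread: a hwpoison entry installed for the wrong PFN, and pte_present()
misbehaving. Whether and how either one shows up on a real architecture
depends on its actual encoding.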