Re: [PATCH] mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn

From: Miaohe Lin

Date: Sun Dec 21 2025 - 22:01:33 EST


On 2025/12/19 16:06, jane.chu@xxxxxxxxxx wrote:
>
>
> On 12/19/2025 12:01 AM, Miaohe Lin wrote:
>> On 2025/12/19 14:28, Jane Chu wrote:
>>> When a hugetlb folio is being poisoned again, try_memory_failure_hugetlb()
>>> passes the head pfn to kill_accessing_process(), which is not right.
>>> The precise pfn of the poisoned page should be used in order to
>>> determine the precise vaddr as the SIGBUS payload.
>>>
>>> This issue has already been taken care of in the normal path, that is,
>>> hwpoison_user_mappings(), see [1][2].  Furthermore, for [3] to work
>>> correctly in the hugetlb repoisoning case, it's essential to inform
>>> VM the precise poisoned page, not the head page.
>>>
>>> [1] https://lkml.kernel.org/r/20231218135837.3310403-1-willy@xxxxxxxxxxxxx
>>> [2] https://lkml.kernel.org/r/20250224211445.2663312-1-jane.chu@xxxxxxxxxx
>>> [3] https://lore.kernel.org/lkml/20251116013223.1557158-1-jiaqiyan@xxxxxxxxxx/
>>>
>>
>> Thanks for your patch.
>>
>>> Cc: <stable@xxxxxxxxxxxxxxx>
>>> Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx>
>>> ---
>>>   mm/memory-failure.c | 22 ++++++++++++----------
>>>   1 file changed, 12 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 3edebb0cda30..c9d87811b1ea 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -681,9 +681,11 @@ static void set_to_kill(struct to_kill *tk, unsigned long addr, short shift)
>>>   }
>>>     static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
>>> -                unsigned long poisoned_pfn, struct to_kill *tk)
>>> +                unsigned long poisoned_pfn, struct to_kill *tk,
>>> +                int pte_nr)
>>>   {
>>>       unsigned long pfn = 0;
>>> +    unsigned long hwpoison_vaddr;
>>>         if (pte_present(pte)) {
>>>           pfn = pte_pfn(pte);
>>> @@ -694,10 +696,11 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
>>>               pfn = swp_offset_pfn(swp);
>>>       }
>>>   -    if (!pfn || pfn != poisoned_pfn)
>>> +    if (!pfn || (pfn > poisoned_pfn || (pfn + pte_nr - 1) < poisoned_pfn))
>>>           return 0;
>>
>> Can we get pte_nr from @shift? I.e. something like "pte_nr = 1UL << (shift - PAGE_SHIFT);"?
>
> Why?  Is there any concern with using the macro pages_per_huge_page(h) ?

No, I was trying to get rid of the new @pte_nr parameter. Something like below:

 static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
-                unsigned long poisoned_pfn, struct to_kill *tk,
-                int pte_nr)
+                unsigned long poisoned_pfn, struct to_kill *tk)
 {
     unsigned long pfn = 0;
     unsigned long hwpoison_vaddr;
+    int pte_nr;
 
     if (pte_present(pte)) {
         pfn = pte_pfn(pte);
@@ -701,7 +701,8 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
         pfn = softleaf_to_pfn(entry);
     }
 
-    if (!pfn || (pfn > poisoned_pfn || (pfn + pte_nr - 1) < poisoned_pfn))
+    pte_nr = 1UL << (shift - PAGE_SHIFT);
+    if (!pfn || (pfn > poisoned_pfn || (pfn + pte_nr - 1) < poisoned_pfn))
         return 0;
 
     hwpoison_vaddr = addr + ((poisoned_pfn - pfn) << PAGE_SHIFT);

That way we don't have to pass pte_nr in from every caller. But that's trivial.
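
To make the arithmetic concrete, here is a minimal standalone sketch of the
idea (plain userspace C, not kernel code). It derives the page count from
@shift, does the same range check, and computes the precise vaddr. PAGE_SHIFT
is assumed to be 12 (4 KiB base pages), and the helper names are made up for
illustration only; they do not exist in mm/memory-failure.c.

```c
#include <stdbool.h>

/* Assumption for this sketch: 4 KiB base pages. */
#define PAGE_SHIFT 12

/* Number of base pages covered by a mapping of size (1UL << shift). */
static inline unsigned long nr_pages_from_shift(short shift)
{
	return 1UL << (shift - PAGE_SHIFT);
}

/*
 * True if poisoned_pfn lies within [pfn, pfn + nr - 1], where pfn is the
 * first pfn covered by the entry. This is the inverse of the
 * "return 0" condition in the snippet above.
 */
static inline bool pfn_in_mapping(unsigned long pfn,
				  unsigned long poisoned_pfn, short shift)
{
	unsigned long nr = nr_pages_from_shift(shift);

	return poisoned_pfn >= pfn && poisoned_pfn <= pfn + nr - 1;
}

/* Virtual address of the poisoned base page inside the mapping at addr. */
static inline unsigned long poisoned_vaddr(unsigned long addr,
					   unsigned long pfn,
					   unsigned long poisoned_pfn)
{
	return addr + ((poisoned_pfn - pfn) << PAGE_SHIFT);
}
```

With shift = 21 (a 2 MiB hugetlb page) this yields 512 base pages, so a
poisoned pfn three pages past the head maps to addr + 0x3000.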

Thanks.