Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling

From: Punit Agrawal
Date: Thu Mar 09 2017 - 12:54:52 EST

[ +steve for arm64 mm and hugepages chops ]

"Baicar, Tyler" <tbaicar@xxxxxxxxxxxxxx> writes:

> On 3/7/2017 12:56 PM, Punit Agrawal wrote:
>> Punit Agrawal <punit.agrawal@xxxxxxx> writes:
>> [...]
>>> The code looks good but I ran into some failures while running the
>>> hugepages hwpoison tests from mce-tests suite[0]. I get a bad pmd error
>>> in dmesg -
>>> [ 344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>>> I suspect that this is due to the huge pte accessors not correctly
>>> dealing with poisoned entries (which are represented as swap entries).
>> I think I've got to the bottom of the issue: the problem is due to
>> huge_pte_offset() returning NULL for poisoned pmd entries (which in turn is
>> due to pmd_present() not handling poisoned pmd entries correctly).
>> The following is the call chain for the failure case.
>>
>>   do_munmap
>>     unmap_region
>>       unmap_vmas
>>         unmap_single_vma
>>           __unmap_hugepage_range_final   # The test case uses hugepages
>>             __unmap_hugepage_range
>>               huge_pte_offset            # Returns NULL for a poisoned pmd
>>
>> Reverting 5bb1cc0ff9a6 ("arm64: Ensure pmd_present() returns false after
>> pmd_mknotpresent()") fixes the problem for me but I don't think that is
>> the right fix.
>> While I work on a proper fix, it would be great if you can confirm that
>> reverting 5bb1cc0ff9a6 makes the problem go away at your end.
> Thanks Punit! I haven't had a chance to do this yet, but I will let
> you know once I get it tested :)

This time with a patch. Please test this instead.

After a lot of head scratching, I've bitten the bullet and added a check to
return the poisoned entry from huge_pte_offset(). What with having to
deal with contiguous hugepages et al., there just doesn't seem to be any
leeway in how we handle the situation here.

Let's see if there are any other ideas. Patch follows.