Re: [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()

From: Punit Agrawal
Date: Tue Apr 04 2017 - 14:47:23 EST


Hi Mark,

Mark Rutland <mark.rutland@xxxxxxx> writes:

> Hi Punit,
>
> On Thu, Mar 30, 2017 at 05:38:47PM +0100, Punit Agrawal wrote:
>> huge_pte_offset() does not correctly handle poisoned or migration page
>> table entries.
>
> What exactly does it do wrong?
>
> Judging by the patch, we return NULL in some cases we shouldn't, right?

huge_pte_offset() returns NULL when it comes across swap entries for any
of the supported hugepage sizes.

>
> What can result from this? e.g. can we see data corruption?

In the tests I am running, it results in an error in the log -

[ 344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

when unmapping the page tables for the process that owns the poisoned
page.

In some instances, returning NULL instead of swap entries could lead to
data corruption - especially when the page tables contain migration swap
entries. But since hugepage migration is not enabled on arm64 I haven't
seen any corruption.

I've updated the commit log with more details locally.

>
>> Not knowing the size of the hugepage entry being
>> requested only compounded the problem.
>>
>> The recently added hstate parameter can be used to determine the size of
>> hugepage being accessed. Use the size to find the correct page table
>> entry to return when coming across a swap page table entry.
>>
>> Signed-off-by: Punit Agrawal <punit.agrawal@xxxxxxx>
>> Cc: David Woods <dwoods@xxxxxxxxxxxx>
>
> Given this is a fix for a bug, it sounds like it should have a fixes
> tag, or a Cc stable...

The problem doesn't occur until we enable memory failure handling. So
there shouldn't be a problem on earlier kernels.

Thanks,
Punit

>
> Thanks,
> Mark.
>
>> ---
>> arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
>> 1 file changed, 16 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 9ca742c4c1ab..44014403081f 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>> pte_t *huge_pte_offset(struct mm_struct *mm,
>> unsigned long addr, struct hstate *h)
>> {
>> + unsigned long sz = huge_page_size(h);
>> pgd_t *pgd;
>> pud_t *pud;
>> - pmd_t *pmd = NULL;
>> - pte_t *pte = NULL;
>> + pmd_t *pmd;
>> + pte_t *pte;
>>
>> pgd = pgd_offset(mm, addr);
>> pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>> if (!pgd_present(*pgd))
>> return NULL;
>> +
>> pud = pud_offset(pgd, addr);
>> - if (!pud_present(*pud))
>> + if (pud_none(*pud) && sz != PUD_SIZE)
>> return NULL;
>> -
>> - if (pud_huge(*pud))
>> + else if (!pud_table(*pud))
>> return (pte_t *)pud;
>> +
>> + if (sz == CONT_PMD_SIZE)
>> + addr &= CONT_PMD_MASK;
>> +
>> pmd = pmd_offset(pud, addr);
>> - if (!pmd_present(*pmd))
>> + if (pmd_none(*pmd) &&
>> + !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>> return NULL;
>> -
>> - if (pte_cont(pmd_pte(*pmd))) {
>> - pmd = pmd_offset(
>> - pud, (addr & CONT_PMD_MASK));
>> - return (pte_t *)pmd;
>> - }
>> - if (pmd_huge(*pmd))
>> + else if (!pmd_table(*pmd))
>> return (pte_t *)pmd;
>> - pte = pte_offset_kernel(pmd, addr);
>> - if (pte_present(*pte) && pte_cont(*pte)) {
>> +
>> + if (sz == CONT_PTE_SIZE) {
>> pte = pte_offset_kernel(
>> pmd, (addr & CONT_PTE_MASK));
>> return pte;
>> }
>> +
>> return NULL;
>> }
>>
>> --
>> 2.11.0
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel