Re: [PATCH] x86: fix PAE pmd_bad bootup warning

From: Hugh Dickins
Date: Thu May 08 2008 - 15:01:27 EST


On Thu, 8 May 2008, Nishanth Aravamudan wrote:
>
> So, is there any way to add an is_vm_hugetlb_page(vma) check into
> pagemap_read()? Or can we modify walk_page_range to take a vma and
> skip the walking if is_vm_hugetlb_page(vma) is set [to avoid
> complications down the road until hugepage walking is fixed]. I guess
> the latter isn't possible for pagemap_read(), since we are just looking
> at arbitrary addresses in the process space?
>
> Dunno, seems quite clear that the bug is in pagemap_read(), not any
> hugepage code, and that the simplest fix is to make pagemap_read() do
> what the other walker-callers do, and skip hugepage regions.

Yes, I'm afraid it needs an is_vm_hugetlb_page(vma) in there somehow:
as you observe, that's what everything else uses to avoid huge issues.
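
To make the "skip hugepage regions" idea concrete, here is a rough
userspace model, not kernel code and not a proposed patch. Everything
prefixed model_ is hypothetical: model_vma stands in for
vm_area_struct, MODEL_VM_HUGETLB for VM_HUGETLB, model_find_vma() for
a find_vma()-style lookup, and the vmas are assumed sorted,
non-overlapping and page aligned. It only counts the pages a walker
would still descend into once hugetlb vmas are stepped over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these mimic, but are not, the kernel's types. */
#define MODEL_VM_HUGETLB 0x1UL	/* hypothetical stand-in for VM_HUGETLB */

struct model_vma {
	unsigned long vm_start;	/* inclusive */
	unsigned long vm_end;	/* exclusive */
	unsigned long vm_flags;
};

/* Modelled on the is_vm_hugetlb_page(vma) test discussed above. */
static bool model_is_hugetlb(const struct model_vma *vma)
{
	return vma && (vma->vm_flags & MODEL_VM_HUGETLB);
}

/*
 * Return the first vma whose vm_end lies above addr, or NULL.
 * As with a find_vma()-style lookup, addr may still be below vm_start.
 */
static const struct model_vma *model_find_vma(const struct model_vma *vmas,
					      size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/*
 * Step through [start, end) one page at a time and count the pages that
 * would be handed to the page-table walker.  Any address inside a
 * hugetlb vma is skipped by jumping straight to the end of that vma.
 */
static unsigned long model_count_walked_pages(const struct model_vma *vmas,
					      size_t n, unsigned long start,
					      unsigned long end,
					      unsigned long page_size)
{
	unsigned long walked = 0;
	unsigned long addr = start;

	assert(page_size > 0);

	while (addr < end) {
		const struct model_vma *vma = model_find_vma(vmas, n, addr);

		if (vma && addr >= vma->vm_start && model_is_hugetlb(vma)) {
			/* Inside a hugetlb vma: do not touch its page tables. */
			addr = vma->vm_end < end ? vma->vm_end : end;
			continue;
		}
		walked++;
		addr += page_size;
	}
	return walked;
}
```

The point of the model is only that the decision is made per vma,
before any pmd is looked at, so nothing ever gets the chance to
treat a hugepage entry as bad.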

A pmd_huge(*pmd) test is tempting, but it only ever says "yes" on x86:
we've carefully left it undefined what happens to the pgd/pud/pmd/pte
hierarchy in the general arch case, once you're amongst hugepages.

Might follow_huge_addr() be helpful, to avoid the need for a vma?
Perhaps, but my reading is that actually we've never really been
testing that path's success case (because get_user_pages already
skipped is_vm_hugetlb_page), so it might hold further surprises
on one architecture or another.

Many thanks to Hans for persisting, and pointing us to pagemap
to explain this hugepage leak: yes, the pmd_none_or_clear_bad
will be losing it - and corrupting target user address space.

Cc'ed Matt: he may have a view on what he wants his pagewalker
to do with hugepages: I fear it would differ from one usage to
another. Skipping over them has to be safest, though not ideal.

Hugh