Re: IA64 non-contiguous memory space bugs
From: 'David Gibson'
Date: Tue Feb 21 2006 - 21:25:04 EST
On Tue, Feb 21, 2006 at 05:51:52PM -0800, Chen, Kenneth W wrote:
> David Gibson wrote on Tuesday, February 21, 2006 4:14 PM
> > Second problem is in the hugepage logic in free_pgtables()
> > (mm/memory.c). As far as I can tell it's complete crap, and only
> > works by accident, for different accidental reasons on ppc64 and ia64,
> > the only archs that have a non-trivial is_hugepage_only_range().
> > Except that I'm not sure it does entirely work by accident on ia64:
> > suppose a process has a hugepage mapping that begins some way after
> > the beginning of the hugepage address range. Before
> > hugetlb_free_pgd_range() gets called on that area, it will be called
> > on the next normal page VMA down - but with an end address at the
> > beginning of the hugepage VMA and so extending into the hugepage
> > address range. I don't really understand the ia64 pagetable mapping
> > stuff well enough to tell if that's dangerous or not.
>
> Chen, Kenneth W wrote on Tuesday, February 21, 2006 5:32 PM
> > I don't see any problem in the ia64 code. The start and end address is
> > what the vma specified. Floor and ceiling is just a hint for free_pgtables()
> > to free any left over page tables between vma holes (to prev and next).
> > As far as I can tell, the code looks fine.
>
>
> free_pgtables() has partial crap that the check of is_hugepage_only_range()
> should be done on the entire vma range, not just the first hugetlb page.
> Though, it's not possible to have a hugetlb vma while having normal page
> instantiated inside that vma. So the bug is mostly phantom. For
> correctness, it should be fixed.
Actually, from ppc64's point of view, the problem with the test is
that the whole vma could be *less* than HPAGE_SIZE - we don't test
that the address is aligned before checking is_hugepage_only_range().
We thus can call hugetlb_free_pgd_range() on normal page VMAs - which
we only get away with because the ppc64 hugetlb_free_pgd_range() is
(so far) an alias for the normal free_pgd_range().
Your patch below is insufficient, because there's a second test of
is_hugepage_only_range() further down. However, instead of tweaking
the tested ranges, I think what we really want to do is check for
is_vm_hugetlb_page() instead.
I was worried before, but now that you point out it's the 'end'
address which really matters, not the ceiling, that might be
sufficient. Um.. except that hugepages, unlike normal pages these
days don't necessarily clean up all possible pagetable pages on
unmap... crud. Still the patch below ought to be an improvement.
Index: working-2.6/mm/memory.c
===================================================================
--- working-2.6.orig/mm/memory.c 2006-02-22 10:42:14.000000000 +1100
+++ working-2.6/mm/memory.c 2006-02-22 13:22:07.000000000 +1100
@@ -277,7 +277,7 @@ void free_pgtables(struct mmu_gather **t
anon_vma_unlink(vma);
unlink_file_vma(vma);
- if (is_hugepage_only_range(vma->vm_mm, addr, HPAGE_SIZE)) {
+ if (is_vm_hugetlb_page(vma)) {
hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
floor, next? next->vm_start: ceiling);
} else {
@@ -285,8 +285,7 @@ void free_pgtables(struct mmu_gather **t
* Optimization: gather nearby vmas into one call down
*/
while (next && next->vm_start <= vma->vm_end + PMD_SIZE
- && !is_hugepage_only_range(vma->vm_mm, next->vm_start,
- HPAGE_SIZE)) {
+ && !is_vm_hugetlb_page(vma->vm_mm)) {
vma = next;
next = vma->vm_next;
anon_vma_unlink(vma);
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/