RE: [RFC] Demand faulting for large pages

From: Adam Litke
Date: Mon Aug 08 2005 - 17:23:02 EST


On Fri, 2005-08-05 at 17:05, Chen, Kenneth W wrote:
> Adam Litke wrote on Friday, August 05, 2005 8:22 AM
> > Below is a patch to implement demand faulting for huge pages. The main
> > motivation for changing from prefaulting to demand faulting is so that
> > huge page allocations can follow the NUMA API. Currently, huge pages
> > are allocated round-robin from all NUMA nodes.
>
> Chen, Kenneth W wrote on Friday, August 05, 2005 2:34 PM
> > Spurious WARN_ON. Calls to hugetlb_pte_fault() are conditioned upon
> > if (is_vm_hugetlb_page(vma))
> >
> > ....
> >
> > Broken here. Return VM_FAULT_SIGBUS when *pte is present?? Why
> > can't you move all the logic into hugetlb_pte_fault and simply call
> > it directly from handle_mm_fault?

The VM_FAULT_SIGBUS default return is there because I thought a
fault on a pte_present hugetlb page was an invalid/unhandled fault.
I'll have another think about races to the fault handler though.

With respect to your code logic comment: The idea was to make
hugetlb_fault() an entry point into the huge page fault handling code.
This would make the task of adding other types of faults (Copy on Write
for example) easier later. If people prefer, it would be easy enough to
roll everything into hugetlb_pte_fault().
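
To make the entry-point idea concrete, here is a rough userspace sketch
of the dispatch shape I have in mind. It is not the patch itself: every
toy_* name and the TOY_FAULT_* values are stand-ins invented for this
illustration, and the SIGBUS default for a present pte is exactly the
behavior questioned above, not a settled answer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration; these are not the kernel's definitions. */
enum toy_fault_result { TOY_FAULT_MINOR, TOY_FAULT_SIGBUS };

struct toy_pte {
    bool present;   /* is a huge page already mapped here? */
};

/* Handles the one case covered so far: no huge page mapped yet. */
static enum toy_fault_result toy_hugetlb_pte_fault(struct toy_pte *pte)
{
    assert(!pte->present);
    pte->present = true;    /* pretend a huge page was allocated and mapped */
    return TOY_FAULT_MINOR;
}

/* Entry point: classify the fault, then hand off to a specific handler. */
static enum toy_fault_result toy_hugetlb_fault(struct toy_pte *pte, bool write)
{
    (void)write;    /* a later copy-on-write branch would inspect this */

    if (!pte->present)
        return toy_hugetlb_pte_fault(pte);

    /* Default for any fault type that has no handler yet. */
    return TOY_FAULT_SIGBUS;
}
```

A new fault type would then be one more branch in toy_hugetlb_fault()
ahead of the default return, leaving the not-present handler untouched.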

> I'm wondering has this patch ever been tested? More broken bits:
> in arch/i386/mm/hugetlbpage.c:huge_pte_offset - with demand paging,
> you can't unconditionally walk the page table without checking
> existence of pud and pmd.

I have tested the patch to the best extent that I can, but would
definitely appreciate more testing :) Thanks for the hint about page
table walking. I've fixed that up for the next iteration.
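
For reference, the shape of a guarded walk can be sketched in plain
userspace C as follows. This is a simplified model, not the i386 code:
the toy_* structures and toy_huge_pte_offset() are hypothetical, and a
NULL pointer stands in for a pud or pmd that has not been populated yet.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ENTRIES 4   /* tiny tables keep the example readable */

/* Hypothetical three-level table; NULL means "level not populated yet". */
struct toy_pmd { unsigned long huge_pte[TOY_ENTRIES]; };
struct toy_pud { struct toy_pmd *pmd[TOY_ENTRIES]; };
struct toy_pgd { struct toy_pud *pud[TOY_ENTRIES]; };

/*
 * Return a pointer to the huge pte slot, or NULL if any intermediate
 * level is missing. With demand faulting the upper levels may not exist
 * until the first fault, so each one is checked before it is followed.
 */
static unsigned long *toy_huge_pte_offset(struct toy_pgd *pgd,
                                          unsigned int i,
                                          unsigned int j,
                                          unsigned int k)
{
    struct toy_pud *pud;
    struct toy_pmd *pmd;

    assert(pgd != NULL);
    assert(i < TOY_ENTRIES && j < TOY_ENTRIES && k < TOY_ENTRIES);

    pud = pgd->pud[i];
    if (pud == NULL)
        return NULL;    /* no pud yet: nothing to walk into */

    pmd = pud->pmd[j];
    if (pmd == NULL)
        return NULL;    /* no pmd yet: same treatment */

    return &pmd->huge_pte[k];
}
```

A caller treats a NULL return as "not mapped yet" rather than
dereferencing the result unconditionally.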

>
> I haven't looked closely at the recent change in free_pgtables(), but
> we used to have a need to scrub the old pmd mapping before allocating
> one for a hugetlb pte on x86. You have to do that in huge_pte_alloc(),
> I'm specifically concerned with arch/i386/mm/hugetlbpage.c:huge_pte_alloc()

I've definitely been able to produce some strange behavior on 2.6.7
related to your post about this topic here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0406.2/0234.html
I confirmed the fix in 2.6.8 and also don't see the problem when using
my demand fault patch. Do you have a copy of the program you used to
generate the Oops in the post linked above so I can use it as a test
case? I'd guess either the problem is gone entirely with demand
faulting, or just harder to trigger.

--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
