Re: [RFC] page fault retry with NOPAGE_RETRY
From: Andrew Morton
Date: Tue Sep 19 2006 - 20:27:44 EST
On Wed, 20 Sep 2006 10:05:12 +1000
Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > I need to re-read your mail and Andrew as at this point, I don't quite
> > see why we need that args and/or that current->flags bit instead of
> > always returning all the way to userland and let the faulting
> > instruction happen again (which means you don't block in the kernel, can
> > take signals etc... thus do you actually need to prevent multiple
> > retries ?)
>
> Actually... I can see it's faster to not return all the way and take the
> fault again ... though only in some cases. For example, if the pte has
> been filled in the meantime (concurrent faults) it's actually faster to
> just go back. The only reason I see why you need those args is to tell
> the no_page() handler wether retrying is acceptable or wether it should
> use the old way. Any reason why this is necessary at all ? What are you
> trying to avoid by not letting it always do the retry path ?
Livelocks. I've described the deliberate one, but I fear there are
accidental ones I haven't thought of.
> My thinking was something around the lines of no_page() always does the
> retry logic. Then, we do something like:
>
> handle_pte_fault() gets modified. If do_no_page() returns
> VM_FAULT_RETRY, it checks pte_present() again. If the PTE is present, it
> returns VM_FAULT_MINOR. If PTE is absent, it checks for signals, and
> returns VM_FAULT_MINOR if a signal is pending. If PTE is absent and no
> signals are pending, it returns VM_FAULT_RETRY.
You forget that the point of this optimisation is to undo mmap_sem while
waiting on the disk IO. Once we've done that we cannot go looking at ptes
or vmas: another thread could have munapped the whole lot or anything.
(And we always need to be afraid of use_mm()..)
Once mmap_sem has been dropped we need to go all the way back to the
process's virtual address and redo the vma lookup. The easy and clean way
of doing that is to rerun the fault, reuse all the existing code.
> In addition, we still need to modify all archs do_page_fault() to handle
> VM_FAULT_RETRY...
Yup, it would need some temporary ifdeffery while architetures convert.
But bear in mind my earlier comments regarding possible optimisations to
this code.
> Or is there a specific scenario you are trying to avoid by keeping this
> mecanism for preventing retries ?
>
> Ben.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/