Re: 2.6.16-rc1: 28ms latency when process with lots of swapped memory exits

From: Hugh Dickins
Date: Wed Mar 15 2006 - 02:48:46 EST


On Tue, 14 Mar 2006, Lee Revell wrote:
> On Tue, 2006-03-14 at 22:01 +0100, Ingo Molnar wrote:
> > hm, where does the latency come from? We do have a lockbreaker in
> > unmap_vmas():
> >
> >         if (need_resched() ||
> >             (i_mmap_lock &&
> >              need_lockbreak(i_mmap_lock))) {
> >                 if (i_mmap_lock) {
> >                         *tlbp = NULL;
> >                         goto out;
> >                 }
> >                 cond_resched();
> >         }
> >
> >
> > why doesn't this break up the 28ms latency?

That block is actually for the PREEMPT=n case, and for truncating a
mapped file (when i_mmap_lock is additionally held). All that Lee's
PREEMPT=y exit case should need is the tlb_finish_mmu and tlb_gather_mmu
around it, which let preemption in, together with the ZAP_BLOCK_SIZE
of 8*PAGE_SIZE.

> But the preempt count is >= 2, doesn't that mean some other lock must be
> held also, or someone called preempt_disable?

Yes, as I read the trace (and let me admit, I'm not at all skilled at
reading those traces), and as your swap observation implies, this is
not a problem with ptes present, but with swap entries: and with the
radix tree lookup involved in finding whether they have an associated
struct page in core - all handled while holding page table lock, and
while holding the per-cpu mmu_gather structure.

Oh, thank you for forcing me to take another look, 2.6.15 did make a
regression there, and this one is very simply remedied: Lee, please
try the patch below (I've done it against 2.6.16-rc6 because that's
what I have to hand; and would be a better tree for you to test),
and let us know if it fixes your case as I expect - thanks.

(Robin Holt observed how inefficient the small ZAP_BLOCK_SIZE was on
very sparse mmaps, as originally implemented; so he and Nick reworked
it to count only real work done; but the swap entries got put on the
side of "no real work", whereas you've found they may involve very
significant work. My patch below reverses that: yes, I've got some
other cases now going the slow way when they needn't, but they're
too rare to clutter the code for.)
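
To make the accounting change concrete, here is a minimal userspace
sketch of the zap_work budget, not the kernel code itself. The names
SKETCH_PAGE_SIZE, SKETCH_ZAP_BLOCK_SIZE, enum pte_kind and
scan_until_budget_spent() are hypothetical and exist only for this
illustration; the charging rules follow the patch below (an empty pte
costs 1, a present pte costs PAGE_SIZE, and a swap entry costs either
nothing or PAGE_SIZE depending on the charge_swap flag).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants (assumed values, not taken from kernel headers). */
#define SKETCH_PAGE_SIZE      4096L
#define SKETCH_ZAP_BLOCK_SIZE (8 * SKETCH_PAGE_SIZE)

/* Hypothetical stand-in for the three kinds of pte the loop can see. */
enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_SWAP };

/*
 * Walk the entries until the work budget is used up and return how many
 * entries were handled in that one uninterrupted batch.
 *
 * charge_swap == 0 models the pre-patch accounting (swap entries free),
 * charge_swap != 0 models the patched accounting (swap entries cost
 * SKETCH_PAGE_SIZE, the same as present ptes).
 */
static size_t scan_until_budget_spent(const enum pte_kind *ptes, size_t n,
                                      int charge_swap)
{
    long zap_work = SKETCH_ZAP_BLOCK_SIZE;
    size_t i;

    assert(ptes != NULL);

    for (i = 0; i < n && zap_work > 0; i++) {
        if (ptes[i] == PTE_NONE) {
            zap_work--;            /* empty slot: almost no work */
            continue;
        }
        if (ptes[i] == PTE_PRESENT || charge_swap)
            zap_work -= SKETCH_PAGE_SIZE;
    }
    return i;
}
```

With a run of nothing but swap entries, the pre-patch rule never spends
any budget, so the whole run is handled in one batch with no chance to
let preemption in; with the patched rule the batch stops after 8 entries.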

Hugh

--- 2.6.16-rc6/mm/memory.c 2006-03-12 15:25:45.000000000 +0000
+++ linux/mm/memory.c 2006-03-15 07:32:36.000000000 +0000
@@ -623,11 +623,12 @@ static unsigned long zap_pte_range(struc
 			(*zap_work)--;
 			continue;
 		}
+
+		(*zap_work) -= PAGE_SIZE;
+
 		if (pte_present(ptent)) {
 			struct page *page;
 
-			(*zap_work) -= PAGE_SIZE;
-
 			page = vm_normal_page(vma, addr, ptent);
 			if (unlikely(details) && page) {
 				/*