Re: [PATCH] mm: compaction: Abort compaction if too many pages areisolated and caller is asynchronous
From: Andrea Arcangeli
Date: Wed Jun 01 2011 - 19:31:24 EST
Hi Mel,
On Wed, Jun 01, 2011 at 10:40:18PM +0100, Mel Gorman wrote:
> On Wed, Jun 01, 2011 at 09:15:29PM +0200, Andrea Arcangeli wrote:
> > On Wed, Jun 01, 2011 at 06:58:09PM +0100, Mel Gorman wrote:
> > > Umm, HIGHMEM4G implies a two-level pagetable layout so where are
> > > things like _PAGE_BIT_SPLITTING being set when THP is enabled?
> >
> > They should be set on the pgd, pud_offset/pgd_offset will just bypass.
> > The splitting bit shouldn't be special about it, the present bit
> > should work the same.
>
> This comment is misleading at best then.
>
> #define _PAGE_BIT_SPLITTING _PAGE_BIT_UNUSED1 /* only valid on a PSE pmd */
>From common code point of view it's set in the pmd, the comment can be
extended to specify it's actually the pgd in case of 32bit noPAE but I
didn't think it was too misleading as we think in common code terms
all over the code, the fact it's a bypass is pretty clear across the
whole archs.
> At the PGD level, it can have PSE set obviously but it's not a
> PMD. I confess I haven't checked the manual to see if it's safe to
> use _PAGE_BIT_UNUSED1 like this so am taking your word for it. I
To be sure I re-checked on 253668.pdf page 113/114 noPAE and page 122
PAE, on x86 32bit/64 all ptes/pmd/pgd (32bit/64bit PAE/noPAE) have bit
9-11 "Avail" to software. So I think we should be safe here.
> found that the bug is far harder to reproduce with 3 pagetable levels
> than with 2 but that is just timing. So far it has proven impossible
> on x86-64 at least within 27 hours so that has me looking at how
> pagetable management between x86 and x86-64 differ.
Weird.
However I could see it screwing the nr_inactive/active_* stats, but
the nr_isolated should never go below zero, and especially not anon
even if split_huge_page does the accounting wrong (and
migrate/compaction won't mess with THP), or at least I'd expect things
to fall apart in other ways and not with just a fairly innocuous and
not-memory corrupting nr_isolated_ counter going off just by one.
The khugepaged nr_isolated_anon increment couldn't affect the file one
and we hold mmap_sem write mode there to prevent the pte to change
from under us, in addition to the PT and anon_vma lock. Anon_vma lock
being wrong sounds unlikely too, and even if it was it should screw
the nr_isolated_anon counter, impossible to screw the nr_isolated_file
with khugepaged.
Where did you put your bugcheck? It looked like you put it in the < 0
reader, can you add it to all _inc/dec/mod (even _inc just in case) so
we may get a stack trace including the culprit? (not guaranteed but
better chance)
> Barriers are a big different between how 32-bit !SMP and X86-64 but
> don't know yet which one is relevant or if this is even the right
> direction.
The difference is we need xchg on SMP to avoid losing the dirty
bit. Otherwise if we do pmd_t pmd = *pmdp; *pmdp = 0; the dirty bit
may have been set in between the two by another thread running in
userland in a different CPU, while the pmd was still "present". As
long as interrupts don't write to read-write userland memory with the
pte dirty bit clear, we shouldn't need xchg on !SMP.
On PAE we also need to write 0 into pmd_low before worrying about
pmd_high so the present bit is cleared before clearing the high part
of the 32bit PAE pte, and we relay on xchg implicit lock to avoid a
smp_wmb() in between the two writes.
I'm unsure if any of this could be relevant to our problem, also there
can't be more than one writer at once in the pmd, as nobody can modify
it without the page_table_lock held. xchg there is just to be safe for
the dirty bit (or we'd corrupt memory with threads running in userland
and writing to memory on other cpus while we ptep_clear_flush).
I've been wondering about the lack of "lock" on the bus in atomic.h
too, but I can't see how it can possibly matter on !SMP, vmstat
modifications should execute only 1 asm insn so preempt or irq can't
interrupt it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/