Re: [patch] mm: reduce pagetable-freeing latencies

From: Peter Zijlstra
Date: Wed Jul 25 2007 - 02:44:29 EST


On Wed, 2007-07-25 at 07:29 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2007-07-24 at 14:13 +0200, Andi Kleen wrote:
> > Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> writes:
> >
> > > > What a truly putrid patch. I am suspecting that this was a quick
> > > > get-you-out-of-trouble thing, which then got forgotten about.
> > > >
> > > > We have two months to do the "right fix". Please?
> > >
> > > Working on it...
> >
> > Ideally the patch would DTRT even on non-preemptible kernels,
> > aka do cond_resched()s when needed.
>
> First is to rework the batch structure to make it more manageable. That
> is, patch #1 will keep the page list per-cpu (and thus non-preemptible),
> but the batch "head" will be on the stack.
>
> Now, there are two approaches regarding getting rid of the
> get_cpu/put_cpu:
>
> - One is to have a small number of entries for the page list in the
> batch structure on the stack, and attempt to gfp a page for more. If
> that fails, we can still free, though with less batching, using only the
> few entries in the batch struct itself. That's Hugh's initial approach
> iirc.
>
> - Another is to hook up with those folks who've been asking for a
> notifier that we are being preempted/scheduled out. In this case, I can
> happily access the per-cpu list, and just trigger a batch flush if we
> happen to be scheduled out.
>
> I tend to prefer the former solution though: gfp should be fast, and
> there is no need to force a flush if we get scheduled out. It would be
> rare to hit the worst case scenario of falling back to the few page
> heads in the batch itself. On the other hand, that solution has the
> problem of bloating the stack a bit (with the few page pointers) even in
> the case where I plan to use the extended batch outside of zap_*, such
> as fork, mprotect, ....
>
> So I'll first do patch #1, which will not fix the problem but will make
> the fix easier to fit in. In the meantime, please provide feedback on
> which of the two approaches above you prefer for avoiding the
> get/put_cpu, unless you find a good third one.

I too would prefer the former solution. I think preemption notifiers are
a particularly iffy hack.
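For illustration only, here is a minimal userspace model of that former
approach (not code from any posted patch). Assumptions: malloc() stands in
for gfp'ing an extension page, free() stands in for releasing the batched
pages, and the names struct batch, batch_init(), batch_add(),
batch_flush(), batch_finish() and the sizes BATCH_LOCAL/BATCH_EXT are all
hypothetical. It only shows the control flow: try for a large array, fall
back to the few on-stack slots if that fails, and flush whenever the array
in use fills up.

```c
#include <stdlib.h>

/* Hypothetical sizes: a few on-stack fallback slots, and how many
 * pointers fit in one extension "page". */
#define BATCH_LOCAL 8
#define BATCH_EXT   512

struct batch {
	unsigned int nr;       /* entries currently queued */
	unsigned int max;      /* capacity of the array in use */
	void **pages;          /* points at local[] or at ext */
	void **ext;            /* extension array, NULL if allocation failed */
	unsigned int flushes;  /* non-empty flushes done, for illustration */
	void *local[BATCH_LOCAL];
};

static void batch_init(struct batch *b)
{
	b->nr = 0;
	b->flushes = 0;
	/* Try to get a bigger array (the "gfp a page" step). */
	b->ext = malloc(BATCH_EXT * sizeof(void *));
	if (b->ext) {
		b->pages = b->ext;
		b->max = BATCH_EXT;
	} else {
		/* Still correct, just less batching. */
		b->pages = b->local;
		b->max = BATCH_LOCAL;
	}
}

static void batch_flush(struct batch *b)
{
	/* In the kernel, the TLB flush would have to happen before this. */
	for (unsigned int i = 0; i < b->nr; i++)
		free(b->pages[i]);
	if (b->nr)
		b->flushes++;
	b->nr = 0;
}

static void batch_add(struct batch *b, void *page)
{
	b->pages[b->nr++] = page;
	if (b->nr == b->max)
		batch_flush(b);
}

static void batch_finish(struct batch *b)
{
	batch_flush(b);
	free(b->ext);  /* free(NULL) is a no-op */
	b->ext = NULL;
}
```

Since b->pages may point into b->local, the structure has to stay where
it was initialized (the caller's stack frame) and must not be copied.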

You could perhaps use C99 variable-length arrays to avoid wasting stack
when the extra slots aren't needed; however, Andi once told me that this
generates rather dubious code.
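A hypothetical sketch of what the VLA variant could look like (again not
from any patch): the caller passes how many on-stack fallback slots it
wants, so a fork or mprotect style caller could ask for one while a zap_*
style caller asks for more. run_batch() and release_fn are made-up names,
and release() stands in for the TLB flush plus freeing.

```c
#include <stddef.h>

typedef void (*release_fn)(void *page);

/* Push nr_items pointers through an on-stack batch of nr_local slots,
 * releasing the batch when it fills and once more at the end.
 * Returns the number of flushes performed. */
static size_t run_batch(size_t nr_local, void **items, size_t nr_items,
			release_fn release)
{
	if (nr_local == 0)      /* a zero-length VLA is undefined behaviour */
		nr_local = 1;
	void *local[nr_local];  /* size chosen at run time, lives on the stack */
	size_t nr = 0, flushes = 0;

	for (size_t i = 0; i < nr_items; i++) {
		local[nr++] = items[i];
		if (nr == nr_local || i + 1 == nr_items) {
			/* Stand-in for flushing the TLB, then freeing local[0..nr). */
			for (size_t j = 0; j < nr; j++)
				release(local[j]);
			nr = 0;
			flushes++;
		}
	}
	return flushes;
}
```

The trade-off is that the worst-case stack usage is no longer a
compile-time constant of the function, which makes it harder to reason
about on a small kernel stack.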
