Re: [PATCH] mm: disable preemption in apply_to_pte_range

From: Peter Zijlstra
Date: Fri Feb 13 2009 - 09:15:39 EST


On Sat, 2009-02-14 at 00:30 +1100, Nick Piggin wrote:
> On Friday 13 February 2009 22:48:30 Peter Zijlstra wrote:
> > On Thu, 2009-02-12 at 17:39 -0800, Jeremy Fitzhardinge wrote:
> > > In general the model for lazy updates is that you're batching the
> > > updates in some queue somewhere, which is almost certainly a piece of
> > > percpu state being maintained by someone. It's therefore broken and/or
> > > meaningless to have the code making the updates wandering between cpus
> > > for the duration of the lazy updates.
> > >
> > > > If so, should we do the preempt_disable/enable within those functions?
> > > > Probably not worth the cost, I guess.
> > >
> > > The specific rules are that
> > > arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() require you to be
> > > holding the appropriate pte locks for the ptes you're updating, so
> > > preemption is naturally disabled in that case.
> >
> > Right, except on -rt where the pte lock is a mutex.
> >
> > > This all goes a bit strange with init_mm's non-requirement for taking
> > > pte locks. The caller has to arrange for some kind of serialization on
> > > updating the range in question, and that could be a mutex. Explicitly
> > > disabling preemption in enter_lazy_mmu_mode would make sense for this
> > > case, but it would be redundant for the common case of batched updates
> > > to usermode ptes.
> >
> > I really utterly hate how you just plonk preempt_disable() in there
> > unconditionally and without very clear comments on how and why.
>
> And even on mainline kernels, builds without the lazy mmu mode stuff
> don't need preemption disabled here either, so it is technically a
> regression in those cases too.

Well, normally we'd be holding the pte lock, which on regular kernels
already disables preemption, as Jeremy noted. So in that respect it
doesn't change things too much.

It's just that slapping preempt_disable()s around like there's no
tomorrow is horridly annoying, it's like using the BKL -- there's no data
affinity whatsoever, so trying to unravel the dependencies a year
later when you notice it's a latency concern is a massive pain in the
backside.

> > I'd rather we'd fix up the init_mm to also have a pte lock.
>
> Well that wouldn't fix -rt; there would need to be a preempt_disable
> within arch_enter_lazy_mmu_mode(), which I think is the cleanest
> solution.

Hmm, so you're saying we need to be cpu-affine for the lazy mmu stuff?
Otherwise -rt would just convert the init_mm pte lock to a mutex along
with all other pte locks and there'd be no issue.
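
To make the cpu-affinity question concrete, here is a minimal user-space
sketch of the hazard Jeremy describes. It is not kernel code: the queue
layout, the names (lazy_queue, set_pte, run_batch) and the "migration" by
assigning current_cpu are all assumptions made up for illustration. It only
models a per-cpu batch that is flushed by whichever cpu the task happens to
be on when it leaves lazy mode.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS   2
#define QUEUE_MAX 8

/* Hypothetical per-cpu batch queue; stands in for whatever percpu state
 * an arch keeps while in lazy mmu mode. */
struct lazy_queue {
    int    active;      /* is this cpu in lazy mode? */
    size_t pending;     /* updates queued but not yet flushed */
};

static struct lazy_queue queues[NR_CPUS];
static int current_cpu;     /* cpu the simulated task is running on */
static size_t applied;      /* updates that actually took effect */

static void enter_lazy_mode(void)
{
    queues[current_cpu].active = 1;
}

/* Queue the update if this cpu is in lazy mode, else apply it at once. */
static void set_pte(void)
{
    struct lazy_queue *q = &queues[current_cpu];

    if (q->active) {
        assert(q->pending < QUEUE_MAX);
        q->pending++;
    } else {
        applied++;
    }
}

/* Flush only the queue of the cpu we are on right now. */
static void leave_lazy_mode(void)
{
    struct lazy_queue *q = &queues[current_cpu];

    applied += q->pending;
    q->pending = 0;
    q->active = 0;
}

/* Run a batch of 4 updates, optionally migrating after the 2nd one.
 * Returns the number of updates left stranded in some cpu's queue. */
static size_t run_batch(int migrate_mid_batch)
{
    size_t stranded = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        queues[cpu] = (struct lazy_queue){0};
    current_cpu = 0;
    applied = 0;

    enter_lazy_mode();
    for (int i = 0; i < 4; i++) {
        if (migrate_mid_batch && i == 2)
            current_cpu = 1;    /* simulated preemption + migration */
        set_pte();
    }
    leave_lazy_mode();

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        stranded += queues[cpu].pending;
    return stranded;
}
```

In this model run_batch(0) leaves nothing behind, while run_batch(1) leaves
the first two updates sitting in cpu 0's queue and applies the last two
immediately on cpu 1, out of order. A sleeping pte lock serializes the
updaters but does nothing to prevent that migration, which is the case
where staying on one cpu for the duration of the batch would matter.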

