Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic

From: Hugh Dickins
Date: Tue Jun 16 2009 - 08:40:01 EST


On Tue, 16 Jun 2009, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Mon, 2009-06-15 at 20:52 +0200, Ingo Molnar wrote:
> > > * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > > On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote:
> > > > > * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > > > > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote:
> > > > > > > * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > but ... look at the APIs i propose above. We dont need _any_
> > > > > > > 'types'.
> > > > > > >
> > > > > > > That type enumeration is basically an open-coded allocator. If we do
> > > > > > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need
> > > > > > > any of those indices, and all the potential for mismatch goes away
> > > > > > > as well - a stack nests trivially with IRQ and NMI and arbitrary
> > > > > > > other contexts.
> > > > > >
> > > > > > You want types because:
> > > > > > - they encode the intent, and can be verified
> > > > > > - they help keep track of the max nesting depth
> > > > > >
> > > > > > In the proposed implementation all type code basically falls away
> > > > > > > on !CONFIG_DEBUG_VM, but is kept around for robustness.
> > > > >
> > > > > But much of the fragility of the types (and their clumsiness - for
> > > > > example in highpte ops we have to know at which level of the
> > > > > pagetables we are, and use the right kind of index) is _precisely_
> > > > > because we have the types ...
> > > >
> > > > How will you manage the max depth?
> > >
> > > 	if (++depth == MAX_DEPTH) {
> > > 		print_all_entries_and_nasty_warning();
> > > 		/* hope we'll live long enough for the syslog to touch disk */
> > > 		depth = 0;
> > > 	}
> >
> > That will only trigger if we hit it, which will be _very_ rare.
> >
> > > unbalanced kmap is a bad bug - the easier we make it to catch,
> > > the better. The system wouldnt survive anyway.
> >
> > My proposed patch validates strict balance of types. But I can
> > easily add the above as well.
> >
> > By removing the types it becomes very difficult to verify the max
> > depth. I really don't like removing them.
>
> The fact that it implies an atomic section pretty much limits its
> depth in practice, doesnt it?
>
> All we need to track in the debug code is
> max-{syscall,softirq,hardirq,nmi}. The sum of these 4 counts must be
> smaller than the max - even if (as you are right to point out) we
> dont hit that magic combo that truly maximizes the depth.
>
> And note that in practice many of the current types are exclusive to
> each other - so using the stack would _reduce_ the amount of
> kmap-atomic space we need.

I'll briefly resurface into the discussion before submerging again ;)

I like very much the direction you're taking this, Ingo.

Yes, that is how I've sometimes thought we should go - though when
making the kmap_push/kmap_pop suggestion to Peter yesterday, I wasn't
expecting him to make that revolution, just provide a way to save a
current KM_type mapping and restore it later, so he can safely use
the standard primitives like pte_offset_map() within.

I wasn't expecting the in_nmi() and in_irq() tests still to be there,
even if only in debug builds. I can understand Peter's lockdep background
wanting to retain the checking and KM_types, but if we're actually
going to overhaul this area, I'd love just to get rid of them.

Yes, that should reduce the amount of kmap_atomic space needed;
though I've not thought how we keep track of the maximum needed
as the kernel goes on developing.

There might be a very few places where we expect to kmap_atomic A,
kmap_atomic B, kunmap_atomic A, kunmap_atomic B?
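
To make that ordering concern concrete, here is a rough userspace sketch
of a typeless stack of atomic mappings. It is not a proposed kernel
implementation: struct kmap_stack, KMAP_STACK_MAX, the high_water field
and the signatures given to kmap_push()/kmap_pop() are all assumptions
made up for illustration. It only shows that a strict stack refuses to
release A while B is still on top, and that a high-water mark is one
possible way to record the deepest nesting actually seen.

```c
#include <stddef.h>

#define KMAP_STACK_MAX 8	/* hypothetical limit, not a real kernel constant */

/* Hypothetical per-CPU (or per-context) stack of atomic mappings. */
struct kmap_stack {
	int depth;				/* number of live mappings */
	int high_water;				/* deepest nesting seen so far */
	const void *page[KMAP_STACK_MAX];	/* what each slot maps */
};

/* Map a page into the next free slot; returns the slot, or -1 on overflow. */
static int kmap_push(struct kmap_stack *s, const void *page)
{
	if (s->depth == KMAP_STACK_MAX)
		return -1;
	s->page[s->depth] = page;
	if (s->depth + 1 > s->high_water)
		s->high_water = s->depth + 1;
	return s->depth++;
}

/*
 * Release a slot. Only the top of the stack may be released, so the
 * sequence "map A, map B, unmap A" is reported as a mismatch (-1)
 * instead of silently corrupting the stack.
 */
static int kmap_pop(struct kmap_stack *s, int slot)
{
	if (s->depth == 0 || slot != s->depth - 1)
		return -1;
	s->page[--s->depth] = NULL;
	return 0;
}
```

With this sketch, pushing A then B and trying to pop A first fails, while
popping B and then A succeeds. Any caller that really does want the
A, B, A, B order would need to be reordered or handled some other way.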

Something else to throw in: what if they were not just atomic,
but also replaced the current sleeping kmaps? i.e. a task context
carries around its own stack of these.

I've always rejected that as introducing a pretty terrible overhead
just where we don't want it; but maybe you're ingenious enough to
devise ways of amortizing that cost.

It would be nice to delete mm/highmem.c if we could. Ah, but there
are probably places where one task passes a kmap address to another?

Hugh