Re: [GIT PULL v2] Early SLAB fixes for 2.6.31

From: Nick Piggin
Date: Tue Jun 16 2009 - 00:57:29 EST

On Mon, Jun 15, 2009 at 01:38:12PM +0100, Hugh Dickins wrote:
> On Mon, 15 Jun 2009, Nick Piggin wrote:
> > On Mon, Jun 15, 2009 at 08:45:27PM +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2009-06-15 at 12:27 +0200, Nick Piggin wrote:
> > > > Init code doesn't deserve to be more lazy than anybody else, and
> > > > part of the reason why such a core piece of code is so crufty
> > > > is exactly because people have been lazy there.
> > >
> > > I think the main problem isn't necessarily init code per se, but the
> > > pile of -common- code that can be called both at init time and later.
> >
> > Just seems bogus argument. Everwhere else that does this (ie.
> > allocations that are called from multiple allocation contexts)
> > passes correct gfp flags down.
> Fair enough that you jealously defend SL?B code from onslaught, but
> FWIW I strongly agree with Ben on all this. I cannot see the point
> of the pain of moving around SL?B versus bootmem, if we immediately
> force such a distinction (differently dressed) upon their users again.

Well, it is a good idea for other reasons too, not just to
relieve the author of the distinction.

But the distinction now is much smaller. It is not a case of
if (system_state == something crazy)

if (object was kmalloced)

It is now this:

> I fully agree with Ben that it's the job of the allocator to provide
> a service, and part of that job to understand its own limitations at
> different stages of running.

Yes but that's heavily qualified. As I said, we already require
a lot of knowledge of context passed in to it. I have no interest
in adding code to make *early boot* code not have to care about
that, especially because everybody else certainly has to know
whether they are calling the allocator with interrupts disabled
or a lock held etc.

To be clear about this: the allocator is fully servicable and
no different to normal system running at this point. The
difference for example is that code runs with interrupts off
but incorrectly uses GFP_KERNEL for allocations. This is a
blatent bug in any other kernel code, I don't know why boot
code is OK to be horrible and wrong and work around it with
the equally horrible system_state (and this gfp mask which is
just system_state under another name).

I just don't want to use this slab fix in question to be a
license to throw away and forget all about any context information
in the early boot code because someone asserts "it will make the
code better". I'm fine with the slab change for now, but let's
try to retain context information as well.

If somebody comes up with a patch to remove thousands of lines
of boot code by ignoring context, then I might concede the
point and we could remove the context annotations.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at