Re: [GIT PULL v2] Early SLAB fixes for 2.6.31

From: Benjamin Herrenschmidt
Date: Mon Jun 15 2009 - 17:54:52 EST


On Mon, 2009-06-15 at 13:38 +0200, Nick Piggin wrote:
> On Mon, Jun 15, 2009 at 01:28:28PM +0200, Nick Piggin wrote:
> > On Mon, Jun 15, 2009 at 01:22:05PM +0200, Nick Piggin wrote:
> > > On Mon, Jun 15, 2009 at 08:39:48PM +1000, Benjamin Herrenschmidt wrote:
> > > But I won't live with having it shit in our nice core code...
> > > Well, at least I won't throw up my hands and give up this
> > > early.
> >
> > Just the principle, btw.
>
> I have the same opinion for suspend/resume too, although
> in that case I know less about the issues and if we
> found that it indeed does make a random driver writer's
> life easier[*] then it might be a reason to do this. But
> I still don't think that would give boot code a license to
> just revert back to "I don't know or care, GFP_KERNEL please"
>
> [*] and note that being unaware of your context I don't
> think is making life easier automatically.

The suspend/resume case is even worse ... because drivers don't know,
and don't have to.

I.e., we are talking here about pretty much -any- kmalloc in the kernel;
you don't seem to understand that.

The problem here is that driver A has been suspended and happens to be on
the swapout path. Driver B hasn't been suspended yet, and potentially
doesn't even know there's a suspend/resume cycle in progress.

Now, driver B, while holding for example one of its internal mutexes,
calls something that calls something that does a kmalloc(GFP_KERNEL) ...
The latter will potentially block forever (or at least until resume)
because the allocator may try to swap something out to devices driven by
driver A while it's suspended.

Now, driver B's suspend() is called, which tries to take the above
mutex... kaboom.

Yes, we -could- probably try to invent some scheme for block devices to
"teach" upper layers that they are being suspended. That would cover
some of the cases and would probably not be done properly for 10 kernel
versions to come... Or we can make all kmalloc() degrade automatically
to GFP_NOIO when suspend is started.

Which one is more likely to actually work ? :-)
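
To make the second option concrete, here is a rough userspace sketch of
the idea, not actual kernel code. Every name with an sk_/SK_ prefix is
hypothetical and the flag values are invented for illustration; the only
thing borrowed from the kernel is the notion that GFP_KERNEL is "may
wait + may do I/O + may call into the FS" while GFP_NOIO drops the last
two. The allocator would filter every request through a global mask that
the suspend code narrows before it starts suspending devices:

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only (not the kernel's values). */
#define SK_GFP_WAIT 0x1u  /* caller may sleep */
#define SK_GFP_IO   0x2u  /* allocator may start block I/O (e.g. swapout) */
#define SK_GFP_FS   0x4u  /* allocator may call into filesystems */

#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_NOIO   (SK_GFP_WAIT)

/* Bits the allocator is currently allowed to honour. */
static unsigned int sk_gfp_allowed_mask = SK_GFP_KERNEL;

/* Called once, before the first device is suspended. */
void sk_suspend_begin(void)
{
    sk_gfp_allowed_mask &= ~(SK_GFP_IO | SK_GFP_FS);
}

/* Called once, after the last device has resumed. */
void sk_resume_end(void)
{
    sk_gfp_allowed_mask = SK_GFP_KERNEL;
}

/* What the allocator would actually act on for a given request. */
unsigned int sk_effective_gfp(unsigned int requested)
{
    return requested & sk_gfp_allowed_mask;
}

/* True if this request may still trigger I/O to a possibly suspended device. */
bool sk_may_start_io(unsigned int requested)
{
    return (sk_effective_gfp(requested) & SK_GFP_IO) != 0;
}
```

With something along these lines, driver B's nested kmalloc(GFP_KERNEL)
would behave like GFP_NOIO once sk_suspend_begin() has run, so it could
fail or reclaim without I/O, but it would never wait on driver A. Driver
B itself needs no changes and no knowledge that a suspend is in progress.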

Cheers,
Ben.
