Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock

From: Tejun Heo
Date: Tue Dec 27 2011 - 16:44:27 EST

Hello, Andrew.

On Tue, Dec 27, 2011 at 01:20:56PM -0800, Andrew Morton wrote:
> > That's the *whole* reason why allocation
> > buffering is used there. It's filled from GFP_KERNEL context and
> > consumed from GFP_NOIO and as pointed out multiple times the allocation
> > there is infrequent and can be opportunistic. Sans the use of small
> > buffering items, this isn't any different from deferring allocation to
> > different context. There's no guarantee when that allocation would
> > happen but in practice both will be reliable enough for the given use
> > case.
>
> "reliable enough" is not the standard to which we aspire.
>
> Look, the core problem here is that this block code is trying to
> allocate from an inappropriate context. Rather than hacking around
> adding stuff to make this work most-of-the-time, how about fixing the
> block code to perform the allocation from the correct context?

Yeah, sure, that would be nice. I'm just not convinced that would be
possible in a way which would in the end be better than buffering
memory allocations in some way. For better or for worse, the whole
blkcg / ioc / request_queue association mechanism is based on
essentially opportunistic allocation, on the expectation that the
number of used combinations would be low and allocation frequency
would be low too.

> > I don't necessarily insist on using mempool here but all the given
> > objections seem bogus to me. The amount of code added is minimal and
> > straight-forward. It doesn't change what mempool is or what it does
> > at all and the usage fits the problem to be solved. I can't really
> > understand what the objection is about.
>
> It's adding complexity to core kernel. It's adding a weak and
> dangerous interface which will encourage poor code in callers.
>
> And why are we doing all of this? So we can continue to use what is
> now poor code at one particular site in the block layer! If the block
> code wants to run alloc_percpu() (which requires GFP_KERNEL context)
> then it should be reworked to do so from GFP_KERNEL context. For that
> is the *best* solution, no?

Ummm... I'm not arguing this is the best solution in the world.

I'm not convinced trying to put this into GFP_KERNEL context would
work. Short of that, the next best thing would be making the percpu
allocator usable from the memory reclaim path, right? But that would
involve a lot more churn and complexity without much added benefit,
given that this type of use case isn't expected to be common - and
I'm fairly sure it isn't, given the track record of the past few
years.

IMHO, core code is there to help its users. In this case, if we
choose to buffer allocation somehow, we sure can add that to the block
layer, but that would be a pretty silly thing to do. It's not keeping
anything simple. It's just hiding things.

Anyways, let's stop this abstract discussion. I think the crux of
the disagreement is about whether this can be moved into GFP_KERNEL
context or not. Let's continue with Vivek's subthread.

