Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock

From: Tejun Heo
Date: Tue Dec 27 2011 - 17:22:30 EST

Hello, Andrew.

On Tue, Dec 27, 2011 at 01:58:36PM -0800, Andrew Morton wrote:
> > But that would
> > involved a lot more churn and complexity without much added benefit,
> > given that this type of use cases aren't expected to be common - and
> > I'm fairly sure it isn't given track record of past few years.
> I don't think it would be too hard to add an alloc_percpu_gfp(). Add
> the gfp_t to a small number of functions (two or three?) then change
> pcpu_mem_zalloc() to always use kzalloc() if (flags & GFP_KERNEL !=
> GFP_KERNEL). And that's it?

Hmmm... What do you mean "always use kzalloc()"? Percpu memory can't
be served by kzalloc(). The per-CPU areas need to be congruent.

Currently, percpu allocation happens in two stages. The first stage
is locating congruent address areas in vmalloc space (so that static
percpu offsets among different CPUs can be maintained for dynamic
percpu areas). The second stage is actually allocating pages and
populating those areas. Because percpu memory is expensive, the
allocator performs both steps on demand. When it runs out of address
space, it reserves more, and the addresses are populated with pages
only when the memory areas are actually handed out.
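
To make the two stages concrete, here is a minimal userspace sketch.
It is a toy model, not the code in mm/percpu.c: every toy_* name and
constant is hypothetical, malloc() stands in for reserving vmalloc
address space, and memset() stands in for allocating and mapping pages.
It only illustrates congruence, i.e. that one offset is valid in every
CPU's unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NR_CPUS   4		/* hypothetical CPU count */
#define TOY_UNIT_SIZE 4096	/* bytes of percpu area per CPU */
#define TOY_ALIGN     16	/* allocation granularity inside a unit */

/* One chunk: TOY_NR_CPUS units at a fixed stride from base. */
struct toy_chunk {
	char   *base;		/* stage 1: reserved range, NULL if none */
	bool    populated;	/* stage 2: backing memory is usable */
	size_t  next_off;	/* bump pointer for offsets in a unit */
};

/* Stage 1: obtain a congruent address range (assumption: malloc). */
static bool toy_reserve(struct toy_chunk *c)
{
	c->base = malloc((size_t)TOY_NR_CPUS * TOY_UNIT_SIZE);
	c->populated = false;
	c->next_off = 0;
	return c->base != NULL;
}

/* Stage 2: populate the range (assumption: memset models page setup). */
static void toy_populate(struct toy_chunk *c)
{
	memset(c->base, 0, (size_t)TOY_NR_CPUS * TOY_UNIT_SIZE);
	c->populated = true;
}

/* Returns an offset valid in every unit, or (size_t)-1 on failure. */
static size_t toy_alloc_percpu(struct toy_chunk *c, size_t size)
{
	size_t off;

	if (size == 0 || size > TOY_UNIT_SIZE)
		return (size_t)-1;
	size = (size + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
	if (!c->base && !toy_reserve(c))	/* stage 1 on demand */
		return (size_t)-1;
	if (size > TOY_UNIT_SIZE - c->next_off)
		return (size_t)-1;
	if (!c->populated)			/* stage 2 on demand */
		toy_populate(c);
	off = c->next_off;
	c->next_off += size;
	return off;
}

/* Same offset, different CPU: only the unit base changes. */
static void *toy_per_cpu_ptr(struct toy_chunk *c, int cpu, size_t off)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS && off < TOY_UNIT_SIZE);
	return c->base + (size_t)cpu * TOY_UNIT_SIZE + off;
}
```

Both stages here may block or fail in the real allocator, which is why
neither can simply be called from a reclaim path.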

Unfortunately, both steps involve the vmalloc machinery, and the
GFP_KERNEL assumption reaches into arch-specific code for page table
allocation.
So, making all those steps reclaim path friendly would be quite a

*If* we're gonna solve this at the allocator level, we'll probably end
up with a front buffer in the percpu allocator, which buffers populated
percpu areas to give out for reclaim path allocations. That is doable,
but it would involve a lot more complexity and would probably need to
keep more percpu memory reserved to stay generic.
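
The front buffer idea can be sketched roughly as follows. This is a
userspace illustration under loud assumptions: the toy_* names and the
reserve size are hypothetical, calloc() stands in for a fully populated
percpu area, and no locking is shown. The point is only the split
between a refill path that may block and a hand-out path that never
calls into the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_RESERVE_MAX 8	/* hypothetical reserve depth */

/* Reserve of already-populated objects. */
struct toy_front_buf {
	void   *slot[TOY_RESERVE_MAX];
	size_t  nr;		/* number of objects currently held */
	size_t  obj_size;
};

/*
 * Called from a context that may block (think GFP_KERNEL).
 * Tops the reserve up and returns how many objects it now holds.
 */
static size_t toy_refill(struct toy_front_buf *fb)
{
	while (fb->nr < TOY_RESERVE_MAX) {
		void *p = calloc(1, fb->obj_size);

		if (!p)
			break;
		fb->slot[fb->nr++] = p;
	}
	return fb->nr;
}

/*
 * Called from a context that must not block (think reclaim path).
 * Only touches the reserve; returns NULL when it is empty.
 */
static void *toy_alloc_noblock(struct toy_front_buf *fb)
{
	return fb->nr ? fb->slot[--fb->nr] : NULL;
}

/* Give an object back: keep it if there is room, else free it. */
static void toy_release(struct toy_front_buf *fb, void *p)
{
	if (fb->nr < TOY_RESERVE_MAX)
		fb->slot[fb->nr++] = p;
	else
		free(p);
}
```

Even in this toy form the cost is visible: TOY_RESERVE_MAX objects sit
idle per buffer, and a generic allocator-level version would have to
size that reserve without knowing its users.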

> But the question is: is this a *good* thing to do? It would be nice if
> kernel developers understood that GFP_KERNEL is strongly preferred and
> that they should put in effort to use it. But there's a strong
> tendency for people to get themselves into a sticky corner then take
> the easy way out, resulting in less robust code. Maybe calling the
> function alloc_percpu_i_really_suck() would convey the hint.

I fully agree and have been pushing back pretty hard against people
trying to allocate percpu memory from !GFP_KERNEL context and I want
to keep it that way. But, to me, this one seems like a valid use
case. If we want to track dynamic associations in the IO path, short
of allocating for all possible combinations up front, we can't avoid
performing NOIO allocations. Of course, the code path should be
careful so that it's not too demanding on the allocator and must be
able to degrade operation gracefully on allocation failures. It's a
special case but a valid one at that.
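
As an illustration of what degrading gracefully could look like, here
is a small userspace sketch. It is one possible shape, not what the
patchset does: all toy_* names are hypothetical, the allocator is
passed in as a function pointer so that failure can be simulated, and
a single shared counter stands in for whatever coarser accounting the
caller can live with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_stats {
	unsigned long ios;
};

/* A dynamically created association; stats may not exist yet. */
struct toy_group {
	struct toy_stats *stats;	/* NULL until allocation succeeds */
};

/* Shared fallback used while a group has no stats of its own. */
static struct toy_stats toy_fallback;

typedef void *(*toy_alloc_fn)(size_t size);

/* Stand-in for a non-blocking allocation that succeeds. */
static void *toy_zalloc(size_t size)
{
	return calloc(1, size);
}

/* Stand-in for a non-blocking allocation that fails. */
static void *toy_fail_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * IO path: try once to set up the group's stats, never wait, and
 * fall back to the shared counter if that fails. A later IO retries.
 */
static void toy_account_io(struct toy_group *g, toy_alloc_fn try_alloc)
{
	struct toy_stats *s;

	if (!g->stats)
		g->stats = try_alloc(sizeof(*g->stats));
	s = g->stats ? g->stats : &toy_fallback;
	s->ios++;
}
```

The IO is never failed or delayed because of the allocation; only the
accounting granularity gets worse until a later attempt succeeds.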

