Re: [PATCHSET] block, mempool, percpu: implement percpu mempooland fix blkcg percpu alloc deadlock

From: Tejun Heo
Date: Thu Dec 22 2011 - 17:41:23 EST


Hello, Andrew.

On Thu, Dec 22, 2011 at 02:20:58PM -0800, Andrew Morton wrote:
> Don't just consider my suggestions - please try to come up with your own
> alternatives too! If all else fails then this patch is a last resort.

Umm... this is my alternative.

> > but apparently those percpu stats reduced
> > CPU overhead significantly.
>
> Deleting them would save even more CPU.
>
> Or make them runtime or compile-time configurable, so only the
> developers see the impact.
>
> Some specifics on which counters are causing the problems would help here.

These stats are userland visible and quite useful ones if blkcg is in
use. I don't really see how these can be removed.

> > > Or how about we fix the percpu memory allocation code so that it
> > > propagates the gfp flags, then delete this patchset?
> >
> > Oh, no, this is gonna make things *way* more complex. I tried.
>
> But there's a difference between fixing a problem and working around it.

Yeah, that was my first direction too. The reason why percpu can't do
NOIO is the same one why vmalloc can't do it. It reaches pretty deep
into page table code and I don't think doing all that churning is
worthwhile or even desirable. An altnernative approach would be
implementing transparent front buffer to percpu allocator, which I
*might* do if there really are more of these users, but I think
keeping percpu allocator painful to use from reclaim context isn't
such a bad idea.

There have been multiple requests for atomic allocation and they all
have been successfully pushed back, but IMHO this is a valid one and I
don't see a better way around the problem, so while I agree using
mempool for this is a workaround, I think it is a right choice, for
now, anyway.

> > If we're gonna have many more NOIO percpu users, which I don't
> > think we would or should, that might make sense but, for fringe
> > cases, extending mempool to cover percpu is a much better sized
> > solution.
>
> I've long felt that we goofed with the gfp_flags thing and that it
> should be a field in the task_struct. Now *that* would be a large
> patch!

Yeah, some of PF_* flags already carry related role information. I'm
not too sure how much pushing the whole thing into task_struct would
change tho. We would need push/popping. It could be simpler in some
cases but in essence wouldn't we have just relocated the position of
parameter?

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/