Re: [PATCHSET] block, mempool, percpu: implement percpu mempooland fix blkcg percpu alloc deadlock

From: Tejun Heo
Date: Tue Dec 27 2011 - 17:08:07 EST


Hello,

On Tue, Dec 27, 2011 at 01:25:01PM -0800, Andrew Morton wrote:
> umm, we've already declared that it is OK to completely waste this
> memory for the users (probably a majority) who will not be using
> these stats.

We're talking about combinatorial combinations where only small subset
is usually expected to be used and, in addition to the absolute usage,
there's big advantage in showing behavior which users would expect.
If 1000 cgroups are doing IOs to 1000 devices, it's expected to
consume some amount of resource.

The whole io_context / blk_cgroup - request_queue association
mechanism is based on opportunistic allocation. It might not be the
prettiest thing in the world but given the circumstances IMHO the
approach fits the constraints defined by the problem.

Given the restricted nature of percpu allocation, it would be nice to
punt it to GFP_KERNEL context *somewhere* and for block layer that
somewhere probably can only be userland access. I just don't see that
fitting better here. The suggested alternative seems much nastier
with userland visible side effects and possibility for combinatorial
increase in memory usage for something as benign as single cat of stat
files.

Also, such erratic userland visible behavior is deviation from the
current one and at the same time we would be bound to the
idiosyncracies later when we can improve the implementation.

I can't see how that can be a better tradeoff. It shifts the problem
to even more cumbersome corner.

> Also, has this stuff been tested at that scale? I fear to think what
> 10000 allocations will do to fragmetnation of the vmalloc() arena.

Percpu allocator doesn't use vmalloc directly. It maps address ranges
(which is at least 32k and usually much larger) from vmalloc space and
allocate it using simplistic extent allocator.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/