Re: [PATCHSET] block, mempool, percpu: implement percpu mempool andfix blkcg percpu alloc deadlock
From: Andrew Morton
Date: Thu Dec 22 2011 - 18:41:41 EST
On Thu, 22 Dec 2011 15:24:33 -0800
Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello,
>
> On Thu, Dec 22, 2011 at 03:16:49PM -0800, Andrew Morton wrote:
> > > The ones allocated in the last patch. blk_group_cpu_stats.
> >
> > What last patch.
> >
> > I can find no occurence of "blk_group_cpu_stats" on linux-kernel or in
> > the kernel tree.
>
> Sorry it's blkio_group_stats_cpu. It's in the seventh path in this
> series.
>
> > > The stats are per cgroup - request_queue pair. We don't want to
> > > allocate for all of them for each combination as there are
> > > configurations with stupid number of request_queues and silly many
> > > cgroups and #cgroups * #request_queue * #cpus can be huge. So, we
> > > want on-demand allocation. While the stats are important, they are
> > > not critical and allocations can be opportunistic. If the allocation
> > > fails this time, we can try it for the next time.
> >
> > Without code to look at I am at a loss.
>
> block/blk-cgroup.c blk-throttle.c cfq-iosched.c. Have fun.
>
> > request_queues are allocated in blk_alloc_queue_node(), which uses
> > GFP_KERNEL (and also mysteriously takes a gfp_t arg).
>
> Yeah, sure, we *can* allocate everything for every combination when
> either request_queue or cgroup comes up. That's the thing I tried to
> explain in the above quoted paragraph.
>
All the code I'm looking at assumes that blkio_group.stats_cpu is
non-zero. Won't the kerenl just go splat if that allocation failed?
If the code *does* correctly handle ->stats_cpu == NULL then we have
options.
a) Give userspace a new procfs/debugfs file to start stats gathering
on a particular cgroup/request_queue pair. Allocate the stats
memory in that.
b) Or allocate stats_cpu on the first call to blkio_read_stat_cpu()
and return zeroes for this first call.
c) Or change the low-level code to do
blkio_group.want_stats_cpu=true, then test that at the top level
after we've determined that blkio_group.stats_cpu is NULL.
d) Or, worse, punt the allocation into a workqueue thread.
Note that all these option will permit us to use GFP_KERNEL, which is
better.
Note that a) and b) means that users get control over whether these
stats are accumulated at all, so many won't incur needless memory and
CPU consumption.
I think I like b). Fix the code so it doesn't oops when ->stats_cpu is
NULL, then turn on stats gathering the first time someone tries to read
the stats.
(Someone appears to have misspelled "throttle" as "throtl" for no
apparent reason about 1000 times. Sigh.)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/